hlins-0.39/source/INSTALLATION

Hlins Installation
==================

To compile Hlins you need the Objective CAML compiler from INRIA, which
is available for most platforms. See the ocaml home page

    http://pauillac.inria.fr/caml/

To compile, run from the hlins main directory

    make configure; make

The native code compiler is used when autoconf succeeds in finding it.
Otherwise (this occurs in particular on m68k architectures, where a
native code compiler is not available at the time of this writing) the
byte code compiler is used and a copy of the runtime system is included
in the executable. Hence in neither case do you need ocaml on your
machine to execute hlins.

To install (check the variables defining the target installation
directories in doc/Makefile and source/Makefile.in), run

    make install

Ralf Treinen [treinen@lri.fr]

hlins-0.39/source/.depend

addr_lex.cmo: addr_lex.cmi
addr_lex.cmx: addr_lex.cmi
automaton.cmo: automaton.cmi
automaton.cmx: automaton.cmi
build_automaton.cmo: automaton.cmi names.cmi build_automaton.cmi
build_automaton.cmx: automaton.cmx names.cmx build_automaton.cmi
cyclic_buffer.cmo: cyclic_buffer.cmi
cyclic_buffer.cmx: cyclic_buffer.cmi
dumpdb.cmo: dumpdb.cmi
dumpdb.cmx: dumpdb.cmi
errors.cmo: errors.cmi
errors.cmx: errors.cmi
files.cmo: errors.cmi files.cmi
files.cmx: errors.cmx files.cmi
hlins.cmo: build_automaton.cmi dumpdb.cmi errors.cmi files.cmi \
    read_databases.cmi replace.cmi run_automaton.cmi version.cmo
hlins.cmx: build_automaton.cmx dumpdb.cmx errors.cmx files.cmx \
    read_databases.cmx replace.cmx run_automaton.cmx version.cmx
html_chars.cmo: html_chars.cmi
html_chars.cmx: html_chars.cmi
names.cmo: names.cmi
names.cmx: names.cmi
read_databases.cmo:
addr_lex.cmi errors.cmi html_chars.cmi read_databases.cmi read_databases.cmx: addr_lex.cmx errors.cmx html_chars.cmx read_databases.cmi read_html.cmo: read_html.cmi read_html.cmx: read_html.cmi replace.cmo: replace.cmi replace.cmx: replace.cmi run_automaton.cmo: automaton.cmi cyclic_buffer.cmi read_html.cmi \ run_automaton.cmi run_automaton.cmx: automaton.cmx cyclic_buffer.cmx read_html.cmx \ run_automaton.cmi build_automaton.cmi: automaton.cmi run_automaton.cmi: automaton.cmi hlins-0.39/source/LICENSE0100644000076400001440000003545607654273266013476 0ustar rtusers GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. 
These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. 
(Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. 
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

hlins-0.39/source/addr_lex.mli

(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
  USA
*)

(***************************************************************************)
(* lexer for address databases                                             *)
(***************************************************************************)

exception Error_lex of int*string;;
exception Eof;;

(* returns the next pair (name,url). Raises Eof when the end of input is
   reached, and Error_lex(linenumber,error_description) in case of an
   error.
*)
val next_name_sep_url: Lexing.lexbuf -> (string*string) ;;

hlins-0.39/source/README

This is Hlins. For more information and updates see
http://www.lri.fr/~treinen/hlins.

Ralf Treinen.

hlins-0.39/source/addr_lex.mll

(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
  USA
*)

{
let ln = ref 1;;  (* line number *)

exception Error_lex of int*string;;
exception Eof;;
}

let linespace = [' ''\t']

(* next_name_sep_url: called when we expect a name. Ignores any line that
   starts with the comment sign, and any white space. We accept as a name
   any string that does not contain "=" or newline, and that does not
   start or end with white space. The character "=" can be escaped as "==".
*)
rule next_name_sep_url = parse
    '#'[^'\n']*('\n')
        { incr ln; next_name_sep_url lexbuf }
  | linespace+
        { next_name_sep_url lexbuf }
  | '\n'
        { incr ln; next_name_sep_url lexbuf }
  | [^' ''#''\t''\n''='](([^'=''\n']|"==")*[^' ''\t''\n''='])?
        { let name = Lexing.lexeme lexbuf in (name , sep_url lexbuf) }
  | eof
        { raise Eof }

(* separator *)
and sep_url = parse
    linespace* '=' linespace*
        { url lexbuf }
  | _
        { raise (Error_lex (!ln,"No separator found"))}

(* any non-empty string up to the end of the line is accepted as url *)
and url = parse
    [^'\n']+
        { Lexing.lexeme lexbuf }
  | ('\n'|eof)
        { raise (Error_lex (!ln,"No URL found"))}

hlins-0.39/source/names.ml

(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
  USA
*)

open Str;;
open String;;

(* (products sep l1 l2) yields the list of strings of the form s1^sep^s2
   where si is an element of list li. The first element of the result
   list is obtained from the first elements of l1 and l2, respectively.
*)
let rec products sep l1 l2 = match l1 with
  | h::r -> (List.map (function s -> h^sep^s) l2) @ (products sep r l2)
  | [] -> []
;;

(* (allproducts sep [l1;...;ln]) yields the list of strings of the form
   s1^sep^s2^sep^...^sn where si is an element of list li. The first
   element of the result list is the string obtained from the respective
   first elements of the input lists.
*)
let rec allproducts sep = function
    [h] -> h
  | h::r -> (products sep h (allproducts sep r))
  | [] -> []
;;

(* abbreviation of a last name: the name itself plus, in case of a
   composite name, the part of the name before its last dash.
*)
let abbreviate_lastname name =
  try [ name ; string_before name (rindex name '-') ]
  with Not_found -> [name]
;;

(* list of abbreviations of a non-composite first name, starting with the
   name itself.
*)
let abbreviate_simple_firstname s =
  if length s <= 1 || get s (length s - 1) = '.'
  then [s]
  else if sub s 0 2 = "St" && length s > 2
  then [s;"S.";"St."]
  else if sub s 0 2 = "Ch" && length s > 2
  then [s;"C.";"Ch."]
  else [s;(sub s 0 1)^"."]
;;

(* abbreviate a first name which may contain a dash *)
let abbreviate_firstname s =
  try
    let n = rindex s '-'
    in s :: products "-"
         (List.tl (abbreviate_simple_firstname (string_before s n)))
         (List.tl (abbreviate_simple_firstname (string_after s (n + 1))))
  with Not_found -> (abbreviate_simple_firstname s)
;;

(* (expand_namelist l), where l is the non-empty list of parts of a name,
   yields the list of lists of abbreviations of these parts. A part
   enclosed in angle brackets is taken literally, without the brackets,
   and is not abbreviated.
*)
let rec expand_namelist = function
  | [s] ->
      if get s 0 = '<' && get s (length s - 1) = '>'
      then [[(sub s 1 (length s - 2))]]
      else [abbreviate_lastname s]
  | s::r ->
      (if get s 0 = '<' && get s (length s - 1) = '>'
       then [(sub s 1 (length s - 2))]
       else abbreviate_firstname s)
      :: (expand_namelist r)
  | [] -> failwith "this cannot happen"
;;

(* (pathlength l), where l is obtained as result of expand_namelist,
   returns an upper bound on the number of nodes in the tree.
*)
let rec pathlength = function
  | [h] -> String.length (List.hd h)
      (* h contains the abbreviations of the last name (hd h), which are
         hence all prefixes of (hd h); that is, only (hd h) counts for
         the size of the tree.
      *)
  | h::r -> (List.fold_left (fun n s -> n + String.length s) 0 h)
            + (List.length h) * (1 + pathlength r)
      (* h is a list of abbreviations of first names. We assume the worst
         case that the tree diverges from the beginning.
         Hence we have a tree node for every position in a word in h, and
         the tree for the rest r is copied as many times as there are
         strings in h (and we add one node for the blank separating name
         components).
      *)
  | [] -> failwith "this cannot happen"
;;

(* return the list of abbreviations of a name and the pathlength *)
let abbreviate name =
  let parts = (split (regexp "[ \t\n]+")
                 (global_replace (regexp "==") "=" name))
  in
  if List.length parts = 1
  then ( parts , String.length (List.hd parts) )
  else
    let elist = expand_namelist parts
    in ( allproducts " " elist , pathlength elist )
;;

hlins-0.39/source/automaton.mli

(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
  USA
*)

(***************************************************************************)
(* Module defining the type "automaton" for multi-string search            *)
(***************************************************************************)

(* Data type "automaton" used for multi-string matching à la
   Morris - Pratt. See Chapter 7.1 in Crochemore & Rytter, Text
   Algorithms, Oxford University Press, 1994.
*)

(* Type of the transition tables of a single state. An element of type
   transitions is a partial mapping from characters to integers.
*)
type transitions;;

(* the empty transition table *)
val empty_transitions : transitions;;

(* (get_transition t c) returns the result of the transition t under
   character c. Raises Not_found if the transition is not defined.
*)
val get_transition : transitions -> char -> int;;

(* (add_transition t c p) yields the transition t plus c -> p *)
val add_transition : transitions -> char -> int -> transitions;;

(* fold function on transitions: let the transition function t be
   [c1 -> q1; ... ; cn -> qn] (where this could have been in any order).
   Then (transitions_fold f i t) yields
   f ( ... (f (f i c1 q1) c2 q2) ... ) cn qn
*)
val transitions_fold : ('a -> char -> int -> 'a) -> 'a -> transitions -> 'a;;

type automaton = {
  max_path_length: int;
  number_of_states: int;
  tree: transitions array;
  level: int array;
  board: int array;
  suf: int array;
  found: (int list) array;
  expand: (int list) array
};;

(* First some terminology:
   - a word x is a prefix of a word y if there exists z with y = x^z
   - suffix if there exists z with y = z^x
   - factor if there exist z1,z2 with y = z1^x^z2
   A prefix (resp. suffix) is proper if z is not empty.

   When building the automaton we are given an array W of length L of
   non-empty words. Furthermore, there is some notion of an abbreviation
   of a word. Let V be the set consisting of the elements of W plus all
   their abbreviations. Let N be the cardinality of the prefix-closure
   of V.

   We call "nodes" the numbers n with 0<=n= 1: board(n) is the node m
     such that path(m) is the longest proper suffix of path(n) that is a
     prefix of a word in V.
   - suf.(n) is the node m such that path(m) is the longest (not
     necessarily proper) suffix of path(n) that is a word in V.
     suf.(n) = 0 if such an m does not exist.

   Furthermore, we require for all nodes n that
   - found.(n) = list of indices in W of path(n)
   - expand.(n) = list of all indices of words w in W such that path(n)
     is an abbreviation of w.

   Hence, we have the following equivalence:
   suf.(n) = n iff path(n) is a word in V
           iff found.(n) @ expand.(n) <> nil.
*)

hlins-0.39/source/build_automaton.ml

(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
  USA
*)

open Array;;
open Automaton;;
open Names;;

(* see automaton.mli for the explanation of the type automaton *)

(* (ins_word root tree level suf found expand firstfree i l s ind abbrev),
   where s is a word of length l and i<=l, stores the subword of s
   starting at position i in tree and level, where root is assumed to be
   the root node. firstfree is the first unused node. ind is the index
   that is recorded for s at its final node: in expand if abbrev is true
   (s is an abbreviation), and in found otherwise. Returns the first
   unused node.
*)
let rec ins_word root tree level suf found expand firstfree i l s ind abbrev =
  (* fresh_ins_word does the same thing as ins_word in the special case
     where we already know that root is a fresh node.
     The parameter firstfree is not needed since we know that it must be
     equal to root+1.
  *)
  let rec fresh_ins_word root tree level suf found expand i l s ind abbrev =
    if i=l
    then begin
      suf.(root) <- root;
      if abbrev
      then expand.(root) <- ind::expand.(root)
      else found.(root) <- ind::found.(root);
      root+1
    end
    else
      let rootinc = root + 1
      in begin
        tree.(root) <- add_transition tree.(root) (String.get s i) rootinc;
        level.(rootinc) <- i+1;
        fresh_ins_word rootinc tree level suf found expand (i+1) l s ind abbrev
      end
  in
  if i=l
  then begin
    suf.(root) <- root;
    if abbrev
    then expand.(root) <- ind::expand.(root)
    else found.(root) <- ind::found.(root);
    firstfree
  end
  else
    try
      ins_word
        (get_transition tree.(root) (String.get s i) )
        tree level suf found expand firstfree (i+1) l s ind abbrev
    with Not_found -> begin
      tree.(root) <- add_transition tree.(root) (String.get s i) firstfree;
      level.(firstfree) <- i+1;
      fresh_ins_word firstfree tree level suf found expand (i+1) l s ind abbrev
    end;;

(* (ins_abbrev tree level suf found expand firstfree l i) inserts all
   words of l, considered as abbreviations of the word with index i, into
   tree, level, suf, found and expand. Returns the first unused node.
*)
let rec ins_abbrev tree level suf found expand firstfree l i = match l with
    [] -> firstfree
  | h::r ->
      let newfirstfree =
        ins_word 0 tree level suf found expand firstfree
          0 (String.length h) h i true
      in ins_abbrev tree level suf found expand newfirstfree r i
;;

(* (ins_list tree level suf found expand firstfree l i) inserts all words
   of l into tree, level, suf, found and expand. Each element of l is a
   pair whose first component is a name followed by its abbreviations.
   i is the index number of the first element of l. Returns the first
   unused node.
*)
let rec ins_list tree level suf found expand firstfree l i = match l with
    [] -> firstfree
  | (name::abbrevs,_)::r ->
      let newfirstfree =
        ins_word 0 tree level suf found expand firstfree
          0 (String.length name) name i false
      in let anewfirstfree =
        ins_abbrev tree level suf found expand newfirstfree abbrevs i
      in ins_list tree level suf found expand anewfirstfree r (i+1)
  | _ -> failwith "This cannot happen (Build_automaton.ins_list)"
;;

(* (propagate tree board suf l ll) propagates the board function for all
   nodes in l@ll to their respective sons. The second list ll is used as
   an accumulating parameter for new propagation tasks.
*)
let rec propagate tree board suf l ll =
  (* (setboard tree board suf z c s), where s is a son at edge c and z is
     an iterated board of the father of s, sets the board of s under the
     hypothesis that the board of all its ancestors is set. Returns s.
  *)
  let rec setboard tree board suf z c s =
    if z = -1
    then begin
      board.(s) <- 0;
      s
    end
    else
      try
        let b = get_transition tree.(z) c
        in begin
          board.(s) <- b;
          if suf.(s) <> s then suf.(s) <- suf.(b);
          s
        end
      with Not_found -> setboard tree board suf board.(z) c s
  in match l with
      [] -> (match ll with
               [] -> ()
             | _ -> propagate tree board suf ll l)
    | h::r ->
        propagate tree board suf r
          (transitions_fold
             (fun l c s -> (setboard tree board suf board.(h) c s)::l)
             ll
             tree.(h))

let build l =
  let el = List.map abbreviate l
  in
  let size = (List.fold_right (fun (l,n) m -> n + m) el 1 )
  and maxl = (List.fold_right (fun (l,n) m -> max n m) el 0 )
  in
  let suf = init size (function i -> 0)
  and level = init size (function i -> 0)
  and tree = init size (function i-> empty_transitions )
  and board = init size (function i -> 0)
  and found = init size (function i -> [])
  and expand = init size (function i -> [])
  in begin
    (* build level and tree *)
    let n = ins_list tree level suf found expand 1 el 0
    in begin
      (* compute board and suf *)
      board.(0) <- -1;
      suf.(0) <- -1;
      propagate tree board suf [0] [];
      {number_of_states=n;
max_path_length = maxl; level=level; tree=tree; board=board; suf=suf; found=found; expand=expand } end end ;; hlins-0.39/source/cyclic_buffer.ml0100644000076400001440000000432207654273266015606 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) open String;; type buffer = { buf: string; (* stores the contents *) len: int; (* length of buf *) mutable sta: int; (* position of the first char *) mutable uno: int (* first unoccupied position *) } ;; let fresh n = { buf = create n; len = n; sta = 0; uno = 0 };; let is_empty {sta=s;uno=e} = ( s = e );; let addc b c = b.buf.[b.uno] <- c; b.uno <- (b.uno + 1) mod b.len ;; let getc b = let res = b.buf.[b.sta] in b.sta <- (b.sta + 1) mod b.len; res ;; let gets b n = let oldsta = b.sta and oldstan= b.sta+n in if oldstan < b.len then begin b.sta <- oldstan; sub b.buf oldsta n end else begin b.sta <- oldstan mod b.len; (sub b.buf oldsta (b.len - oldsta)) ^ (sub b.buf 0 b.sta) end ;; let getall b = let olduno = b.uno in begin b.uno <- b.sta; if olduno >= b.sta then (String.sub b.buf b.sta (olduno-b.sta)) else (String.sub b.buf b.sta (b.len-b.sta))^ (String.sub b.buf 0 olduno); end ;; let push b s = let l = String.length s in if l <= b.sta then begin String.blit s 0 b.buf (b.sta-l) l; b.sta 
<- b.sta - l end else begin String.blit s (l-b.sta) b.buf 0 b.sta; String.blit s 0 b.buf (b.len-l+b.sta) (l-b.sta); b.sta <- b.len-l+b.sta end ;; hlins-0.39/source/files.ml0100644000076400001440000000705707654273266014121 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) open Unix;; open Str;; open String;; open Errors;; let samefile s1 s2 = s1 <> "" && s2 <> "" && try let stat1 = stat s1 and stat2 = stat s2 in stat1.st_dev = stat2.st_dev && stat1.st_ino = stat2.st_ino with Unix_error _ -> false ;; exception Error_tmpfile of string;; let newtmpfile tmpdir = let tmp = tmpdir ^ "/hlins" ^ (string_of_int (getpid ())) in try let _ = stat tmp in raise (Error_tmpfile tmp) (* file tmp exists already *) with Unix_error _ -> tmp (* file tmp does not exist *) ;; exception Error_move of string;; (* copy file f_in on f_out *) let copy f_in f_out = let rec copy_canal c_in c_out = try begin output_string c_out ( (input_line c_in)^"\n"); copy_canal c_in c_out end with End_of_file -> () in if f_in = f_out then raise (Error_move "move: file names must be different") else try let c_in = open_in f_in and c_out = open_out f_out in begin copy_canal c_in c_out; close_in c_in; close_out c_out end with Sys_error s -> raise (Error_move "cannot open 
files")
;;

let move s1 s2 =
  if (stat s1).st_dev = (stat s2).st_dev
  then rename s1 s2
  else begin
    copy s1 s2;
    unlink s1
  end
;;

(* reads a dirhandle into a list and closes it *)
let rec dirhandle_to_list dirhandle =
  try
    let entry = readdir dirhandle
    in entry::(dirhandle_to_list dirhandle)
  with
    End_of_file -> begin
      closedir dirhandle;
      []
    end
;;

(* check whether file f is a regular file whose name ends in .html *)
let ishtmlfile f =
  try
    (stat f).st_kind = S_REG
      && length f >= 5
      && last_chars f 5 = ".html"
  with
    Unix_error _ -> stopwitherror ("cannot get status of file "^f)
;;

(* check whether file f is a directory *)
let isdirectory f =
  try
    (stat f).st_kind = S_DIR
  with
    Unix_error _ -> stopwitherror ("cannot get status of file "^f)
;;

(* (select path l), where l is a list of filenames relative to the
   directory path, returns the pair (hl,dl) where hl is the list of
   html files and dl is the list of directories in l *)
let rec select path = function
    [] -> ([],[])
  | h::r ->
      let (hl,dl) = select path r
      and pathh = path^"/"^h
      in
      if ishtmlfile pathh
      then (h::hl,dl)
      else if isdirectory pathh && h <> "." && h <> ".."
      then (hl,h::dl)
      else (hl,dl)
;;

let scandir path dir =
  try
    let pathdir = path^"/"^dir
    in let dirhandle = opendir pathdir
    in select pathdir (dirhandle_to_list dirhandle)
  with
    Unix_error (_,_,d) -> stopwitherror ("cannot read directory "^d)
;;

let rec expanddot = function
    [] -> []
  | h::r ->
      if h = "."
      then (dirhandle_to_list (opendir "."))@r
      else h::(expanddot r)

hlins-0.39/source/cyclic_buffer.mli0100644000076400001440000000315107654273266015756 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (**************************************************************************) (* fixed-length cyclic character buffer. *) (**************************************************************************) (* abstract type of a buffer *) type buffer;; (* test whether the buffer is empty *) val is_empty: buffer -> bool;; (* (fresh n) returns a buffer with n-1 positions *) val fresh: int -> buffer;; (* append a character *) val addc: buffer -> char -> unit;; (* put a string in front of the buffer *) val push: buffer -> string -> unit;; (* take a character *) val getc: buffer -> char;; (* take a string of length n *) val gets: buffer -> int -> string;; (* take all the contents from the buffer *) val getall: buffer -> string;; hlins-0.39/source/errors.ml0100644000076400001440000000170107654273266014321 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) let stopwitherror s = begin prerr_string ("Error: "^s^"\n"); exit 1 end ;; hlins-0.39/source/errors.mli0100644000076400001440000000173107654273266014475 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (* some handy functions for error handling *) (* does what the name says *) val stopwitherror: string -> 'a;; hlins-0.39/source/names.mli0100644000076400001440000000315207654273266014263 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(**************************************************************************)
(* useful operations on names                                             *)
(**************************************************************************)

(* (abbreviate s) returns a pair consisting of
   - the list of abbreviations of a string, having as first element s
     itself (hence the list is always non-empty).
   - an upper bound on the number of nodes that are taken when all
     abbreviations of s are arranged in a tree (without the root node).
     It takes into account the fact that the abbreviation of a composite
     family name does not count (since it is a prefix of the full
     family name).
*)

val abbreviate : string -> string list * int ;;

hlins-0.39/source/read_databases.ml0100644000076400001440000000370607654273266015736 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) open Lexing;; open Addr_lex;; open Html_chars;; open Errors;; (* (read_dblexbuf filename lb names urls) reads from the lexbuf lb the rest of a database file named filename, and returns the pair of lists (names',urls'), where names' (urls') is the list of names (urls) read in reverse order, concatenated with names (urls). *) let rec read_dblexbuf filename lb names urls = try let (name,url) = next_name_sep_url lb in read_dblexbuf filename lb (name::names) (url::urls) with Eof -> (names,urls) | Error_lex (n,s) -> stopwitherror ("Data base "^filename^", line "^(string_of_int n) ^": "^s) ;; let read_dbs filelist = List.fold_left (fun (names,urls) filename -> try let c = open_in filename in let lb = from_channel c in let result = read_dblexbuf filename (from_function (fun s _ -> try s.[0] <- next_char lb; 1 with End_of_file -> 0)) names urls in begin close_in c; result end with Sys_error s -> stopwitherror ("Cannot read data base file "^s)) ([],[]) filelist ;; hlins-0.39/source/read_databases.mli0100644000076400001440000000246607654273266016111 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(***************************************************************************)
(* module to read a list of data base files                                *)
(***************************************************************************)

(* (read_dbs l) reads the data base files of the list l and returns the
   pair consisting of the list of names and the list of urls, both in
   reverse order.
*)

val read_dbs : string list -> string list * string list;;

hlins-0.39/source/read_html.mli0100644000076400001440000000301607654273266015116 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(*********************************************************************)
(* lexer that replaces HTML special characters by 8-bit characters   *)
(*********************************************************************)

(* get the next character but ignore everything inside of - < and > -
and
- and - and - and - and
   where the tag name can be in any combination of upper and lower case
*)

(* tokens are either a character or a string that is to be printed
   verbatim
*)
type token = CHAR of char | VERB of string ;;

val next_html : Lexing.lexbuf -> token ;;

hlins-0.39/source/read_html.mll0100644000076400001440000001212307654273266015120 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

{
open Lexing;;
open Buffer;;

(* translates an html character representation "&#nnn;" to the
   corresponding 8-bit char.
*)
let special_to_char s =
  Char.chr (int_of_string (String.sub s 2 3))
;;

type token = CHAR of char | VERB of string;;

let b = create 1024;;

}

let space = [' ''\n''\t''\r']

rule next_html = parse
  | "&Agrave;" { CHAR 'À' }
  | "&Aacute;" { CHAR 'Á' }
  | "&Acirc;"  { CHAR 'Â' }
  | "&Atilde;" { CHAR 'Ã' }
  | "&Auml;"   { CHAR 'Ä' }
  | "&Aring;"  { CHAR 'Å' }
  | "&AElig;"  { CHAR 'Æ' }
  | "&Egrave;" { CHAR 'È' }
  | "&Eacute;" { CHAR 'É' }
  | "&Ecirc;"  { CHAR 'Ê' }
  | "&Euml;"   { CHAR 'Ë' }
  | "&Igrave;" { CHAR 'Ì' }
  | "&Iacute;" { CHAR 'Í' }
  | "&Icirc;"  { CHAR 'Î' }
  | "&Iuml;"   { CHAR 'Ï' }
  | "&ETH;"    { CHAR 'Ð' }
  | "&Ntilde;" { CHAR 'Ñ' }
  | "&Ograve;" { CHAR 'Ò' }
  | "&Oacute;" { CHAR 'Ó' }
  | "&Ocirc;"  { CHAR 'Ô' }
  | "&Otilde;" { CHAR 'Õ' }
  | "&Ouml;"   { CHAR 'Ö' }
  | "&Oslash;" { CHAR 'Ø' }
  | "&Ugrave;" { CHAR 'Ù' }
  | "&Uacute;" { CHAR 'Ú' }
  | "&Ucirc;"  { CHAR 'Û' }
  | "&Uuml;"   { CHAR 'Ü' }
  | "&Yacute;" { CHAR 'Ý' }
  | "&THORN;"  { CHAR 'Þ' }
  | "&szlig;"  { CHAR 'ß' }
  | "&agrave;" { CHAR 'à' }
  | "&aacute;" { CHAR 'á' }
  | "&acirc;"  { CHAR 'â' }
  | "&atilde;" { CHAR 'ã' }
  | "&auml;"   { CHAR 'ä' }
  | "&aring;"  { CHAR 'å' }
  | "&aelig;"  { CHAR 'æ' }
  | "&ccedil;" { CHAR 'ç' }
  | "&egrave;" { CHAR 'è' }
  | "&eacute;" { CHAR 'é' }
  | "&ecirc;"  { CHAR 'ê' }
  | "&euml;"   { CHAR 'ë' }
  | "&igrave;" { CHAR 'ì' }
  | "&iacute;" { CHAR 'í' }
  | "&icirc;"  { CHAR 'î' }
  | "&iuml;"   { CHAR 'ï' }
  | "&eth;"    { CHAR 'ð' }
  | "&ntilde;" { CHAR 'ñ' }
  | "&ograve;" { CHAR 'ò' }
  | "&oacute;" { CHAR 'ó' }
  | "&ocirc;"  { CHAR 'ô' }
  | "&otilde;" { CHAR 'õ' }
  | "&ugrave;" { CHAR 'ù' }
  | "&uacute;" { CHAR 'ú' }
  | "&ucirc;"  { CHAR 'û' }
  | "&uuml;"   { CHAR 'ü' }
  | "&yacute;" { CHAR 'ý' }
  | "&thorn;"  { CHAR 'þ' }
  (* special characters between 192 and 255 *)
  | "&#" ( "19"['2'-'9'] | '2'['0'-'4']['0'-'9'] | "25"['0'-'5'] ) ";"
      { CHAR (special_to_char (lexeme lexbuf)) }
  | '<'('H'|'h')('E'|'e')('A'|'a')('D'|'d')'>'
      { add_string b (lexeme lexbuf); skip_head lexbuf }
  | '<'('C'|'c')('O'|'o')('D'|'d')('E'|'e')'>'
      { add_string b (lexeme lexbuf); skip_code lexbuf }
  | '<'('S'|'s')('A'|'a')('M'|'m')('P'|'p')'>'
      { add_string b (lexeme lexbuf); skip_samp lexbuf }
  | '<'('K'|'k')('B'|'b')('D'|'d')'>'
      { add_string b (lexeme lexbuf); skip_kbd lexbuf }
  | '<'('D'|'d')('I'|'i')('V'|'v') space+ "nohlins>"
      { add_string b (lexeme lexbuf); skip_div lexbuf }
  | '<'('A'|'a') space+ ([^'>']*space)?
    ('H'|'h')('R'|'r')('E'|'e')('F'|'f') '=' [^'>']* '>'
      { add_string b (lexeme lexbuf); skip_a lexbuf }
  | '<' [^'>']* '>' { VERB(lexeme lexbuf)}
  | eof { raise End_of_file }
  | _   { CHAR(lexeme_char lexbuf 0)}

and skip_head = parse
  | "</"('H'|'h')('E'|'e')('A'|'a')('D'|'d')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c)}
  | _ { add_char b (lexeme_char lexbuf 0); skip_head lexbuf }

and skip_code = parse
  | "</"('C'|'c')('O'|'o')('D'|'d')('E'|'e')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_code lexbuf }

and skip_samp = parse
  | "</"('S'|'s')('A'|'a')('M'|'m')('P'|'p')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_samp lexbuf }

and skip_kbd = parse
  | "</"('K'|'k')('B'|'b')('D'|'d')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_kbd lexbuf }

and skip_pre = parse
  | "</"('P'|'p')('R'|'r')('E'|'e')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_pre lexbuf }

and skip_div = parse
  | "</"('D'|'d')('I'|'i')('V'|'v')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_div lexbuf }

and skip_a = parse
  | "</"('A'|'a')'>'
      { add_string b (lexeme lexbuf); let c = contents b in reset b; VERB(c) }
  | _ { add_char b (lexeme_char lexbuf 0); skip_a lexbuf }

hlins-0.39/source/run_automaton.mli0100644000076400001440000000310707654273266016053 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  See the GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(**************************************************************************)
(* module to run an automaton for multi-pattern string searching          *)
(**************************************************************************)

open Automaton;;

(* (run auto inc subst outc) runs the automaton auto on the input read
   from the lexbuf inc and writes the resulting string to outc. Every
   matched subword s is replaced by the result of (subst found expand s),
   where found is the list of indices, in the array of names, of the
   names equal to s, and expand is the list of indices of the names
   that s is an abbreviation of.
*)

val run :
  automaton ->
  Lexing.lexbuf ->
  (int list -> int list -> string -> string) ->
  out_channel ->
  unit
;;

hlins-0.39/source/html_chars.mll0100644000076400001440000000507207654273266015312 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

{
open Lexing;;

(* translates an html character representation "&#nnn;" to the
   corresponding 8-bit char.
*)
let special_to_char s =
  Char.chr (int_of_string (String.sub s 2 3))
;;

}

rule next_char = parse
  | "&Agrave;" { 'À' }
  | "&Aacute;" { 'Á' }
  | "&Acirc;"  { 'Â' }
  | "&Atilde;" { 'Ã' }
  | "&Auml;"   { 'Ä' }
  | "&Aring;"  { 'Å' }
  | "&AElig;"  { 'Æ' }
  | "&Egrave;" { 'È' }
  | "&Eacute;" { 'É' }
  | "&Ecirc;"  { 'Ê' }
  | "&Euml;"   { 'Ë' }
  | "&Igrave;" { 'Ì' }
  | "&Iacute;" { 'Í' }
  | "&Icirc;"  { 'Î' }
  | "&Iuml;"   { 'Ï' }
  | "&ETH;"    { 'Ð' }
  | "&Ntilde;" { 'Ñ' }
  | "&Ograve;" { 'Ò' }
  | "&Oacute;" { 'Ó' }
  | "&Ocirc;"  { 'Ô' }
  | "&Otilde;" { 'Õ' }
  | "&Ouml;"   { 'Ö' }
  | "&Oslash;" { 'Ø' }
  | "&Ugrave;" { 'Ù' }
  | "&Uacute;" { 'Ú' }
  | "&Ucirc;"  { 'Û' }
  | "&Uuml;"   { 'Ü' }
  | "&Yacute;" { 'Ý' }
  | "&THORN;"  { 'Þ' }
  | "&szlig;"  { 'ß' }
  | "&agrave;" { 'à' }
  | "&aacute;" { 'á' }
  | "&acirc;"  { 'â' }
  | "&atilde;" { 'ã' }
  | "&auml;"   { 'ä' }
  | "&aring;"  { 'å' }
  | "&aelig;"  { 'æ' }
  | "&ccedil;" { 'ç' }
  | "&egrave;" { 'è' }
  | "&eacute;" { 'é' }
  | "&ecirc;"  { 'ê' }
  | "&euml;"   { 'ë' }
  | "&igrave;" { 'ì' }
  | "&iacute;" { 'í' }
  | "&icirc;"  { 'î' }
  | "&iuml;"   { 'ï' }
  | "&eth;"    { 'ð' }
  | "&ntilde;" { 'ñ' }
  | "&ograve;" { 'ò' }
  | "&oacute;" { 'ó' }
  | "&ocirc;"  { 'ô' }
  | "&otilde;" { 'õ' }
  | "&ugrave;" { 'ù' }
  | "&uacute;" { 'ú' }
  | "&ucirc;"  { 'û' }
  | "&uuml;"   { 'ü' }
  | "&yacute;" { 'ý' }
  | "&thorn;"  { 'þ' }
  (* special characters between 192 and 255 *)
  | "&#" ( "19"['2'-'9'] | '2'['0'-'4']['0'-'9'] | "25"['0'-'5'] ) ";"
      { special_to_char (lexeme lexbuf) }
  | eof { raise End_of_file }
  | _ { lexeme_char lexbuf 0}

hlins-0.39/source/html_chars.mli0100644000076400001440000000230707654273266015305 0ustar rtusers
(*
  HLins: insert http-links into HTML documents.
  See http://www.lri.fr/~treinen/hlins

  Copyright (C) 1999 Ralf Treinen

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(*********************************************************************)
(* lexer that replaces HTML special characters by 8-bit characters   *)
(*********************************************************************)

(* get the next character, replace HTML special characters by 8-bit
   characters
*)

val next_char : Lexing.lexbuf -> char ;;

hlins-0.39/source/configure0100755000076400001440000007231007654273266014366 0ustar rtusers
#! /bin/sh

# Guess values for system-dependent variables and create Makefiles.
# Generated automatically using autoconf version 2.13
# Copyright (C) 1992, 93, 94, 95, 96 Free Software Foundation, Inc.
#
# This configure script is free software; the Free Software Foundation
# gives unlimited permission to copy, distribute and modify it.

# Defaults:
ac_help=
ac_default_prefix=/usr/local
# Any additions from configure.in:
ac_default_prefix=/usr

# Initialize some variables set by options.
# The variables have the same names as the options, with
# dashes changed to underlines.
build=NONE
cache_file=./config.cache
exec_prefix=NONE
host=NONE
no_create=
nonopt=NONE
no_recursion=
prefix=NONE
program_prefix=NONE
program_suffix=NONE
program_transform_name=s,x,x,
silent=
site=
srcdir=
target=NONE
verbose=
x_includes=NONE
x_libraries=NONE
bindir='${exec_prefix}/bin'
sbindir='${exec_prefix}/sbin'
libexecdir='${exec_prefix}/libexec'
datadir='${prefix}/share'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
libdir='${exec_prefix}/lib'
includedir='${prefix}/include'
oldincludedir='/usr/include'
infodir='${prefix}/info'
mandir='${prefix}/man'

# Initialize some other variables.
subdirs=
MFLAGS= MAKEFLAGS=
SHELL=${CONFIG_SHELL-/bin/sh}
# Maximum number of lines to put in a shell here document.
ac_max_here_lines=12 ac_prev= for ac_option do # If the previous option needs an argument, assign it. if test -n "$ac_prev"; then eval "$ac_prev=\$ac_option" ac_prev= continue fi case "$ac_option" in -*=*) ac_optarg=`echo "$ac_option" | sed 's/[-_a-zA-Z0-9]*=//'` ;; *) ac_optarg= ;; esac # Accept the important Cygnus configure options, so we can diagnose typos. case "$ac_option" in -bindir | --bindir | --bindi | --bind | --bin | --bi) ac_prev=bindir ;; -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) bindir="$ac_optarg" ;; -build | --build | --buil | --bui | --bu) ac_prev=build ;; -build=* | --build=* | --buil=* | --bui=* | --bu=*) build="$ac_optarg" ;; -cache-file | --cache-file | --cache-fil | --cache-fi \ | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) ac_prev=cache_file ;; -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) cache_file="$ac_optarg" ;; -datadir | --datadir | --datadi | --datad | --data | --dat | --da) ac_prev=datadir ;; -datadir=* | --datadir=* | --datadi=* | --datad=* | --data=* | --dat=* \ | --da=*) datadir="$ac_optarg" ;; -disable-* | --disable-*) ac_feature=`echo $ac_option|sed -e 's/-*disable-//'` # Reject names that are not valid shell variable names. if test -n "`echo $ac_feature| sed 's/[-a-zA-Z0-9_]//g'`"; then { echo "configure: error: $ac_feature: invalid feature name" 1>&2; exit 1; } fi ac_feature=`echo $ac_feature| sed 's/-/_/g'` eval "enable_${ac_feature}=no" ;; -enable-* | --enable-*) ac_feature=`echo $ac_option|sed -e 's/-*enable-//' -e 's/=.*//'` # Reject names that are not valid shell variable names. 
if test -n "`echo $ac_feature| sed 's/[-_a-zA-Z0-9]//g'`"; then { echo "configure: error: $ac_feature: invalid feature name" 1>&2; exit 1; } fi ac_feature=`echo $ac_feature| sed 's/-/_/g'` case "$ac_option" in *=*) ;; *) ac_optarg=yes ;; esac eval "enable_${ac_feature}='$ac_optarg'" ;; -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ | --exec | --exe | --ex) ac_prev=exec_prefix ;; -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ | --exec=* | --exe=* | --ex=*) exec_prefix="$ac_optarg" ;; -gas | --gas | --ga | --g) # Obsolete; use --with-gas. with_gas=yes ;; -help | --help | --hel | --he) # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat << EOF Usage: configure [options] [host] Options: [defaults in brackets after descriptions] Configuration: --cache-file=FILE cache test results in FILE --help print this message --no-create do not create output files --quiet, --silent do not print \`checking...' 
messages --version print the version of autoconf that created configure Directory and file names: --prefix=PREFIX install architecture-independent files in PREFIX [$ac_default_prefix] --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX [same as prefix] --bindir=DIR user executables in DIR [EPREFIX/bin] --sbindir=DIR system admin executables in DIR [EPREFIX/sbin] --libexecdir=DIR program executables in DIR [EPREFIX/libexec] --datadir=DIR read-only architecture-independent data in DIR [PREFIX/share] --sysconfdir=DIR read-only single-machine data in DIR [PREFIX/etc] --sharedstatedir=DIR modifiable architecture-independent data in DIR [PREFIX/com] --localstatedir=DIR modifiable single-machine data in DIR [PREFIX/var] --libdir=DIR object code libraries in DIR [EPREFIX/lib] --includedir=DIR C header files in DIR [PREFIX/include] --oldincludedir=DIR C header files for non-gcc in DIR [/usr/include] --infodir=DIR info documentation in DIR [PREFIX/info] --mandir=DIR man documentation in DIR [PREFIX/man] --srcdir=DIR find the sources in DIR [configure dir or ..] 
--program-prefix=PREFIX prepend PREFIX to installed program names --program-suffix=SUFFIX append SUFFIX to installed program names --program-transform-name=PROGRAM run sed PROGRAM on installed program names EOF cat << EOF Host type: --build=BUILD configure for building on BUILD [BUILD=HOST] --host=HOST configure for HOST [guessed] --target=TARGET configure for TARGET [TARGET=HOST] Features and packages: --disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no) --enable-FEATURE[=ARG] include FEATURE [ARG=yes] --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) --x-includes=DIR X include files are in DIR --x-libraries=DIR X library files are in DIR EOF if test -n "$ac_help"; then echo "--enable and --with options recognized:$ac_help" fi exit 0 ;; -host | --host | --hos | --ho) ac_prev=host ;; -host=* | --host=* | --hos=* | --ho=*) host="$ac_optarg" ;; -includedir | --includedir | --includedi | --included | --include \ | --includ | --inclu | --incl | --inc) ac_prev=includedir ;; -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ | --includ=* | --inclu=* | --incl=* | --inc=*) includedir="$ac_optarg" ;; -infodir | --infodir | --infodi | --infod | --info | --inf) ac_prev=infodir ;; -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) infodir="$ac_optarg" ;; -libdir | --libdir | --libdi | --libd) ac_prev=libdir ;; -libdir=* | --libdir=* | --libdi=* | --libd=*) libdir="$ac_optarg" ;; -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ | --libexe | --libex | --libe) ac_prev=libexecdir ;; -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ | --libexe=* | --libex=* | --libe=*) libexecdir="$ac_optarg" ;; -localstatedir | --localstatedir | --localstatedi | --localstated \ | --localstate | --localstat | --localsta | --localst \ | --locals | --local | --loca | --loc | --lo) ac_prev=localstatedir ;; -localstatedir=* | 
--localstatedir=* | --localstatedi=* | --localstated=* \ | --localstate=* | --localstat=* | --localsta=* | --localst=* \ | --locals=* | --local=* | --loca=* | --loc=* | --lo=*) localstatedir="$ac_optarg" ;; -mandir | --mandir | --mandi | --mand | --man | --ma | --m) ac_prev=mandir ;; -mandir=* | --mandir=* | --mandi=* | --mand=* | --man=* | --ma=* | --m=*) mandir="$ac_optarg" ;; -nfp | --nfp | --nf) # Obsolete; use --without-fp. with_fp=no ;; -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c) no_create=yes ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) no_recursion=yes ;; -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ | --oldin | --oldi | --old | --ol | --o) ac_prev=oldincludedir ;; -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) oldincludedir="$ac_optarg" ;; -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) ac_prev=prefix ;; -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) prefix="$ac_optarg" ;; -program-prefix | --program-prefix | --program-prefi | --program-pref \ | --program-pre | --program-pr | --program-p) ac_prev=program_prefix ;; -program-prefix=* | --program-prefix=* | --program-prefi=* \ | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) program_prefix="$ac_optarg" ;; -program-suffix | --program-suffix | --program-suffi | --program-suff \ | --program-suf | --program-su | --program-s) ac_prev=program_suffix ;; -program-suffix=* | --program-suffix=* | --program-suffi=* \ | --program-suff=* | --program-suf=* | --program-su=* | --program-s=*) program_suffix="$ac_optarg" ;; -program-transform-name | --program-transform-name \ | 
--program-transform-nam | --program-transform-na \ | --program-transform-n | --program-transform- \ | --program-transform | --program-transfor \ | --program-transfo | --program-transf \ | --program-trans | --program-tran \ | --progr-tra | --program-tr | --program-t) ac_prev=program_transform_name ;; -program-transform-name=* | --program-transform-name=* \ | --program-transform-nam=* | --program-transform-na=* \ | --program-transform-n=* | --program-transform-=* \ | --program-transform=* | --program-transfor=* \ | --program-transfo=* | --program-transf=* \ | --program-trans=* | --program-tran=* \ | --progr-tra=* | --program-tr=* | --program-t=*) program_transform_name="$ac_optarg" ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) silent=yes ;; -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ | --sbi=* | --sb=*) sbindir="$ac_optarg" ;; -sharedstatedir | --sharedstatedir | --sharedstatedi \ | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ | --sharedst | --shareds | --shared | --share | --shar \ | --sha | --sh) ac_prev=sharedstatedir ;; -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ | --sharedst=* | --shareds=* | --shared=* | --share=* | --shar=* \ | --sha=* | --sh=*) sharedstatedir="$ac_optarg" ;; -site | --site | --sit) ac_prev=site ;; -site=* | --site=* | --sit=*) site="$ac_optarg" ;; -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) ac_prev=srcdir ;; -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) srcdir="$ac_optarg" ;; -sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ | --syscon | --sysco | --sysc | --sys | --sy) ac_prev=sysconfdir ;; -sysconfdir=* | --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) 
sysconfdir="$ac_optarg" ;; -target | --target | --targe | --targ | --tar | --ta | --t) ac_prev=target ;; -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) target="$ac_optarg" ;; -v | -verbose | --verbose | --verbos | --verbo | --verb) verbose=yes ;; -version | --version | --versio | --versi | --vers) echo "configure generated by autoconf version 2.13" exit 0 ;; -with-* | --with-*) ac_package=`echo $ac_option|sed -e 's/-*with-//' -e 's/=.*//'` # Reject names that are not valid shell variable names. if test -n "`echo $ac_package| sed 's/[-_a-zA-Z0-9]//g'`"; then { echo "configure: error: $ac_package: invalid package name" 1>&2; exit 1; } fi ac_package=`echo $ac_package| sed 's/-/_/g'` case "$ac_option" in *=*) ;; *) ac_optarg=yes ;; esac eval "with_${ac_package}='$ac_optarg'" ;; -without-* | --without-*) ac_package=`echo $ac_option|sed -e 's/-*without-//'` # Reject names that are not valid shell variable names. if test -n "`echo $ac_package| sed 's/[-a-zA-Z0-9_]//g'`"; then { echo "configure: error: $ac_package: invalid package name" 1>&2; exit 1; } fi ac_package=`echo $ac_package| sed 's/-/_/g'` eval "with_${ac_package}=no" ;; --x) # Obsolete; use --with-x. 
with_x=yes ;; -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ | --x-incl | --x-inc | --x-in | --x-i) ac_prev=x_includes ;; -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) x_includes="$ac_optarg" ;; -x-libraries | --x-libraries | --x-librarie | --x-librari \ | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) ac_prev=x_libraries ;; -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) x_libraries="$ac_optarg" ;; -*) { echo "configure: error: $ac_option: invalid option; use --help to show usage" 1>&2; exit 1; } ;; *) if test -n "`echo $ac_option| sed 's/[-a-z0-9.]//g'`"; then echo "configure: warning: $ac_option: invalid host type" 1>&2 fi if test "x$nonopt" != xNONE; then { echo "configure: error: can only configure for one host and one target at a time" 1>&2; exit 1; } fi nonopt="$ac_option" ;; esac done if test -n "$ac_prev"; then { echo "configure: error: missing argument to --`echo $ac_prev | sed 's/_/-/g'`" 1>&2; exit 1; } fi trap 'rm -fr conftest* confdefs* core core.* *.core $ac_clean_files; exit 1' 1 2 15 # File descriptor usage: # 0 standard input # 1 file creation # 2 errors and warnings # 3 some systems may open it to /dev/tty # 4 used on the Kubota Titan # 6 checking for... messages and results # 5 compiler messages saved in config.log if test "$silent" = yes; then exec 6>/dev/null else exec 6>&1 fi exec 5>./config.log echo "\ This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. " 1>&5 # Strip out --no-create and --no-recursion so they do not pile up. # Also quote any args containing shell metacharacters. 
ac_configure_args= for ac_arg do case "$ac_arg" in -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c) ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) ;; *" "*|*" "*|*[\[\]\~\#\$\^\&\*\(\)\{\}\\\|\;\<\>\?]*) ac_configure_args="$ac_configure_args '$ac_arg'" ;; *) ac_configure_args="$ac_configure_args $ac_arg" ;; esac done # NLS nuisances. # Only set these to C if already set. These must not be set unconditionally # because not all systems understand e.g. LANG=C (notably SCO). # Fixing LC_MESSAGES prevents Solaris sh from translating var values in `set'! # Non-C LC_CTYPE values break the ctype check. if test "${LANG+set}" = set; then LANG=C; export LANG; fi if test "${LC_ALL+set}" = set; then LC_ALL=C; export LC_ALL; fi if test "${LC_MESSAGES+set}" = set; then LC_MESSAGES=C; export LC_MESSAGES; fi if test "${LC_CTYPE+set}" = set; then LC_CTYPE=C; export LC_CTYPE; fi # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -rf conftest* confdefs.h # AIX cpp loses on an empty file, so make sure it contains at least a newline. echo > confdefs.h # A filename unique to this package, relative to the directory that # configure is in, which we can look for to find out if srcdir is correct. ac_unique_file=hlins.ml # Find the source files, if location was not specified. if test -z "$srcdir"; then ac_srcdir_defaulted=yes # Try the directory containing this script, then its parent. ac_prog=$0 ac_confdir=`echo $ac_prog|sed 's%/[^/][^/]*$%%'` test "x$ac_confdir" = "x$ac_prog" && ac_confdir=. srcdir=$ac_confdir if test ! -r $srcdir/$ac_unique_file; then srcdir=.. fi else ac_srcdir_defaulted=no fi if test ! -r $srcdir/$ac_unique_file; then if test "$ac_srcdir_defaulted" = yes; then { echo "configure: error: can not find sources in $ac_confdir or .." 
1>&2; exit 1; } else { echo "configure: error: can not find sources in $srcdir" 1>&2; exit 1; } fi fi srcdir=`echo "${srcdir}" | sed 's%\([^/]\)/*$%\1%'` # Prefer explicitly selected file to automatically selected ones. if test -z "$CONFIG_SITE"; then if test "x$prefix" != xNONE; then CONFIG_SITE="$prefix/share/config.site $prefix/etc/config.site" else CONFIG_SITE="$ac_default_prefix/share/config.site $ac_default_prefix/etc/config.site" fi fi for ac_site_file in $CONFIG_SITE; do if test -r "$ac_site_file"; then echo "loading site script $ac_site_file" . "$ac_site_file" fi done if test -r "$cache_file"; then echo "loading cache $cache_file" . $cache_file else echo "creating cache $cache_file" > $cache_file fi ac_ext=c # CFLAGS is not in ac_cpp because -g, -O, etc. are not valid cpp options. ac_cpp='$CPP $CPPFLAGS' ac_compile='${CC-cc} -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5' ac_link='${CC-cc} -o conftest${ac_exeext} $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS 1>&5' cross_compiling=$ac_cv_prog_cc_cross ac_exeext= ac_objext=o if (echo "testing\c"; echo 1,2,3) | grep c >/dev/null; then # Stardent Vistra SVR4 grep lacks -e, says ghazi@caip.rutgers.edu. if (echo -n testing; echo 1,2,3) | sed s/-n/xn/ | grep xn >/dev/null; then ac_n= ac_c=' ' ac_t=' ' else ac_n=-n ac_c= ac_t= fi else ac_n= ac_c='\c' ac_t= fi # Check for OCaml programs # Extract the first word of "ocamlc", so it can be a program name with args. set dummy ocamlc; ac_word=$2 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 echo "configure:534: checking for $ac_word" >&5 if eval "test \"`echo '$''{'ac_cv_prog_OCAMLC'+set}'`\" = set"; then echo $ac_n "(cached) $ac_c" 1>&6 else if test -n "$OCAMLC"; then ac_cv_prog_OCAMLC="$OCAMLC" # Let the user override the test. else IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" ac_dummy="$PATH" for ac_dir in $ac_dummy; do test -z "$ac_dir" && ac_dir=. 
if test -f $ac_dir/$ac_word; then ac_cv_prog_OCAMLC="ocamlc" break fi done IFS="$ac_save_ifs" test -z "$ac_cv_prog_OCAMLC" && ac_cv_prog_OCAMLC="no" fi fi OCAMLC="$ac_cv_prog_OCAMLC" if test -n "$OCAMLC"; then echo "$ac_t""$OCAMLC" 1>&6 else echo "$ac_t""no" 1>&6 fi if test "$OCAMLC" = no ; then { echo "configure: error: Cannot find ocamlc." 1>&2; exit 1; } fi # Extract the first word of "ocamlopt", so it can be a program name with args. set dummy ocamlopt; ac_word=$2 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 echo "configure:567: checking for $ac_word" >&5 if eval "test \"`echo '$''{'ac_cv_prog_OCAMLOPT'+set}'`\" = set"; then echo $ac_n "(cached) $ac_c" 1>&6 else if test -n "$OCAMLOPT"; then ac_cv_prog_OCAMLOPT="$OCAMLOPT" # Let the user override the test. else IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" ac_dummy="$PATH" for ac_dir in $ac_dummy; do test -z "$ac_dir" && ac_dir=. if test -f $ac_dir/$ac_word; then ac_cv_prog_OCAMLOPT="ocamlopt" break fi done IFS="$ac_save_ifs" test -z "$ac_cv_prog_OCAMLOPT" && ac_cv_prog_OCAMLOPT="no" fi fi OCAMLOPT="$ac_cv_prog_OCAMLOPT" if test -n "$OCAMLOPT"; then echo "$ac_t""$OCAMLOPT" 1>&6 else echo "$ac_t""no" 1>&6 fi if test "$OCAMLOPT" = no ; then echo "configure: warning: Cannot find ocamlopt." 1>&2 BEST=byte else BEST=opt fi # Extract the first word of "ocamldep", so it can be a program name with args. set dummy ocamldep; ac_word=$2 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 echo "configure:603: checking for $ac_word" >&5 if eval "test \"`echo '$''{'ac_cv_prog_OCAMLDEP'+set}'`\" = set"; then echo $ac_n "(cached) $ac_c" 1>&6 else if test -n "$OCAMLDEP"; then ac_cv_prog_OCAMLDEP="$OCAMLDEP" # Let the user override the test. else IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" ac_dummy="$PATH" for ac_dir in $ac_dummy; do test -z "$ac_dir" && ac_dir=. 
if test -f $ac_dir/$ac_word; then ac_cv_prog_OCAMLDEP="ocamldep" break fi done IFS="$ac_save_ifs" test -z "$ac_cv_prog_OCAMLDEP" && ac_cv_prog_OCAMLDEP="no" fi fi OCAMLDEP="$ac_cv_prog_OCAMLDEP" if test -n "$OCAMLDEP"; then echo "$ac_t""$OCAMLDEP" 1>&6 else echo "$ac_t""no" 1>&6 fi if test "$OCAMLDEP" = no ; then { echo "configure: error: Cannot find ocamldep." 1>&2; exit 1; } fi # Finally create all the generated files trap '' 1 2 15 cat > confcache <<\EOF # This file is a shell script that caches the results of configure # tests run on this system so they can be shared between configure # scripts and configure runs. It is not useful on other systems. # If it contains results you don't want to keep, you may remove or edit it. # # By default, configure uses ./config.cache as the cache file, # creating it if it does not exist already. You can give configure # the --cache-file=FILE option to use a different cache file; that is # what configure does when it calls configure scripts in # subdirectories, so they share the cache. # Giving --cache-file=/dev/null disables caching, for debugging configure. # config.status only pays attention to the cache file if you give it the # --recheck option to rerun configure. # EOF # The following way of writing the cache mishandles newlines in values, # but we know of no workaround that is simple, portable, and efficient. # So, don't put newlines in cache variables' values. # Ultrix sh set writes to stderr and can't be redirected directly, # and sets the high bit in the cache file unless we assign to the vars. (set) 2>&1 | case `(ac_space=' '; set | grep ac_space) 2>&1` in *ac_space=\ *) # `set' does not quote correctly, so add quotes (double-quote substitution # turns \\\\ into \\, and sed turns \\ into \). sed -n \ -e "s/'/'\\\\''/g" \ -e "s/^\\([a-zA-Z0-9_]*_cv_[a-zA-Z0-9_]*\\)=\\(.*\\)/\\1=\${\\1='\\2'}/p" ;; *) # `set' quotes correctly as required by POSIX, so do not add quotes. 
sed -n -e 's/^\([a-zA-Z0-9_]*_cv_[a-zA-Z0-9_]*\)=\(.*\)/\1=${\1=\2}/p' ;; esac >> confcache if cmp -s $cache_file confcache; then : else if test -w $cache_file; then echo "updating cache $cache_file" cat confcache > $cache_file else echo "not updating unwritable cache $cache_file" fi fi rm -f confcache trap 'rm -fr conftest* confdefs* core core.* *.core $ac_clean_files; exit 1' 1 2 15 test "x$prefix" = xNONE && prefix=$ac_default_prefix # Let make expand exec_prefix. test "x$exec_prefix" = xNONE && exec_prefix='${prefix}' # Any assignment to VPATH causes Sun make to only execute # the first set of double-colon rules, so remove it if not needed. # If there is a colon in the path, we need to keep it. if test "x$srcdir" = x.; then ac_vpsub='/^[ ]*VPATH[ ]*=[^:]*$/d' fi trap 'rm -f $CONFIG_STATUS conftest*; exit 1' 1 2 15 # Transform confdefs.h into DEFS. # Protect against shell expansion while executing Makefile rules. # Protect against Makefile macro expansion. cat > conftest.defs <<\EOF s%#define \([A-Za-z_][A-Za-z0-9_]*\) *\(.*\)%-D\1=\2%g s%[ `~#$^&*(){}\\|;'"<>?]%\\&%g s%\[%\\&%g s%\]%\\&%g s%\$%$$%g EOF DEFS=`sed -f conftest.defs confdefs.h | tr '\012' ' '` rm -f conftest.defs # Without the "./", some shells look in PATH for config.status. : ${CONFIG_STATUS=./config.status} echo creating $CONFIG_STATUS rm -f $CONFIG_STATUS cat > $CONFIG_STATUS </dev/null | sed 1q`: # # $0 $ac_configure_args # # Compiler output produced by configure, useful for debugging # configure, is in ./config.log if it exists. 
ac_cs_usage="Usage: $CONFIG_STATUS [--recheck] [--version] [--help]" for ac_option do case "\$ac_option" in -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) echo "running \${CONFIG_SHELL-/bin/sh} $0 $ac_configure_args --no-create --no-recursion" exec \${CONFIG_SHELL-/bin/sh} $0 $ac_configure_args --no-create --no-recursion ;; -version | --version | --versio | --versi | --vers | --ver | --ve | --v) echo "$CONFIG_STATUS generated by autoconf version 2.13" exit 0 ;; -help | --help | --hel | --he | --h) echo "\$ac_cs_usage"; exit 0 ;; *) echo "\$ac_cs_usage"; exit 1 ;; esac done ac_given_srcdir=$srcdir trap 'rm -fr `echo "Makefile" | sed "s/:[^ ]*//g"` conftest*; exit 1' 1 2 15 EOF cat >> $CONFIG_STATUS < conftest.subs <<\\CEOF $ac_vpsub $extrasub s%@SHELL@%$SHELL%g s%@CFLAGS@%$CFLAGS%g s%@CPPFLAGS@%$CPPFLAGS%g s%@CXXFLAGS@%$CXXFLAGS%g s%@FFLAGS@%$FFLAGS%g s%@DEFS@%$DEFS%g s%@LDFLAGS@%$LDFLAGS%g s%@LIBS@%$LIBS%g s%@exec_prefix@%$exec_prefix%g s%@prefix@%$prefix%g s%@program_transform_name@%$program_transform_name%g s%@bindir@%$bindir%g s%@sbindir@%$sbindir%g s%@libexecdir@%$libexecdir%g s%@datadir@%$datadir%g s%@sysconfdir@%$sysconfdir%g s%@sharedstatedir@%$sharedstatedir%g s%@localstatedir@%$localstatedir%g s%@libdir@%$libdir%g s%@includedir@%$includedir%g s%@oldincludedir@%$oldincludedir%g s%@infodir@%$infodir%g s%@mandir@%$mandir%g s%@OCAMLC@%$OCAMLC%g s%@OCAMLOPT@%$OCAMLOPT%g s%@OCAMLDEP@%$OCAMLDEP%g s%@BEST@%$BEST%g CEOF EOF cat >> $CONFIG_STATUS <<\EOF # Split the substitutions into bite-sized pieces for seds with # small command number limits, like on Digital OSF/1 and HP-UX. ac_max_sed_cmds=90 # Maximum number of lines to put in a sed script. ac_file=1 # Number of current file. ac_beg=1 # First line for current file. ac_end=$ac_max_sed_cmds # Line after last line for current file. 
ac_more_lines=: ac_sed_cmds="" while $ac_more_lines; do if test $ac_beg -gt 1; then sed "1,${ac_beg}d; ${ac_end}q" conftest.subs > conftest.s$ac_file else sed "${ac_end}q" conftest.subs > conftest.s$ac_file fi if test ! -s conftest.s$ac_file; then ac_more_lines=false rm -f conftest.s$ac_file else if test -z "$ac_sed_cmds"; then ac_sed_cmds="sed -f conftest.s$ac_file" else ac_sed_cmds="$ac_sed_cmds | sed -f conftest.s$ac_file" fi ac_file=`expr $ac_file + 1` ac_beg=$ac_end ac_end=`expr $ac_end + $ac_max_sed_cmds` fi done if test -z "$ac_sed_cmds"; then ac_sed_cmds=cat fi EOF cat >> $CONFIG_STATUS <> $CONFIG_STATUS <<\EOF for ac_file in .. $CONFIG_FILES; do if test "x$ac_file" != x..; then # Support "outfile[:infile[:infile...]]", defaulting infile="outfile.in". case "$ac_file" in *:*) ac_file_in=`echo "$ac_file"|sed 's%[^:]*:%%'` ac_file=`echo "$ac_file"|sed 's%:.*%%'` ;; *) ac_file_in="${ac_file}.in" ;; esac # Adjust a relative srcdir, top_srcdir, and INSTALL for subdirectories. # Remove last slash and all that follows it. Not all systems have dirname. ac_dir=`echo $ac_file|sed 's%/[^/][^/]*$%%'` if test "$ac_dir" != "$ac_file" && test "$ac_dir" != .; then # The file is in a subdirectory. test ! -d "$ac_dir" && mkdir "$ac_dir" ac_dir_suffix="/`echo $ac_dir|sed 's%^\./%%'`" # A "../" for each directory in $ac_dir_suffix. ac_dots=`echo $ac_dir_suffix|sed 's%/[^/]*%../%g'` else ac_dir_suffix= ac_dots= fi case "$ac_given_srcdir" in .) srcdir=. if test -z "$ac_dots"; then top_srcdir=. else top_srcdir=`echo $ac_dots|sed 's%/$%%'`; fi ;; /*) srcdir="$ac_given_srcdir$ac_dir_suffix"; top_srcdir="$ac_given_srcdir" ;; *) # Relative path. srcdir="$ac_dots$ac_given_srcdir$ac_dir_suffix" top_srcdir="$ac_dots$ac_given_srcdir" ;; esac echo creating "$ac_file" rm -f "$ac_file" configure_input="Generated automatically from `echo $ac_file_in|sed 's%.*/%%'` by configure." 
case "$ac_file" in *Makefile*) ac_comsub="1i\\ # $configure_input" ;; *) ac_comsub= ;; esac ac_file_inputs=`echo $ac_file_in|sed -e "s%^%$ac_given_srcdir/%" -e "s%:% $ac_given_srcdir/%g"` sed -e "$ac_comsub s%@configure_input@%$configure_input%g s%@srcdir@%$srcdir%g s%@top_srcdir@%$top_srcdir%g " $ac_file_inputs | (eval "$ac_sed_cmds") > $ac_file fi; done rm -f conftest.s* EOF cat >> $CONFIG_STATUS <> $CONFIG_STATUS <<\EOF exit 0 EOF chmod +x $CONFIG_STATUS rm -fr confdefs* $ac_clean_files test "$no_create" = yes || ${CONFIG_SHELL-/bin/sh} $CONFIG_STATUS || exit 1 hlins-0.39/source/automaton.ml0100664000076400001440000000363407654273266015025 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (* (* implementation of transitions tables by lists *) type transitions = (char*int) list;; let empty_transitions = [];; let get_transition t c = List.assoc c t;; let add_transition t c p = (c,p)::t;; let transitions_fold f = List.fold_left (fun x (c,q) -> (f x c q));; *) (* implementation of transition tables by finite functions *) module OrderedChar = struct type t = char (* let compare x y = (Char.code x) - (Char.code y) *) let compare = Pervasives.compare end;; module M = Map.Make(OrderedChar);; type transitions = int M.t;; let empty_transitions = M.empty;; let get_transition t c = M.find c t;; let add_transition t c q = M.add c q t;; let transitions_fold f i t = M.fold (fun c i x -> f x c i) t i;; type automaton = { max_path_length: int; number_of_states: int; tree: transitions array; level: int array; board: int array; suf: int array; found: (int list) array; expand: (int list) array };; hlins-0.39/source/Makefile.in0100664000076400001440000000413507654273266014526 0ustar rtusers# Makefile for Hlins ######################################################################### include ../Version CAMLC = @OCAMLC@ CAMLCOPT = @OCAMLOPT@ CAMLLEX = ocamllex CAMLDEP = @OCAMLDEP@ # use this one for profiling # CAMLCOPT = ocamlopt -p # installation BINDIR = /usr/bin # BINDIR = /users/demons/demons/bin/$(OSTYPE) DESTDIR = ########################################################################### .SUFFIXES: .cmi .cmx .cmo .ml .mli .mll MODULES=html_chars.cmx addr_lex.cmx read_html.cmx errors.cmx names.cmx\ read_databases.cmx automaton.cmx build_automaton.cmx files.cmx\ replace.cmx dumpdb.cmx cyclic_buffer.cmx run_automaton.cmx\ version.cmx hlins.cmx # automatic selection native / byte code all: @BEST@ opt: hlins byte: hlins.bc # native code compilation hlins: $(MODULES) 
$(CAMLCOPT) -o hlins str.cmxa unix.cmxa $(MODULES) -cclib -lstr strip hlins # byte code compilation hlins.bc: $(MODULES:.cmx=.cmo) $(CAMLC) -custom -o hlins.bc str.cma unix.cma $(MODULES:.cmx=.cmo)\ -cclib -lstr .mli.cmi: $(CAMLC) -c $< .ml.cmx: $(CAMLCOPT) -c $< .ml.cmo: $(CAMLC) -c $< .mll.ml: $(CAMLLEX) $< clean: -rm -f *.cm[iox] *.o a.out\ read_html.ml addr_lex.ml html_chars.ml version.ml distclean: clean -rm -f hlins hlins.bc gmon.out Makefile config.* *~ # automatic selection native/byte install: if test @BEST@ = opt ; then \ cp hlins $(DESTDIR)/$(BINDIR) ; \ else \ cp hlins.bc $(DESTDIR)/$(BINDIR)/hlins ; \ fi # for byte code install.bc: hlins.bc cp hlins.bc $(DESTDIR)/$(BINDIR)/hlins depend: $(CAMLDEP) *.ml *.mli > .depend ########################################################################## include .depend read_html.cmi: read_html.mli read_html.ml: read_html.mll read_html.cmx read_html.cmo: read_html.ml read_html.cmi addr_lex.cmi: addr_lex.mli addr_lex.ml: addr_lex.mll addr_lex.cmx addr_lex.cmo: addr_lex.ml addr_lex.cmi html_chars.cmi: html_chars.mli html_chars.ml: html_chars.mll html_chars.cmx html_chars.cmo: html_chars.ml html_chars.cmi version.ml: Makefile echo "let version = \""$(VERSION)"\"" > version.ml hlins-0.39/source/files.mli0100644000076400001440000000506007654273266014262 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (*************************************************************************) (* Functions on files. This is Unix-specific. *) (*************************************************************************) (* Test whether two filenames designate physically the same file (that is, whether they are located on the same device and have the same inode). The test result is false if either of the files does not exist or if its status is not accessible. An empty filename is considered different from any other filename. *) val samefile : string -> string -> bool;; (* (newtmpfile dir) returns the name of a temporary file in dir *) exception Error_tmpfile of string;; val newtmpfile: string -> string;; (* (move s1 s2) moves file s1 to s2, by changing links when both are on the same filesystem, and by copying otherwise *) exception Error_move of string;; val move: string -> string -> unit;; (* (scandir path dir) yields the pair (hl,dl) where - hl is the list of regular files *.html in path/dir - dl is the list of directories in path/dir. Stops with an error when the directory path/dir cannot be read. *) val scandir: string -> string -> string list * string list;; (* (select p l), where l is a list of filenames and p a path, returns the pair (hl,dl) where hl is the list of html files and dl is the list of directories in l (except . and ..). Stops with an error when the status of one of the files in l is not accessible. *) val select : string -> string list -> string list * string list;; (* (expanddot fl), where fl is a list of files and directories, replaces the first occurrence of . by a listing of the current directory. 
*) val expanddot : string list -> string list;; hlins-0.39/source/build_automaton.mli0100644000076400001440000000231507654273266016346 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (***************************************************************************) (* Module to build an automaton for multi-string search *) (***************************************************************************) open Automaton;; (* (build l) returns the automaton for the string list l *) val build: string list -> automaton;; hlins-0.39/source/run_automaton.ml0100664000076400001440000002312207654273266015703 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) open Array;; open Cyclic_buffer;; open Lexing;; open Automaton;; open Read_html;; let isblank c = c=' ' || c='\t' || c='\n' || c='\r' ;; let run {max_path_length=m;level=level;tree=tree;board=board;suf=suf; found=found;expand=expand} inbuf subst outc = let s = fresh (m+1) (* s serves as additional input buffer that takes priority over the lex buffer inbuf, that is when taking the next character we check first with s (see function read). In some cases we have to put symbols already read back into s. However, the length of s is at most the length of the longest search pattern. *) and ext = fresh (m+1) (* the buffer ext contains the part of the input that we have already seen but that is not yet known to be a match. Except in case of the function ex_lock this is identical with the window (see below). The length of ext is at most the length of the longest search pattern. *) in let rec read () = (* read () tries to read first from the buffer s, then from inbuf using lexer. If it succeeds then it returns (c,vs,false), otherwise it returns (' ',"",true) . 
In case of success: - if a character token has been read then c is this character and vs = "" - if a verb token has been read then vs is the string of this verb token (always non-empty) and c is ' ' *) if is_empty s then try (match next_html inbuf with (CHAR c) -> (c,"", false) | (VERB s) -> ' ',s,false) with End_of_file -> (' ',"",true) else (getc s,"",false) and gettrans q c = (* get the new state from q with letter c, taking tree and board into consideration *) try get_transition tree.(q) (if isblank c then ' ' else c) with Not_found -> if q=0 then 0 else gettrans board.(q) c and gettree q c = (* return the state obtained from node q with character c with tree transition, 0 when a tree transition is not possible *) try get_transition tree.(q) c with Not_found -> 0 (***************************************************************************) (* At every moment, the search engine has stored in a "window" the part of past input that is still under consideration. The search engine tries to find the earliest position in the string window^rest_of_input such that some search pattern is a prefix of the substring of window^rest_of_input starting at that position. Once this position is fixed, the search engine tries to find the longest such prefix. In the window, multiple white space is compressed to one white space character. In any case, the window is a prefix of some search pattern. The engine can be in one of three possible states, realised by the three main functions: - go: no factor of the window is a search pattern. - lock: some prefix of the window is a search pattern. In this case we just try to extend the prefix to an ever longer prefix that is a search pattern. - try: some factor that is not a prefix of the window is a search pattern (that is, the search pattern starts at a later position in the window). In this case there is still hope that we might find an earlier position in the window where a search pattern starts. 
The search engine of course uses the automaton (see the explanations in automaton.mli). For each of the three functions, we have that - q is a node of the automaton, and path(q) = window - lastblank = true iff the last character of window is white space. If the window is empty then lastblank has no significance. *) (*****************************************************************************) in let rec run_go q lastblank = (* This is the initial function called. We execute the automaton, changing into "lock" when we find a final state and into "try" when we find an internal final state. *) let (c,vs,stop) = read () in if stop then output_string outc (getall ext) else if vs <> "" then (* reset the automaton *) begin output_string outc (getall ext); output_string outc vs; run_go 0 false end else if isblank(c) && lastblank then begin if q=0 then output_char outc c; run_go q true end else let nq = gettrans q c in if nq = 0 then begin output_string outc (getall ext); output_char outc c; run_go 0 (isblank c); end else begin addc ext c; output_string outc (gets ext (level.(q)-level.(nq)+1)); if suf.(nq) = nq then (* ext is a pattern *) run_lock nq nq (getall ext) false else if suf.(nq) <> 0 then (* some proper suffix of ext is a pattern. *) run_try nq suf.(nq) (level.(nq)-level.(suf.(nq))) (isblank c) else (* no factor of ext is a pattern *) run_go nq (isblank c) end (******************************************************************************) and run_lock q foundstate foundname lastblank = (* window = foundname ^ (contents_of ext) foundname is the longest prefix of window that is a pattern. path(foundstate) = foundname We hence just try to extend foundname by tree transitions. We don't care for the board or internal final states here. If we cannot proceed with tree transitions we output foundname and start over with what we have buffered in ext. *) let (c,vs,stop) = read () in if stop then (* no more input. 
print what we have found so far and start over *) begin output_string outc (subst found.(foundstate) expand.(foundstate) foundname); push s (getall ext); run_go 0 false end else if vs <> "" then (* reset the automaton *) begin output_string outc (subst found.(foundstate) expand.(foundstate) foundname); output_string outc vs; push s (getall ext); run_go 0 false end else if isblank(c) && lastblank then (* q cannot be 0 in run_lock *) run_lock q foundstate foundname true else begin addc ext c; let nq = gettree q c in if nq = 0 then (* No more tree transition possible. Print what we have so far in foundname, put the contents of ext plus c back into the input buffer, and start over in state 0. *) begin output_string outc (subst found.(foundstate) expand.(foundstate) foundname); push s (getall ext); run_go 0 false end else if suf.(nq)=nq then (* nq is again a final state, extend foundname *) run_lock nq nq (foundname^(getall ext)) false else (* nq is not a final state *) run_lock nq foundstate foundname (isblank c) end (**************************************************************************) and run_try q bq off lastblank = (* This is the most complicated case. ext is the window, and path(q) = window off is the earliest position of the window such that some search pattern is a prefix of the sub-string of the window starting at that position. path(bq) is this search pattern. 
  *)
  let (c,vs,stop) = read () in
  if stop then
    begin
      (let fo = gets ext (level.(bq)) in
       output_string outc (subst found.(bq) expand.(bq) fo));
      push s (getall ext);
      run_go 0 false
    end
  else if vs <> "" then (* reset the automaton *)
    begin
      (let fo = gets ext (level.(bq)) in
       output_string outc (subst found.(bq) expand.(bq) fo));
      output_string outc vs;
      push s (getall ext);
      run_go 0 false
    end
  else if isblank(c) && lastblank then (* q cannot be 0 in run_try *)
    run_try q bq off true
  else
    begin
      addc ext c;
      let nq = gettrans q c in
      let offset = level.(q) - level.(nq) + 1 in
      if offset < off then (* we can do the transition *)
        begin
          output_string outc (gets ext offset);
          if suf.(nq) = nq then (* final state *)
            run_lock nq nq (getall ext) false
          else if suf.(nq) <> 0 then (* internal final state *)
            let newoff = level.(nq) - level.(suf.(nq)) in
            if newoff < off - offset
            then run_try nq suf.(nq) (off-offset-newoff) (isblank c)
            else run_try nq bq (off-offset) (isblank c)
          else (* nq not final and not internal final *)
            run_try nq bq (off-offset) (isblank c)
        end
      else (* offset >= off *)
        (* We cannot advance the start position of the window, since this
           would cut the pattern that we have already found in the interior
           of ext. Hence we just commit to the factor that we have found. *)
        begin
          output_string outc (gets ext off);
          let foundname = gets ext level.(bq) in
          begin
            push s (getall ext);
            run_lock bq bq foundname false
          end
        end
    end
in run_go 0 false
;;
hlins-0.39/source/hlins.ml0100664000076400001440000001526407654273266014135 0ustar rtusers
(* HLins: insert http-links into HTML documents.
   See http://www.lri.fr/~treinen/hlins

   Copyright (C) 1999 Ralf Treinen

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) open Arg;; open Str;; open Errors;; open Files;; open Dumpdb;; (*************************************************************************) (* process options *) let dblist = ref [];; let out_filename = ref "";; let in_filename = ref "";; let mlist = ref [];; let quiet = ref false;; let recursivemode = ref false;; let dbtohtmlmode = ref false;; let tmpdir = ref (try Unix.getenv "TMPDIR" with Not_found -> "/tmp");; let description = "Hlins: insert urls into a file." in let rec dbact = function s -> dblist := !dblist @ (split (regexp "[ \t\n]+") s) and mact = function s -> mlist := !mlist @ (split (regexp "[ \t\n]+") s) and oact = function s -> if !out_filename = "" then out_filename := s else stopwitherror "Multiple \"-o\" option" and tact = function s -> tmpdir := s and printversion = function () -> begin print_string ("Hlins v"^Version.version); print_newline (); exit 0 end and printusage = function () -> begin usage optionsspec description; exit 0 end and optionsspec = [ ( "-db", String (dbact), "list of address database files" ); ( "--data-bases", String (dbact), "list of address database files" ); ( "-o", String (oact), "output file" ); ( "--output-file", String (oact), "output file" ); ( "-m", String (mact), "file to be modified" ); ( "--modify", String (mact), "file to be modified" ); ( "-R", Set recursivemode, "descend recursively directories" ); ( "--recursive", Set recursivemode, "descend recursively directories" ); ( "-td", String (tact), "temp directory" ); ( "--temp-dir", String (tact), "temp directory" ); ( "--db-to-html", Set 
      dbtohtmlmode, "dump databases as html to stdout" );
  ( "-q", Set quiet, "suppress diagnostic output" );
  ( "--quiet", Set quiet, "suppress diagnostic output" );
  ( "-v", Unit(printversion), "show version and exit" );
  ( "--version", Unit(printversion), "show version and exit" );
  ( "-h", Unit(printusage), "show this usage info" );
  ( "--help", Unit(printusage), "show this usage info" )
]
in parse
  optionsspec
  (function s ->
     if !in_filename = ""
     then in_filename := s
     else stopwitherror "Multiple input files.")
  description;;

if samefile !out_filename !in_filename
then stopwitherror "Input and output files must differ";;

(* check for mutually exclusive options *)
let exclusiveopts o1 o2 =
  stopwitherror ( "options " ^o1^ " and " ^o2^ " exclude each other." )
;;

if !dbtohtmlmode
then
  if !mlist <> []
  then exclusiveopts "--db-to-html" "-m (--modify)"
  else if !out_filename <> ""
  then exclusiveopts "--db-to-html" "-o (--output-file)"
  else ()
else if !mlist <> [] && !out_filename <> ""
then exclusiveopts "-o (--output-file)" "-m (--modify)"

(***************************************************************************)
(* functions for the different modes of hlins *)

(* function that acts like a filter on channels. On termination both
   channels are closed.
*) let hlins_filter automaton replace inc ouc = Run_automaton.run automaton (Lexing.from_channel inc) replace ouc; close_in inc; close_out ouc ;; (* function that acts by modification on a file *) let hlins_modify automaton replace tmpfile filename = hlins_filter automaton replace (try open_in filename with Sys_error s -> stopwitherror ("Cannot read input file "^filename^": "^s)) (try open_out tmpfile with Sys_error s -> stopwitherror ("Cannot open temporary file "^tmpfile^": "^s)); move tmpfile filename ;; (* function that acts on input file and output file *) let hlins_fromto automaton replace infile outfile = hlins_filter automaton replace (if infile <> "" then try open_in infile with Sys_error s -> stopwitherror ("Cannot read input file "^s) else stdin ) (if outfile <> "" then try open_out outfile with Sys_error s -> stopwitherror ("Cannot open output file "^s) else stdout ) ;; (* function that acts recursively on a directory *) let rec hlins_dir automaton replace tmpfile path dir = let (htmlfiles,subdirectories) = scandir path dir in begin List.iter (function f -> hlins_modify automaton replace tmpfile (path^"/"^dir^"/"^f)) htmlfiles; List.iter (function d -> hlins_dir automaton replace tmpfile (path^"/"^dir) d) subdirectories end ;; (***************************************************************************) (* now do it *) let hitcount = ref 0;; if not !quiet then prerr_string ("Hlins version "^Version.version^".\n");; let (name_list,url_list) = Read_databases.read_dbs !dblist in let urls = Array.of_list url_list in if !dbtohtmlmode then dumpdb name_list urls else let names = Array.of_list name_list in let automaton = Build_automaton.build name_list and replace = Replace.replace_function !quiet names urls hitcount in Gc.compact (); if !mlist = [] then hlins_fromto automaton replace !in_filename !out_filename else let tmpfile = newtmpfile !tmpdir in if !recursivemode then let (htmlfiles,subdirectories) = select "." 
(expanddot !mlist) in begin List.iter (function f -> hlins_modify automaton replace tmpfile f) htmlfiles; List.iter (function d -> hlins_dir automaton replace tmpfile "." d) subdirectories end else List.iter (function f -> hlins_modify automaton replace tmpfile f) !mlist ;; if not !quiet then prerr_string ("Number of insertions: " ^ (string_of_int !hitcount) ^"\n");; flush stderr;; hlins-0.39/source/replace.ml0100644000076400001440000000407007654273266014422 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) (* (allequal l), where l is a list, tests whether all elements of l are equal *) let allequal = function [] -> true | (h::l) -> List.for_all (function x -> x=h) l;; (* (geturl names urls foundinds expandinds) returns a url for name where name is the array of names, urls the array of urls, foundinds the set of indices of the word found and expandinds the set of indices of words that the word found is an abbreviation of. 
*)
let geturl quiet names urls foundinds expandinds s =
  let foundurls = List.map (Array.get urls) (foundinds@expandinds) in
  begin
    if quiet || allequal foundurls
    then List.hd foundurls
    else begin
      prerr_string ("A conflict occurred on the name "^s^":\n");
      List.iter
        (function i -> prerr_string (" has url "^urls.(i)^"\n"))
        foundinds;
      List.iter
        (function i ->
           prerr_string (" abbreviates "^names.(i)^ " which has url "^ urls.(i)^"\n"))
        expandinds;
      prerr_endline ("Choosing " ^ (List.hd foundurls));
      List.hd foundurls
    end;
  end
;;

let replace_function quiet names urls hitcounter foundinds expandinds s=
  incr hitcounter;
  "" ^ s ^ "")
;;
hlins-0.39/source/replace.mli0100644000076400001440000000277207654273266014602 0ustar rtusers
(* HLins: insert http-links into HTML documents.
   See http://www.lri.fr/~treinen/hlins

   Copyright (C) 1999 Ralf Treinen

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(*********************************************************************)
(* function that does the actual replacement *)
(*********************************************************************)

open Array;;

val replace_function :
  bool (* quiet operation mode ?
  *)
  -> string array (* the array of names *)
  -> string array (* the corresponding array of urls *)
  -> int ref (* hit counter *)
  -> int list (* indices of full names that match *)
  -> int list (* indices of names that the match abbreviates *)
  -> string (* string that matches *)
  -> string
hlins-0.39/source/dumpdb.mli0100644000076400001440000000243307654273266014434 0ustar rtusers
(* HLins: insert http-links into HTML documents.
   See http://www.lri.fr/~treinen/hlins

   Copyright (C) 1999 Ralf Treinen

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

(*************************************************************************)
(* dump an address data base in html format *)
(*************************************************************************)

(* (dumpdb al ul) dumps to stdout an html listing of the address list al
   with links taken from the array ul. The list and the array are assumed
   to be of the same length. *)
val dumpdb : string list -> string array -> unit;;
hlins-0.39/source/dumpdb.ml0100644000076400001440000000514107654273266014262 0ustar rtusers
(* HLins: insert http-links into HTML documents.
   See http://www.lri.fr/~treinen/hlins

   Copyright (C) 1999 Ralf Treinen

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*)

open String;;
open Str;;
open List;;

(* (addindices [s1;...;sn]) yields [(0,p1,l1);...;(n-1,pn,ln)] where pi is
   the part of the string si before its last blank, and li is the word
   from the last blank on. If si contains no blank then pi is empty and
   li is si. *)
let addindices l =
  let rec addindices_aux i = function
      [] -> []
    | h::r ->
        (try
           let bi = rindex h ' ' in
           (i,(string_before h bi),(string_after h (bi+1)))
         with Not_found -> (i,"",h)
        )::(addindices_aux (i+1) r)
  in addindices_aux 0 l
;;

let preamble = " Hlins Database Listing

Hlins Database Listing

";; let postamble = " ";; let dumpdb al ul = begin print_string preamble; fold_left (fun (lastname, lastfirstname, lasturl, lastchar) (i,p,l) -> let lc = String.get l 0 and u = ul.(i) in begin if lastname <> l || lastfirstname <> p then print_string "
\n"; if lc <> lastchar then begin print_string "Alternative Site]\n" end else () else begin print_string ""; print_string p; if String.length p <> 0 then print_char ' '; print_string ""; print_string l; print_string "\n" end; (l,p,u,lc) end) ("","","",Char.chr(0)) (sort (fun (_,p1,l1) (_,p2,l2) -> let lc = compare l1 l2 in if lc = 0 then compare p1 p2 else lc ) (addindices al)); print_string postamble end hlins-0.39/source/.#addr_lex.mll0120777000076400001440000000000007654273266021177 2rt@localhost.10704:1051722211ustar rtusershlins-0.39/source/#addr_lex.mll#0100644000076400001440000000403307654273266015052 0ustar rtusers(* HLins: insert http-links into HTML documents. See http://www.lri.fr/~treinen/hlins Copyright (C) 1999 Ralf Treinen This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *) { let ln = ref 1;; (* line number *) exception Error_lex of int*string;; exception Eof;; } let linespace = [' ''\t'] (* next_name_tab_url: called when we expect a name. Ignore any line that starts with the comment sign, and any white space. We accept as a name any string that does not contain "=" or newline, and that does not end or start on white space. The character "=" can be escaped as "==". 
*) rule next_name_sep_url = parse '#'[^'\n']*('\n') { incr ln; next_name_sep_url lexbuf } | linespace+ { next_name_sep_url lexbuf } | '\n' { incr ln; next_name_sep_url lexbuf } | [^' ''#''\t''\n''='](([^'=''\n']|"==")*[^' ''\t''\n''='])? { let name = Lexing.lexeme lexbuf in (name , sep_url lexbuf) } | eof { raise Eof } (* separator *) and sep_url = parse linespace* '=' linespace* { url lexbuf } | _ { raise (Error_lex (!ln,"No separator found"))} (* any non-empty string up to the end of the line is accepted as url *) and url = parse [^'\n']+ { Lexing.lexeme lexbuf } | ('\n'|eof) { raise (Error_lex (!ln,"No URL found"))} hlins-0.39/doc/0040755000076400001440000000000007654273266011724 5ustar rtusershlins-0.39/doc/Makefile0100664000076400001440000000124507654273266013365 0ustar rtusers# Makefile for the Hlins documentation ####################################################################### FILES = hlins-doc.html changelog LICENSE examples DOCDIR = /usr/share/doc/hlins MANDIR = /usr/share/man/man1 DESTDIR = ######################################################################## hlins-doc.html: install: hlins-doc.html mkdir -p $(DESTDIR)/$(DOCDIR) \cp -r $(FILES) $(DESTDIR)/$(DOCDIR) cp hlins.1 $(DESTDIR)/$(MANDIR) clean: (cd examples/hlins-documentation && make clean) (cd examples/test-examples && make clean) distclean: -rm -f *~ (cd examples/hlins-documentation && make distclean) (cd examples/test-examples && make distclean) hlins-0.39/doc/changelog0100664000076400001440000000341107654273266013574 0ustar rtusersv0.39 [1/5/2003] ----------------- fixed a bug with line breaks in names: run_automaton now always normalises white space to one blank when looking up a transition in the automaton. 
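
The white-space normalisation mentioned for v0.39 can be pictured with the
following OCaml sketch. It is not taken from run_automaton.ml: the names
is_blank and normalise_blanks are hypothetical, and the search functions
appear to get the same effect incrementally through their lastblank
argument rather than by rewriting whole strings.

```ocaml
(* Hypothetical helper, not part of the hlins sources: decides which
   characters count as white space for the purpose of this sketch. *)
let is_blank c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Collapse every run of white space into exactly one blank, so that a
   name broken across lines compares equal to the same name on one line. *)
let normalise_blanks s =
  let buf = Buffer.create (String.length s) in
  let last_blank = ref false in
  String.iter
    (fun c ->
      if is_blank c then begin
        (* emit a blank only for the first character of a run *)
        if not !last_blank then Buffer.add_char buf ' ';
        last_blank := true
      end else begin
        Buffer.add_char buf c;
        last_blank := false
      end)
    s;
  Buffer.contents buf
```

With this helper, a name containing a line break and indentation would be
looked up as if it were written with single blanks.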
v0.38 [10/10/00]
----------------
added autoconf (stuff shamelessly copied from bibtex2html)
improved data base dump for multiple addresses of one person

v0.37 [17/9/00]
---------------
reorganised make targets to allow smoother generation of byte code version

v0.36 [2/9/00]
--------------
changed implementation of transition lists in the automaton to Map.

v0.35 [25/07/00]
----------------
revised the reading of html files and introduced a verbatim token.
automaton resets on verbatim token.
revised documentation Makefiles to avoid dependency on hevea.
restructured documentation (examples directory).
documentation/examples: tests.
abbreviation of names on all components, use < > to protect.
added modify mode
added db-to-html mode

v0.34
---------------
html scanner bypasses html contexts that are to be ignored (head etc.)

v0.33 [10/03/00]
---------------
much faster processing of data base files.
fixed a bug with anchors coming from the input document.
accept GNU style long options.
option "-quiet" no longer exists, use "-q" or "--quiet" now.

v0.32 [16/12/99]
----------------
changed internal representation of the data base, some optimisations,
code revision and cleanup, documentation of the code.

v0.30 [22/11/99]
----------------
complete rewrite of the search engine. now uses an algorithm based on
Knuth-Morris-Pratt for multiple patterns.
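
For readers unfamiliar with the idea behind the v0.30 entry, here is a
minimal OCaml sketch of multi-pattern matching in the Aho-Corasick style,
the textbook generalisation of Knuth-Morris-Pratt to several patterns.
Whether hlins uses exactly this construction is an assumption; the names
build, step and find_all and the representation with association lists are
hypothetical and do not come from automaton.ml or build_automaton.ml.

```ocaml
(* Sketch only: a trie with failure links, not the hlins automaton. *)

(* Follow a transition on c from state q, falling back along failure
   links when the trie has no edge; state 0 is the root. *)
let rec step trans fail q c =
  match List.assoc_opt c trans.(q) with
  | Some n -> n
  | None -> if q = 0 then 0 else step trans fail fail.(q) c

let build patterns =
  (* one state per pattern character is an upper bound, plus the root *)
  let size = 1 + List.fold_left (fun n p -> n + String.length p) 0 patterns in
  let trans = Array.make size [] in   (* tree transitions per state *)
  let fail = Array.make size 0 in     (* failure link per state *)
  let out = Array.make size [] in     (* patterns recognised in a state *)
  let count = ref 1 in
  (* insert every pattern into the trie *)
  List.iter
    (fun p ->
      let q = ref 0 in
      String.iter
        (fun c ->
          match List.assoc_opt c trans.(!q) with
          | Some n -> q := n
          | None ->
              trans.(!q) <- (c, !count) :: trans.(!q);
              q := !count;
              incr count)
        p;
      out.(!q) <- p :: out.(!q))
    patterns;
  (* breadth-first pass: failure links of shallower states are complete
     before they are used for deeper ones *)
  let queue = Queue.create () in
  List.iter (fun (_, n) -> Queue.add n queue) trans.(0);
  while not (Queue.is_empty queue) do
    let q = Queue.pop queue in
    List.iter
      (fun (c, n) ->
        fail.(n) <- step trans fail fail.(q) c;
        out.(n) <- out.(n) @ out.(fail.(n));
        Queue.add n queue)
      trans.(q)
  done;
  (trans, fail, out)

(* Return (start position, pattern) for every occurrence in text. *)
let find_all (trans, fail, out) text =
  let hits = ref [] in
  let q = ref 0 in
  String.iteri
    (fun i c ->
      q := step trans fail !q c;
      List.iter
        (fun p -> hits := (i - String.length p + 1, p) :: !hits)
        out.(!q))
    text;
  List.rev !hits
```

The text is scanned once, whatever the number of patterns; the failure
links play the role that the single failure table plays in
Knuth-Morris-Pratt.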

v0.21 [22/10/99]
----------------
changed the field separator from to ":"

v0.20 [21/10/99]
----------------
works now on html instead of as an HeVeA pre-processor

v0.10 [20/10/99]
----------------
first alpha version
hlins-0.39/doc/hlins-doc.adr0100644000076400001440000000074007654273266014272 0ustar rtusers
# address data base used for the hlins documentation

# name with 8-bit character
Jean-Christophe Filliâtre = http://www.lri.fr/~filliatr

# name with HTML special character
Claude Marché = http://www.lri.fr/~marche

# multiple spaces in a name are ignored
Ralf Treinen = http://www.lri.fr/~treinen

# suppress abbreviation of the "first name"
Caml = http://caml.inria.fr/
HeVeA = http://para.inria.fr/~maranget/hevea/index.html
hlins-0.39/doc/hlins.10100664000076400001440000000475107654273266013131 0ustar rtusers
.\" Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH HLINS 1 "July 25, 2000"
.\" Please adjust this date whenever revising the manpage.
.\"
.\" Some roff macros, for reference:
.\" .nh disable hyphenation
.\" .hy enable hyphenation
.\" .ad l left justify
.\" .ad b justify to both left and right margins
.\" .nf disable filling
.\" .fi enable filling
.\" .br insert line break
.\" .sp insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
hlins \- insert url's into html documents
.SH SYNOPSIS
.B hlins
.RI [ options ]
.RI [ infile ]
.SH DESCRIPTION
\fBhlins\fP is a program that inserts hypertext links into html
documents, according to one or several data bases associating addresses
(url's) to names. \fBhlins\fP is designed for inserting links for
persons: It knows about abbreviations of first and middle names and
tolerates dropping of the last part of a composite last name.
If no file argument is given then input is taken from stdin; when no
output option (see below) is given then output goes to stdout.
For a complete description, see the documentation in html format.
.SH OPTIONS
\fBhlins\fP follows the usual GNU command line syntax, with long
options starting with two dashes (`--').
.TP
.B \-o, \-\-output-file \fIfile\fP
Write to \fIfile\fP instead of standard output
.TP
.B \-h, \-\-help
Show a summary of options.
.TP
.B \-v, \-\-version
Show the version of the program.
.TP
.B \-q, \-\-quiet
Suppress diagnostic output.
.TP
.B \-db, \-\-data-bases \fIfiles\fP ...
Use \fIfiles\fP ... as data bases
.TP
.B \-m, \-\-modify \fIfiles\fP ...
Don't act as a filter but perform in-place modifications of \fIfiles\fP.
.TP
.B \-R, \-\-recursive
Recursively descend into directories and act on all files with names
ending in \fI.html\fP.
.TP
.B \-td, \-\-temp-dir \fIdir\fP
Use directory \fIdir\fP to create temporary files.
.TP
.B \-\-db-to-html
List the address data bases in HTML format to stdout.
.SH ENVIRONMENT
.TP
.B TMPDIR
default directory for creating temporary files.
.SH VERSION
This manual page describes version 0.39.
.SH SEE ALSO
The full documentation with examples should be available in
\fB/usr/share/doc/hlins/\fP.
.br
See also the hlins home page
\fBhttp://www.lsv.ens-cachan.fr/~treinen/hlins/\fP.
.SH AUTHOR
Ralf Treinen .
hlins-0.39/doc/LICENSE0100644000076400001440000003545607654273266012743 0ustar rtusers
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it.
By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. 
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. 
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. 
For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS
hlins-0.39/doc/examples/0040755000076400001440000000000007654273266013542 5ustar rtusers
hlins-0.39/doc/examples/hlins-documentation/0040755000076400001440000000000007654273266017526 5ustar rtusers
hlins-0.39/doc/examples/hlins-documentation/Makefile0100664000076400001440000000122307654273266021163 0ustar rtusers
# Makefile for the Hlins documentation
# Requires a running hlins and hevea
#######################################################################

hlins-doc.html: hlins-doc-nolinks.html ../../hlins-doc.adr
	hlins -db ../../hlins-doc.adr -o hlins-doc.html hlins-doc-nolinks.html
	cp hlins-doc.html ../..

hlins-doc-nolinks.html: hlins-doc.tex
	hevea < hlins-doc.tex > hlins-doc-nolinks.html

hlins-doc.dvi: hlins-doc.tex
	latex hlins-doc

hlins-doc.ps: hlins-doc.dvi
	dvips -o hlins-doc.ps hlins-doc.dvi

clean:
	-rm -f *.aux *.log hlins-doc.{dvi,ps,html,log,aux}

distclean: clean
	-rm -f *~

genclean: clean
	-rm -f hlins-doc-nolinks.html hlins-doc.html
hlins-0.39/doc/examples/hlins-documentation/README0100644000076400001440000000050107654273266020377 0ustar rtusers
README for hlins examples/hlins-documentation
=============================================

This example shows how to compile the hlins documentation from a LaTeX source using HeVeA as LaTeX to HTML converter, and then hlins to insert links. HeVeA can be obtained from http://para.inria.fr/~maranget/hevea/index.html.
hlins-0.39/doc/examples/hlins-documentation/hlins-doc-nolinks.html0100664000076400001440000003205407654273266023752 0ustar rtusers

Hlins: Hyper-Link Insertions in HTML documents Version 0.39

Hlins: Hyper-Link Insertions in HTML documents
Version 0.39

Ralf Treinen

May 1, 2003

1  An Introductory Example

Hlins inserts in a HTML document the url's (uniform resource locator) for certain names (normally the names of people), according to a data base associating url's to names.

First you have to create a data base that associates url's to names, let's call it addresses:
Donald Knuth        =  http://www-cs-staff.stanford.edu/...
Leslie Lamport      =  http://www.research.digital.com/...
Suppose that you have a HTML document mytext.html that contains text as
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html will generate a file newtext.html that contains now the piece of text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which will eventually be rendered by a browser as something like
A milestone in the development of digital typesetting was the TeX system developed by Stanford computer science professor Donald Knuth, which was used by L. Lamport as a base to build the more user-friendly (but less powerful) LaTeX system.
Note that the url insertion knows about abbreviating first names (as for Leslie Lamport) and works over line breaks (as for Donald Knuth).
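The effect described in this example can be imitated with a few lines of Python. This is only an illustrative sketch, not the way hlins is implemented: the function name insert_links and the mapping from already-expanded spellings to URLs are assumptions made for the example.

```python
import re

def insert_links(text, urls):
    """Wrap every known spelling found in `text` in an <a href=...> element.

    `urls` maps a whitespace-normalised spelling (for instance "L. Lamport")
    to its URL. This mapping is an assumption of the sketch, not hlins' format.
    """
    if not urls:
        return text
    # Longer spellings first, so the alternation prefers them at a given position.
    spellings = sorted(urls, key=len, reverse=True)
    # Words of a spelling may be separated by any white space, including newlines.
    pattern = re.compile(
        "|".join(r"\s+".join(re.escape(w) for w in s.split()) for s in spellings)
    )

    def link(match):
        # Normalise the matched white space to single blanks before the lookup.
        spelling = " ".join(match.group(0).split())
        return '<a href="%s">%s</a>' % (urls[spelling], spelling)

    return pattern.sub(link, text)
```

Calling insert_links on a text in which Donald and Knuth are separated by a line break still yields one link, because the words of a spelling are joined with a white-space pattern rather than a literal blank.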

2  Usage

hlins [options] [inputfile]
Hlins can be used in three different modes (see below). The following general options exist:
-h, --help
Show summary of options and exit.
-v, --version
Show version of program and exit.
-q, --quiet
Suppress diagnostic output.
-db, --data-bases files ...
Use files as address data bases. The string files is a blank-separated list of data base files, which means that you have to protect the blanks from your shell when using several data base files. Multiple -db options are accepted. Examples of usage strings in the csh shell are
hlins -db myaddresses
hlins -db "friends groupmembers"
hlins -db friends -db groupmembers
The last two invocations are equivalent.
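Why the last two invocations are equivalent can be seen from a small Python sketch of how such an option could be collected. The function collect_databases is hypothetical and only mirrors the description above; it is not taken from the hlins sources.

```python
def collect_databases(argv):
    """Gather data base file names from all -db / --data-bases options.

    Each option value is split at blanks, and the values of repeated
    options are appended in the order in which they occur.
    """
    files = []
    i = 0
    while i < len(argv):
        if argv[i] in ("-db", "--data-bases") and i + 1 < len(argv):
            files.extend(argv[i + 1].split())
            i += 2
        else:
            i += 1
    return files
```

Both a single quoted list and two separate options produce the same list of files under this reading.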

2.1  Usage in filter mode

In filter mode, hlins reads html from one source and writes to a different target. Input is taken from the inputfile argument if existent, otherwise from stdin. Output goes by default to stdout.
-o, --output-file file
Write to file instead of standard output.

2.2  Usage in modify mode

In modify mode, hlins modifies html files in place.
-m, --modify-files files ...
Modify the files in place.
-R, --recursive
Recursively descend into directories and operate on all files with names ending in .html. Only effective with the --modify option.

For instance, ``hlins -db ... -m ~/WWW -R'' makes hlins operate on your complete WWW tree.
-td,--tempdir dir
When doing in-place modifications of files use the directory dir to create temporary files. Default is the value of the TMPDIR environment variable, and /tmp if TMPDIR is not set.

2.3  Usage in database list mode

--db-to-html
Lists the contents of the databases in html to standard output. This can be handy to create an address book.

3  Secondary Effects on the HTML Text

Hlins replaces special characters of HTML (such as &eacute; or &#233;) by the corresponding ISO-8859-1 character, which is in this case é. Hence, you can use Hlins without any database argument to replace HTML special characters in a HTML document.

In some cases, non-empty sequences of white space characters may be replaced by one space. However, this happens only when the white space is part of a prefix of some name in the data base. Anyway, this replacement is irrelevant for the rendering of HTML documents.
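The replacement of special characters described in this section can be sketched in Python as follows. The function name is made up for the example, and the restriction to code points 160 to 255 (so that references like &lt; or &amp; stay escaped) is an assumption of the sketch; the text above does not say how hlins treats those.

```python
import re
from html.entities import name2codepoint

# Named, decimal and hexadecimal character references, each ending in ';'.
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def replace_latin1_entities(text):
    """Replace character references by the ISO-8859-1 character they denote."""
    def decode(match):
        body = match.group(1)
        if body[:2] in ("#x", "#X"):
            code = int(body[2:], 16)
        elif body.startswith("#"):
            code = int(body[1:])
        else:
            code = name2codepoint.get(body)
        # Unknown names and characters outside the assumed range are kept as-is.
        if code is None or not 160 <= code <= 255:
            return match.group(0)
        return chr(code)

    return _REFERENCE.sub(decode, text)
```

With this sketch the three spellings Fran&ccedil;ois, Fran&#231;ois and Fran&#xE7;ois all become François, which is what lets one data base entry match every representation.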

4  Address Data Bases

Every line of the file must be either a comment line or an address specification. A comment-line is a line that either consists only of white space, or that starts with the comment-symbol # (possibly preceded by white space).

An address specification consists of a name and a url that are separated by the character = . Leading white space of the line is ignored. In the name, the character = must be written as ==.

Special characters in the name can be either written in HTML or as 8bit characters. The number of spaces separating the words of a name is not relevant.

The syntax of the url is not checked.
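The line format just described can be read with a short Python sketch. The function parse_address_line is hypothetical; it follows the rules stated above (comment lines, = as separator, == as an escaped = in the name) and does not decode HTML special characters in the name.

```python
def parse_address_line(line):
    """Return (name, url) for an address specification, None for a comment line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    name_chars = []
    i = 0
    while i < len(stripped):
        if stripped[i] == "=":
            if stripped[i:i + 2] == "==":
                # '==' inside the name stands for a literal '='.
                name_chars.append("=")
                i += 2
                continue
            # The first unescaped '=' separates the name from the url.
            name = " ".join("".join(name_chars).split())
            url = stripped[i + 1:].strip()
            return name, url
        name_chars.append(stripped[i])
        i += 1
    raise ValueError("no '=' separator in line: %r" % line)
```

The name is normalised to single blanks because the number of spaces between its words is not relevant; the url is returned unchecked.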

5  Variants of Names

Several variants of the names in the data base are recognized as well. To find the variants of a name we first split it at white spaces into components.
  • If a name consists of just one component then it has no variant other than itself.
  • Otherwise, the variants of the name are obtained by considering all possible combinations of variants of the components. The last component is treated differently from the other components:
    • If the last component contains the symbol - then the name without this - and everything behind is also recognized. Hence, if you have an entry for Egon Müller-Meier then Egon Müller is also recognized.
    • A component which is not the last component may be abbreviated, unless it consists of only one letter or it ends with a dot. The abbreviation of a first name is its first letter followed by a dot. In case of a word starting with St a further abbreviation is St followed by a dot, and a word starting with Ch has the additional abbreviation Ch followed by a dot. Composite first names are abbreviated in both components, hence Marc-Stephane becomes M.-St. (but not, for instance, M.-Stephane).
    • In any case generation of variants is suppressed if you write the component in angular brackets like <Marc-Stephane>. This mechanism is used in the data base to produce this document, to have matching of Objective Caml but to avoid matching of O. Caml.
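The rules above can be written down as a small Python sketch. All function names are invented for the illustration. Two points are assumptions where the text is silent: the first - is used when the last component contains several, and a composite first name such as Marc-Stephane is taken to yield both M.-S. and M.-St.

```python
from itertools import product

def abbreviations(word):
    """Abbreviated forms of one part of a first name (empty if none apply)."""
    if len(word) <= 1 or word.endswith("."):
        return []
    short = [word[0] + "."]
    for prefix in ("St", "Ch"):
        if word.startswith(prefix) and len(word) > len(prefix):
            short.append(prefix + ".")
    return short

def component_variants(component, is_last):
    """All spellings of one component, the unchanged spelling first."""
    if len(component) > 2 and component.startswith("<") and component.endswith(">"):
        return [component[1:-1]]          # angular brackets suppress variants
    variants = [component]
    if is_last:
        head = component.split("-", 1)[0]
        if "-" in component and head:
            variants.append(head)         # Müller-Meier is also found as Müller
    else:
        parts = component.split("-")
        choices = [abbreviations(part) or [part] for part in parts]
        for combo in product(*choices):   # composite names: abbreviate every part
            short = "-".join(combo)
            if short not in variants:
                variants.append(short)
    return variants

def name_variants(name):
    """All recognised spellings of a data base name."""
    components = name.split()
    if not components:
        return []
    if len(components) == 1:
        return component_variants(components[0], True)[:1]
    last = len(components) - 1
    per_component = [component_variants(c, i == last) for i, c in enumerate(components)]
    return [" ".join(combo) for combo in product(*per_component)]
```

For example, name_variants("Leslie Lamport") gives the two spellings Leslie Lamport and L. Lamport, while name_variants("<Objective> Caml") gives only Objective Caml.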

6  The Exact Rules of Searching Names

Names are searched starting from the beginning of the text. If there are overlapping matches then the match starting at the earlier position wins. For example, if the data base contains entries for Egon Meier and for Hans Egon Meier-Müller then the second one matches on input Hans Egon Meier-Müller.

A match is extended to longer matches if possible. That is, if the data base contains entries for Hans Egon and for Hans Egon Meier then the second one matches on input Hans Egon Meier.
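These two rules amount to leftmost-longest matching. The following Python sketch shows the idea with a naive scan; it is not the automaton that hlins builds, the function name find_names is hypothetical, and the sketch does not check for word boundaries.

```python
import re

def find_names(text, names):
    """Return (name, start, end) for leftmost-longest matches of `names` in `text`."""
    # Words of a name may be separated by arbitrary white space in the text.
    patterns = [
        (name, re.compile(r"\s+".join(re.escape(w) for w in name.split())))
        for name in names
    ]
    matches = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in patterns:
            m = pattern.match(text, pos)
            # Among all names matching at this position, keep the longest one.
            if m and (best is None or m.end() > best[2]):
                best = (name, m.start(), m.end())
        if best:
            matches.append(best)
            pos = best[2]       # continue after the match, so matches never overlap
        else:
            pos += 1
    return matches
```

On the input Hans Egon Meier-Müller the scan finds the entry starting at Hans before it ever reaches Egon, which is the behaviour described in the first paragraph of this section.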

7  The Exact Rules of URL Insertion

Hlins does not touch any text between <a ... href= ...> and </a>. Note that this applies only if the <a> tag contains the href attribute, that is hlins does look at text inside of <a name=...> and </a>. As a consequence, hlins is idempotent, that is if you apply hlins twice (for instance using the --modify option) to a file you get the same effect as with just one application. Hence, you can, when you extend your database, safely rerun hlins on your html files.

The replacement mechanism (including the normalisation of HTML special characters) is skipped for any text inside the following tags:
  • <head> ... </head>
  • <samp> ... </samp>
  • <kbd> ... </kbd>
  • <pre> ... </pre>
  • <div nohlins> ... </div>
The rationale is that the first four tags of this list are intended to mark some kind of verbatim text (see the HTML 4.01 specification). The last one is an escape mechanism in case you have to overrule hlins' mechanism. Text from the beginning of one of the start tags to the first occurrence of the corresponding end tag is ignored. The consequence is that nested tags of the same kind from the above list are not treated correctly.

Furthermore, text inside angular brackets < and > is not treated by hlins.
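The protected regions listed above can be approximated with one regular expression. The Python sketch below is an illustration only (the names PROTECTED and unprotected_segments are invented); like the description, it ends a protected region at the first matching end tag and therefore does not handle nesting.

```python
import re

PROTECTED = re.compile(
    r"<(head|samp|kbd|pre)\b[^>]*>.*?</\1\s*>"   # verbatim-like elements
    r"|<div\s+nohlins\b[^>]*>.*?</div\s*>"       # explicit escape mechanism
    r"|<a\b[^>]*\bhref\s*=[^>]*>.*?</a\s*>"      # anchors that already carry a link
    r"|<[^>]*>",                                 # the inside of any other tag
    re.IGNORECASE | re.DOTALL,
)

def unprotected_segments(html):
    """Return the pieces of `html` in which names may be searched."""
    segments = []
    pos = 0
    for m in PROTECTED.finditer(html):
        if m.start() > pos:
            segments.append(html[pos:m.start()])
        pos = m.end()
    if pos < len(html):
        segments.append(html[pos:])
    return segments
```

Text inside <a name=...> remains searchable because that start tag only matches the last alternative, which covers the tag itself but not the text that follows it.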

If there are several different url's for a string foundname then the following rules apply to determine the url inserted:
  1. An address specification ``name = url'' where name matches exactly (modulo white space and HTML special characters) foundname has priority over a name specification ``name = url'' where foundname is an abbreviation for name.
  2. In the list obtained from the above priority rule, the first match is taken.
A warning is issued in case of a conflict, unless the --quiet option has been given.

For instance, your data base might contain something like
Hans Meyer   =  http://address.for.full.name
H. Meyer     =  http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected (and a warning is issued).
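The priority rule can be sketched in Python as below. The function choose_url and its variants_of parameter (a callable returning all recognised spellings of a data base name) are assumptions of the example, and the warning that hlins issues on a conflict is not modelled.

```python
def choose_url(foundname, entries, variants_of):
    """Pick the url for `foundname` from `entries`, a list of (name, url) pairs.

    Rule 1: an entry whose name equals foundname (modulo white space) wins
    over entries for which foundname is only an abbreviation.
    Rule 2: among the remaining candidates the first one in the list is taken.
    """
    found = " ".join(foundname.split())
    exact = [url for name, url in entries if " ".join(name.split()) == found]
    if exact:
        return exact[0]
    abbreviated = [url for name, url in entries if found in variants_of(name)]
    return abbreviated[0] if abbreviated else None
```

With the two entries of the example above, H. Meyer is an exact match for the second entry and merely an abbreviation of the first, so the second url is returned.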

8  Implementation

Hlins is written in Objective Caml.

9  License and Installation

Hlins is covered by the GNU General Public License. See the Hlins home page for binary and source distributions.

10  Credits

Thanks to Claude Marché and Jean-Christophe Filliâtre for their remarks and suggestions.


This document was translated from LATEX by HEVEA.
hlins-0.39/doc/examples/hlins-documentation/hlins-doc.tex0100664000076400001440000002737507654273266022145 0ustar rtusers
% HLins: insert http-links into LaTeX documents.
% See http://www.lri.fr/~treinen/hlins
%
% Copyright (C) 1999 Ralf Treinen
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program; if not, write to the Free Software
% Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{french}
\usepackage{alltt}
\usepackage{hevea}

\title{Hlins: Hyper-Link Insertions in HTML documents\\Version 0.39}
\author{Ralf Treinen}
\date{May 1, 2003}

\def\knuthurl{http://www-cs-staff.stanford.edu/...}
\def\lamporturl{http://www.research.digital.com/...}

\begin{document}
\maketitle

\section{An Introductory Example}

\emph{Hlins} inserts in a \url{http://www.w3.org/TR/html40}{HTML} document the url's (uniform resource locator) for certain names (normally the names of people), according to a data base associating url's to names.

First you have to create a data base that associates url's to names, let's call it \texttt{addresses}:
\begin{quote}
\small
\begin{alltt}
Donald Knuth        =  \knuthurl
Leslie Lamport      =  \lamporturl
\end{alltt}
\end{quote}
Suppose that you have a HTML document \texttt{mytext.html} that contains text as
\begin{quote}
\small
\begin{alltt}
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
\end{alltt}
\end{quote}
Calling \texttt{hlins -db addresses -o newtext.html mytext.html} will generate a file \texttt{newtext.html} that contains now the piece of text
\begin{quote}
\small
\begin{verbatim}
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
\end{verbatim}
\end{quote}
which will eventually be rendered by a browser as something like
\begin{quote}
\small
A milestone in the development of digital typesetting was the TeX system developed by Stanford computer science professor \url{http://www-cs-staff.stanford.edu/\%7eknuth/index.html}{Donald Knuth}, which was used by \url{http://www.research.digital.com/SRC/personal/Leslie_Lamport/home.html}{L. Lamport} as a base to build the more user-friendly (but less powerful) LaTeX system.
\end{quote}
Note that the url insertion knows about abbreviating first names (as for Leslie Lamport) and works over line breaks (as for Donald Knuth).

\section{Usage}

\begin{verbatim}
hlins [options] [inputfile]
\end{verbatim}
Hlins can be used in three different modes (see below). The following general options exist:
\begin{description}
\item[\texttt{-h}, \texttt{--help}] Show summary of options and exit.
\item[\texttt{-v}, \texttt{--version}] Show version of program and exit.
\item[\texttt{-q}, \texttt{--quiet}] Suppress diagnostic output.
\item[\texttt{-db}, \texttt{--data-bases} \textit{files} $\ldots$] Use \textit{files} as address data bases. The string \textit{files} is a blank-separated list of data base files, which means that you have to protect the blanks from your shell when using several data base files. Multiple \verb|-db| options are accepted. Examples of usage strings in the \textit{csh} shell are
\begin{verbatim}
hlins -db myaddresses
hlins -db "friends groupmembers"
hlins -db friends -db groupmembers
\end{verbatim}
The last two invocations are equivalent.
\end{description}

\subsection{Usage in filter mode}

In filter mode, hlins reads html from one source and writes to a different target. Input is taken from the \textit{inputfile} argument if existent, otherwise from \textit{stdin}. Output goes by default to \textit{stdout}.
\begin{description}
\item[\texttt{-o}, \texttt{--output-file} \textit{file}] Write to \textit{file} instead of standard output.
\end{description}

\subsection{Usage in modify mode}

In modify mode, hlins modifies html files in place.
\begin{description}
\item[\texttt{-m}, \texttt{--modify-files} \textit{files} $\ldots$] Modify the \textit{files} in place.
\item[\texttt{-R}, \texttt{--recursive}] Recursively descend into directories and operate on all files with names ending in \texttt{.html}. Only effective with the \texttt{--modify} option.

For instance, ``\texttt{hlins -db $\ldots$ -m \~{}/WWW -R}'' makes hlins operate on your complete \texttt{WWW} tree.
\item[\texttt{-td},\texttt{--tempdir} \textit{dir}] When doing in-place modifications of files use the directory \textit{dir} to create temporary files. Default is the value of the \texttt{TMPDIR} environment variable, and \texttt{/tmp} if \texttt{TMPDIR} is not set.
\end{description}

\subsection{Usage in database list mode}

\begin{description}
\item[\texttt{--db-to-html}] Lists the contents of the databases in html to standard output.
This can be handy to create an address book.
\end{description}

\section{Secondary Effects on the HTML Text}

Hlins replaces special characters of HTML (such as \verb|&eacute;| or \verb|&#233;|) by the corresponding ISO-8859-1 character, which is in this case \verb|é|. Hence, you can use Hlins without any database argument to replace HTML special characters in a HTML document.

In some cases, non-empty sequences of white space characters may be replaced by one space. However, this happens only when the white space is part of a prefix of some name in the data base. Anyway, this replacement is irrelevant for the rendering of HTML documents.

\section{Address Data Bases}

Every line of the file must be either a comment line or an address specification. A comment-line is a line that either consists only of white space, or that starts with the comment-symbol \verb|#| (possibly preceded by white space).

An address specification consists of a name and a url that are separated by the character \verb|=| . Leading white space of the line is ignored. In the name, the character \verb|=| must be written as \verb|==|.

Special characters in the name can be either written in HTML or as 8bit characters. The number of spaces separating the words of a name is not relevant.

The syntax of the url is not checked.

\section{Variants of Names}

Several variants of the names in the data base are recognized as well. To find the variants of a name we first split it at white spaces into \emph{components}.
\begin{itemize}
\item If a name consists of just one component then it has no variant other than itself.
\item Otherwise, the variants of the name are obtained by considering all possible combinations of variants of the components. The last component is treated differently from the other components:
\begin{itemize}
\item If the last component contains the symbol \verb|-| then the name without this \verb|-| and everything behind is also recognized.
Hence, if you have an entry for \textit{Egon Müller-Meier} then \textit{Egon Müller} is also recognized.
\item A component which is not the last component may be abbreviated, unless it consists of only one letter or it ends with a dot. The abbreviation of a first name is its first letter followed by a dot. In case of a word starting with \texttt{St} a further abbreviation is \texttt{St} followed by a dot, and a word starting with \texttt{Ch} has the additional abbreviation \texttt{Ch} followed by a dot. Composite first names are abbreviated in both components, hence \texttt{Marc-Stephane} becomes \texttt{M.-St.} (but not, for instance, \texttt{M.-Stephane}).
\item In any case generation of variants is suppressed if you write the component in angular brackets like \texttt{<Marc-Stephane>}. This mechanism is used in the \url{hlins-doc.adr}{data base to produce this document}, to have matching of \verb|Objective Caml| but to avoid matching of \verb|O. Caml|.
\end{itemize}
\end{itemize}

\section{The Exact Rules of Searching Names}

Names are searched starting from the beginning of the text. If there are overlapping matches then the match starting at the earlier position wins. For example, if the data base contains entries for \verb|Egon Meier| and for \verb|Hans Egon Meier-Müller| then the second one matches on input \verb|Hans Egon Meier-Müller|.

A match is extended to longer matches if possible. That is, if the data base contains entries for \verb|Hans Egon| and for \verb|Hans Egon Meier| then the second one matches on input \verb|Hans Egon Meier|.

\section{The Exact Rules of URL Insertion}

Hlins does not touch any text between \verb|<a ... href= ...>| and \verb|</a>|. Note that this applies only if the \verb|<a>| tag contains the \verb|href| attribute, that is hlins \emph{does} look at text inside of \verb|<a name=...>| and \verb|</a>|.
As a consequence, hlins is idempotent, that is if you apply hlins twice (for instance using the \texttt{--modify} option) to a file you get the same effect as with just one application. Hence, you can, when you extend your database, safely rerun hlins on your html files.

The replacement mechanism (including the normalisation of HTML special characters) is skipped for any text inside the following tags:
\begin{itemize}
\item \verb|<head>| $\ldots$ \verb|</head>|
\item \verb|<samp>| $\ldots$ \verb|</samp>|
\item \verb|<kbd>| $\ldots$ \verb|</kbd>|
\item \verb|<pre>| $\ldots$ \verb|</pre>|
\item \verb|<div nohlins>| $\ldots$ \verb|</div>|
\end{itemize}
The rationale is that the first four tags of this list are intended to mark some kind of verbatim text (see the \url{http://www.w3.org/TR/html401/}{HTML 4.01 specification}). The last one is an escape mechanism in case you have to overrule hlins' mechanism. Text from the beginning of one of the start tags to the first occurrence of the corresponding end tag is ignored. The consequence is that nested tags of the same kind from the above list are not treated correctly.

Furthermore, text inside angular brackets \verb|<| and \verb|>| is not treated by hlins.

If there are several different url's for a string \textit{foundname} then the following rules apply to determine the url inserted:
\begin{enumerate}
\item An address specification ``\textit{name} = \textit{url}'' where \textit{name} matches exactly (modulo white space and HTML special characters) \textit{foundname} has priority over a name specification ``\textit{name} = \textit{url}'' where \textit{foundname} is an abbreviation for \textit{name}.
\item In the list obtained from the above priority rule, the first match is taken.
\end{enumerate}
A warning is issued in case of a conflict, unless the \verb|--quiet| option has been given.

For instance, your data base might contain something like
\begin{quote}
\begin{alltt}
Hans Meyer   =  http://address.for.full.name
H. Meyer     =  http://address.for.abbreviated.name
\end{alltt}
\end{quote}
On input \verb|H. Meyer|, the second address specification is selected (and a warning is issued).

\section{Implementation}

Hlins is written in Objective Caml.

\section{License and Installation}

Hlins is covered by the \url{LICENSE}{GNU General Public License}. See the \url{http://www.lsv.ens-cachan.fr/\%7etreinen/hlins}{Hlins home page} for binary and source distributions.

\section{Credits}

Thanks to Claude Marché and Jean-Christophe Filli\^atre for their remarks and suggestions.
\end{document}
hlins-0.39/doc/examples/hlins-documentation/.haux0100664000076400001440000000151507654273266020475 0ustar rtusers
\@addtocsec{htoc}{1}{0}{\@print{1}\quad{}An Introductory Example}
\@addtocsec{htoc}{2}{0}{\@print{2}\quad{}Usage}
\@addtocsec{htoc}{3}{1}{\@print{2.1}\quad{}Usage in filter mode}
\@addtocsec{htoc}{4}{1}{\@print{2.2}\quad{}Usage in modify mode}
\@addtocsec{htoc}{5}{1}{\@print{2.3}\quad{}Usage in database list mode}
\@addtocsec{htoc}{6}{0}{\@print{3}\quad{}Secondary Effects on the HTML Text}
\@addtocsec{htoc}{7}{0}{\@print{4}\quad{}Address Data Bases}
\@addtocsec{htoc}{8}{0}{\@print{5}\quad{}Variants of Names}
\@addtocsec{htoc}{9}{0}{\@print{6}\quad{}The Exact Rules of Searching Names}
\@addtocsec{htoc}{10}{0}{\@print{7}\quad{}The Exact Rules of URL Insertion}
\@addtocsec{htoc}{11}{0}{\@print{8}\quad{}Implementation}
\@addtocsec{htoc}{12}{0}{\@print{9}\quad{}License and Installation}
\@addtocsec{htoc}{13}{0}{\@print{10}\quad{}Credits}
hlins-0.39/doc/examples/test-examples/0040755000076400001440000000000007654273266016335 5ustar rtusers
hlins-0.39/doc/examples/test-examples/directors.db0100644000076400001440000000052707654273266020643 0ustar rtusers
# some movie directors
Stanley Kubrick = hlins/kubrick
Martin Scorsese = hlins/scorsese
Francis Ford Coppola = hlins/coppola
Fritz Lang = hlins/lang
Friedrich Murnau = hlins/murnau
Jean Renoir = hlins/renoir
François Truffaut = hlins/truffaut
Jean-Luc Godard = hlins/goddard
Paul Thomas Anderson = hlins/anderson
John Woo = hlins/woo
hlins-0.39/doc/examples/test-examples/Makefile0100644000076400001440000000042407654273266017772 0ustar rtusers
HLINS=../../../source/hlins

all: test1.out test2.out

test1.out: $(HLINS) directors.db test1.in
	$(HLINS) -db directors.db -o test1.out test1.in

test2.out: $(HLINS) test2.db test2.in
	$(HLINS) -db test2.db -o test2.out test2.in

clean:
	-rm *.out

distclean: clean
	-rm *~
hlins-0.39/doc/examples/test-examples/test2.in0100644000076400001440000000015607654273266017725
0ustar rtusersTest of the automaton AC BBB AAC AAAAC AAA AAC AAAA XXXYYY XXXY XXYXXYY XXXYZ C C C C CC C C hlins-0.39/doc/examples/test-examples/test1.in0100644000076400001440000000260607654273266017726 0ustar rtusers Martin Scorsese

Test to use with the directors database

Testing protected contexts.

Title

There should be no link for Martin Scorsese in the title.

Code Environment

There is no link for Stanley Kubrick in the following: Stanley Kubrick

Kbd Environment

There is no link for Jean Renoir in the following: Jean Renoir

Div Environment

There should be a link for Fritz Lang in the following:
Fritz Lang
but not in the following:
Fritz Lang
.

Anchors

There should be a link for John Woo in the following: John Woo. In the following there should be only a link to hongkong: John Woo.

Inside Tags

No links should be inserted inside the attributes of tags: Bla Bla.

Testing different character representations

Different representations of the character ç
Normal characters çFrançois Truffaut
Character entity references &ccedil; Fran&ccedil;ois Truffaut
Decimal numerical reference&#231; François Truffaut
Hexadecimal numerical reference&#xE7; François Truffaut
hlins-0.39/doc/examples/test-examples/test2.db0100644000076400001440000000011307654273266017675 0ustar rtusers
A = A
AA = AA
AAA = AAA
XY = XY
XXYY = XXYY
XXXYYY = XXXYYY
C C = cclink
hlins-0.39/doc/hlins-doc.html0100664000076400001440000003231307654273266014473 0ustar rtusers

Hlins: Hyper-Link Insertions in HTML documents Version 0.39

Hlins: Hyper-Link Insertions in HTML documents
Version 0.39

Ralf Treinen

May 1, 2003

1  An Introductory Example

Hlins inserts in a HTML document the url's (uniform resource locator) for certain names (normally the names of people), according to a data base associating url's to names.

First you have to create a data base that associates url's to names, let's call it addresses:
Donald Knuth        =  http://www-cs-staff.stanford.edu/...
Leslie Lamport      =  http://www.research.digital.com/...
Suppose that you have a HTML document mytext.html that contains text as
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html will generate a file newtext.html that contains now the piece of text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which will eventually be rendered by a browser as something like
A milestone in the development of digital typesetting was the TeX system developed by Stanford computer science professor Donald Knuth, which was used by L. Lamport as a base to build the more user-friendly (but less powerful) LaTeX system.
Note that the url insertion knows about abbreviating first names (as for Leslie Lamport) and works over line breaks (as for Donald Knuth).

2  Usage

hlins [options] [inputfile]
Hlins can be used in three different modes (see below). The following general options exist:
-h, --help
Show summary of options and exit.
-v, --version
Show version of program and exit.
-q, --quiet
Suppress diagnostic output.
-db, --data-bases files ...
Use files as address data bases. The string files is a blank-separated list of data base files, which means that you have to protect the blanks from your shell when using several data base files. Multiple -db options are accepted. Examples of usage strings in the csh shell are
hlins -db myaddresses
hlins -db "friends groupmembers"
hlins -db friends -db groupmembers
The last two invocations are equivalent.

2.1  Usage in filter mode

In filter mode, hlins reads html from one source and writes to a different target. Input is taken from the inputfile argument if existent, otherwise from stdin. Output goes by default to stdout.
-o, --output-file file
Write to file instead of standard output.

2.2  Usage in modify mode

In modify mode, hlins modifies html files in place.
-m, --modify-files files ...
Modify the files in place.
-R, --recursive
recursively descend into directories and operate on all files with names ending on .html. Only effective in with the --modify option.

For instance, ``hlins -db ... -m  /WWW -R'' makes hlins operate on your complete WWW tree.
-td, --tempdir dir
When doing in-place modifications of files, use the directory dir to create temporary files. The default is the value of the TMPDIR environment variable, or /tmp if TMPDIR is not set.
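
The file selection and the choice of the temporary directory described above can be sketched as follows. This is an illustrative Python sketch under assumptions, not the Hlins source: the function names resolve_tempdir and html_files are hypothetical, and how Hlins treats a directory argument without -R is not stated in this document (the sketch simply skips it).

```python
import os


def resolve_tempdir(option_value=None, environ=None):
    """Pick the directory for temporary files: -td value, then TMPDIR, then /tmp."""
    environ = os.environ if environ is None else environ
    if option_value:
        return option_value
    return environ.get("TMPDIR", "/tmp")


def html_files(paths, recursive=False):
    """Yield the files to modify in place.

    Plain file arguments are taken as given. Directories are only
    descended into when recursive is true, and then only files whose
    names end in .html are selected.
    """
    for path in paths:
        if os.path.isdir(path):
            if recursive:
                for root, _dirs, files in os.walk(path):
                    for name in sorted(files):
                        if name.endswith(".html"):
                            yield os.path.join(root, name)
            # Assumption: a directory without -R is skipped.
        else:
            yield path
```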

2.3  Usage in database list mode

--db-to-html
Lists the contents of the databases in html to standard output. This can be handy to create an address book.

3  Secondary Effects on the HTML Text

Hlins replaces special characters of HTML (such as &eacute; or &#233) by the corresponding ISO-8859-1 character, which is in this case é. Hence, you can use Hlins without any database argument to replace HTML special characters in an HTML document.
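
A minimal Python sketch of such a normalisation is given below for illustration. It rests on two assumptions that this document does not confirm: only references terminated by a semicolon are handled, and only characters in the upper half of ISO-8859-1 (codes 160 to 255) are replaced, so that markup-significant references such as &lt; and &amp; stay as they are. The name normalise_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# A named (&eacute;) or decimal numeric (&#233;) character reference.
_REFERENCE = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def normalise_entities(text):
    """Replace character references by the ISO-8859-1 character they denote."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            code = int(body[1:])
        else:
            code = name2codepoint.get(body)  # None for unknown names
        # Assumption: only the range 0xA0..0xFF is replaced.
        if code is not None and 0xA0 <= code <= 0xFF:
            return chr(code)
        return match.group(0)

    return _REFERENCE.sub(replace, text)
```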

In some cases, non-empty sequences of white space characters may be replaced by one space. However, this happens only when the white space is part of a prefix of some name in the data base. Anyway, this replacement is irrelevant for the rendering of HTML documents.

4  Address Data Bases

Every line of the file must be either a comment line or an address specification. A comment-line is a line that either consists only of white space, or that starts with the comment-symbol # (possibly preceded by white space).

An address specification consists of a name and a url that are separated by the character = . Leading white space of the line is ignored. In the name, the character = must be written as ==.

Special characters in the name can be written either in HTML or as 8-bit characters. The number of spaces separating the words of a name is not relevant.

The syntax of the url is not checked.
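
The line format described in this section can be sketched with a small parser. The Python code below is an illustration only and not the parser used by Hlins; the name parse_line is hypothetical. It assumes that the first single = ends the name, that everything after it is the url, and that white space around the url is dropped. Normalisation of HTML special characters in the name is left out.

```python
def parse_line(line):
    """Return (name, url) for an address specification, None for a comment line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment line
    name_chars = []
    i = 0
    while i < len(stripped):
        if stripped.startswith("==", i):
            name_chars.append("=")  # "==" stands for a literal "=" in the name
            i += 2
        elif stripped[i] == "=":
            # The number of spaces between the words of a name is irrelevant.
            name = " ".join("".join(name_chars).split())
            return name, stripped[i + 1:].strip()
        else:
            name_chars.append(stripped[i])
            i += 1
    raise ValueError("no '=' separator in line: %r" % line)
```

For example, the line "Hans   Meyer = http://address.for.full.name" yields the name Hans Meyer with a single space, and a url that itself contains = is kept unchanged because only the first separator is interpreted.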

5  Variants of Names

Several variants of the names in the data base are recognized as well. To find the variants of a name we first split it at white spaces into components.
  • If a name consists of just one component then it has no variant other than itself.
  • Otherwise, the variants of the name are obtained by considering all possible combinations of variants of the components. The last component is treated differently from the other components:
    • If the last component contains the symbol - then the name without this - and everything after it is also recognized. Hence, if you have an entry for Egon Müller-Meier then Egon Müller is also recognized.
    • A component which is not the last component may be abbreviated, unless it consists of only one letter or ends with a dot. The abbreviation of a first name is its first letter followed by a dot. In the case of a word starting with St a further abbreviation is St followed by a dot, and a word starting with Ch has the additional abbreviation Ch followed by a dot. Composite first names are abbreviated in both components, hence Marc-Stephane becomes M.-St. (but not, for instance, M.-Stephane).
    • In any case, generation of variants is suppressed if you write the component in angle brackets, as in <Marc-Stephane>. This mechanism is used in the data base that produces this document, in order to match Objective Caml but to avoid matching O. Caml.
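
The rules above can be made concrete with the following Python sketch. It is one possible reading of the rules and not the code used by Hlins; all function names are hypothetical. Two points are interpretations: the last component is cut at its first -, and a composite first name such as Marc-Stephane yields both M.-S. and M.-St. because the St abbreviation is described as a further one.

```python
from itertools import product


def strip_brackets(component):
    """Remove the angle brackets that suppress variant generation."""
    if len(component) > 2 and component.startswith("<") and component.endswith(">"):
        return component[1:-1]
    return component


def abbreviations(word):
    """Abbreviations of one hyphen-free word: initial, plus St. or Ch. if applicable."""
    if not word:
        return [word]
    result = [word[0] + "."]
    for prefix in ("St", "Ch"):
        if word.startswith(prefix):
            result.append(prefix + ".")
    return result


def component_variants(component, is_last):
    if strip_brackets(component) != component:
        return [strip_brackets(component)]  # variants suppressed
    if is_last:
        variants = [component]
        if "-" in component:
            variants.append(component.split("-", 1)[0])
        return variants
    if len(component) == 1 or component.endswith("."):
        return [component]  # already as short as it gets
    parts = component.split("-")
    # Composite first names are abbreviated in all parts at once.
    short = ["-".join(c) for c in product(*(abbreviations(p) for p in parts))]
    return [component] + short


def name_variants(name):
    """Return the set of recognized variants of a data base name."""
    components = name.split()
    if len(components) == 1:
        return {strip_brackets(components[0])}
    last = len(components) - 1
    choices = [component_variants(c, i == last) for i, c in enumerate(components)]
    return {" ".join(combo) for combo in product(*choices)}
```

With this sketch, name_variants("Egon Müller-Meier") contains Egon Müller and E. Müller, while name_variants("<Objective> Caml") contains only Objective Caml.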

6  The Exact Rules of Searching Names

Names are searched starting from the beginning of the text. If there are overlapping matches then the match starting at the earlier position wins. For example, if the data base contains entries for Egon Meier and for Hans Egon Meier-Müller then the second one matches on input Hans Egon Meier-Müller.

A match is extended to longer matches if possible. That is, if the data base contains entries for Hans Egon and for Hans Egon Meier then the second one matches on input Hans Egon Meier.
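
The two rules (earliest start wins, then the longest match at that start) can be illustrated by a naive scan. The Python sketch below only demonstrates the matching policy: it is not the algorithm used by Hlins, it ignores HTML markup, white space normalisation and name variants, and the name find_matches is hypothetical.

```python
def find_matches(text, names):
    """Return (position, name) pairs, scanning from left to right.

    At each position the longest name that matches there is taken,
    and scanning resumes behind the match, so matches never overlap.
    """
    matches = []
    pos = 0
    while pos < len(text):
        candidates = [n for n in names if n and text.startswith(n, pos)]
        if candidates:
            best = max(candidates, key=len)  # extend to the longest match
            matches.append((pos, best))
            pos += len(best)
        else:
            pos += 1
    return matches


# The two examples from the text:
first = find_matches("Hans Egon Meier-Müller",
                     ["Egon Meier", "Hans Egon Meier-Müller"])
second = find_matches("Hans Egon Meier", ["Hans Egon", "Hans Egon Meier"])
```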

7  The Exact Rules of URL Insertion

Hlins does not touch any text between <a ... href= ...> and </a>. Note that this applies only if the <a> tag contains the href attribute; that is, hlins does look at text inside of <a name=...> and </a>. As a consequence, hlins is idempotent: if you apply hlins twice (for instance using the --modify-files option) to a file, you get the same effect as with just one application. Hence, when you extend your database, you can safely rerun hlins on your html files.

The replacement mechanism (including the normalisation of HTML special characters) is bypassed for any text inside the following tags:
  • <head> ... </head>
  • <samp> ... </samp>
  • <kbd> ... </kbd>
  • <pre> ... </pre>
  • <div nohlins> ... </div>
The rationale is that the first four tags of this list are intended to mark some kind of verbatim text (see the HTML 4.01 specification). The last one is an escape mechanism in case you have to override hlins' mechanism. Text from the beginning of one of the start tags to the first occurrence of the corresponding end tag is ignored. As a consequence, nested tags of the same kind from the above list are not handled correctly.

Furthermore, text inside angle brackets < and > is not treated by hlins.
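
The regions that are left alone can be pictured with a regular expression that splits a document into protected and unprotected chunks. The Python sketch below is an approximation for illustration only, not the scanner used by Hlins; the names PROTECTED and split_protected are hypothetical. Each protected element ends at the first matching end tag, which also reproduces the limitation for nested tags mentioned above.

```python
import re

PROTECTED = re.compile(
    r"<(head|samp|kbd|pre)\b[^>]*>.*?</\1\s*>"  # verbatim-like elements
    r"|<div\s+nohlins\b[^>]*>.*?</div\s*>"       # explicit escape mechanism
    r"|<a\b[^>]*\bhref\s*=[^>]*>.*?</a\s*>"      # links that already exist
    r"|<[^>]*>",                                  # any other single tag
    re.IGNORECASE | re.DOTALL,
)


def split_protected(html):
    """Yield (is_protected, chunk) pairs that cover html from left to right."""
    pos = 0
    for match in PROTECTED.finditer(html):
        if match.start() > pos:
            yield False, html[pos:match.start()]
        yield True, match.group(0)
        pos = match.end()
    if pos < len(html):
        yield False, html[pos:]
```

Only the chunks marked False would be searched for names. For the input <div nohlins>a<div>b</div>c</div>d the protected region already ends at the first </div>, so c is treated as ordinary text.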

If there are several different urls for a string foundname then the following rules apply to determine the url inserted:
  1. An address specification ``name = url'' where name matches foundname exactly (modulo white space and HTML special characters) has priority over an address specification ``name = url'' where foundname is an abbreviation for name.
  2. In the list obtained from the above priority rule, the first match is taken.
A warning is issued in case of a conflict, unless the --quiet option has been given.

For instance, your data base might contain something like
Hans Meyer   =  http://address.for.full.name
H. Meyer     =  http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected (and a warning is issued).
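
The priority rules can be sketched as follows. This Python code is an illustration under assumptions and not the code used by Hlins: the names normalise, is_abbreviation and select_url are hypothetical, the abbreviation test is reduced to initials of non-last components, HTML special characters are not normalised, and the wording and destination of the warning are invented for the sketch.

```python
import sys


def normalise(name):
    """Compare names modulo white space."""
    return " ".join(name.split())


def is_abbreviation(found, name):
    """Simplified test: non-last components of name shortened to initial + dot."""
    f, n = found.split(), name.split()
    if len(f) != len(n) or len(n) < 2 or f[-1] != n[-1] or f == n:
        return False
    return all(a == b or a == b[0] + "." for a, b in zip(f[:-1], n[:-1]))


def select_url(found, entries, quiet=False):
    """Pick the url for found from entries, a list of (name, url) in file order."""
    found = normalise(found)
    exact = [url for name, url in entries if normalise(name) == found]
    abbreviated = [url for name, url in entries
                   if is_abbreviation(found, normalise(name))]
    candidates = exact + abbreviated  # rule 1: exact matches come first
    if not candidates:
        return None
    if len(set(candidates)) > 1 and not quiet:
        print("warning: several urls for %r, using %s" % (found, candidates[0]),
              file=sys.stderr)
    return candidates[0]  # rule 2: the first match is taken


entries = [
    ("Hans Meyer", "http://address.for.full.name"),
    ("H. Meyer", "http://address.for.abbreviated.name"),
]
chosen = select_url("H. Meyer", entries, quiet=True)
```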

8  Implementation

Hlins is written in Objective Caml.

9  License and Installation

Hlins is covered by the GNU General Public License. See the Hlins home page for binary and source distributions.

10  Credits

Thanks to Claude Marché and Jean-Christophe Filliâtre for their remarks and suggestions.


This document was translated from LaTeX by HEVEA.
hlins-0.39/Makefile

include Version
PROGVER = hlins-$(VERSION)

dummy: all

configure:
	(cd source; ./configure)

# if configured with "make configure":
all:
	(cd source; make)

install:
	(cd source; make install)
	(cd doc; make install)

# for ocaml bytecode compilation
all.bc:
	(cd source; make hlins.bc)

install.bc:
	(cd source; make install.bc)
	(cd doc; make install)

distrib:
	(cd doc/examples/hlins-documentation; make)
	make tarball

# cleanup and building tarball
tarball: distclean
	mkdir $(PROGVER)
	cp -r source doc Makefile Version $(PROGVER)
	-rm $(PROGVER)/*~ $(PROGVER)/*/*~ $(PROGVER)/*/*/*~
	tar --exclude="CVS" --exclude=".cvsignore"\
	-cvf $(PROGVER).tar $(PROGVER)
	gzip -f $(PROGVER).tar
	rm -r $(PROGVER)

clean:
	(cd source; make clean)
	(cd doc; make clean)

distclean:
	(cd source; make distclean)
	(cd doc; make distclean)
	-rm -r $(PROGVER)

hlins-0.39/Version

VERSION = 0.39