pax_global_header00006660000000000000000000000064122706051060014511gustar00rootroot0000000000000052 comment=71d0fc17bedc275317009d8ade7e939792166b67 mikmatch-1.0.7/000077500000000000000000000000001227060510600133135ustar00rootroot00000000000000mikmatch-1.0.7/Changes000066400000000000000000000234121227060510600146100ustar00rootroot00000000000000Changes in the successive versions of Mikmatch ============================================== Mikmatch is the port of micmatch 0.700 to the "new" Camlp4 (3.10). Mikmatch is not compatible with Camlp4 <= 3.09 or Camlp5. Please send bug reports, comments or feature requests to Martin Jambon or to the public forum at http://groups.google.com/group/micmatch !!! = some incompatibilities opt = optimizations +ui = additions in the user interface -ui = restrictions in the user interface bug = bug or security fix doc = major changes in the documentation pkg = changes in the structure of the package or in the installation procedure 2012-11-04 1.0.6: [bug] Fix for camlp4 4.00.0 2012-02-03 1.0.5: [pkg] Subpackage "mikmatch_pcre.top" for toplevel usage 2011-10-21 1.0.4: [bug] Fixed bug consisting in Not_found exception being raised in an alternative capturing an empty string. (losing compatibility with pcre-ocaml versions from 2004 or earlier) 1.0.3: [!!!] Builds with camlp4 3.12.0 but no longer with 3.11. Thanks to Jake Donham for the patch. 1.0.2: [bug] Fixed important name scoping bug (definitions were inserted too early). [bug] Fixed bug that was causing directives to be ignored. 1.0.1: [!!!] Support for Camlp4 3.11.0, including toplevel support which wasn't possible with Camlp4 3.10. Dropped support for Camlp4 3.10 (please use mikmatch 1.0.0 instead). Will restore compatibility with Camlp4 3.10 only in case of significant new mikmatch features. [pkg] Added dependency towards tophide. It is not a strict requirement but improves the toplevel experience. 1.0.0: [*] First release of mikmatch, translation of micmatch 0.700 for the "new camlp4" 3.10. [pkg] License: all source code is now distributed under the BSD license. [pkg] All occurrences of "micmatch" have been replaced by "mikmatch". [-ui] Temporarily no toplevel support for the syntax features, waiting for the availability of camlp4 filters for toplevel phrases. [pkg] Custom toplevels are no longer built and installed. [+ui] No more polymorphic value restriction. Functions that use mikmatch patterns can now be polymorphic. Many identifiers with the "__mikmatch" prefix are now visible as module structure items. [-ui] Dropped support for Camlp4's revised syntax. [-ui] No more -thread option. Only matching patterns that contain @ use a table that is shared among threads. User code is responsible for using locks if needed. ------------------------- Micmatch (for camlp4 <= 3.09) ------------------ This is the history for micmatch before the port to Camlp4 3.10, renamed mikmatch. 0.700: [bug] It was not possible to use "Not" outside of micmatch_pcre regexps. This is now fixed. 0.699: [+ui] added CAPTURE and COLLECTOBJ 0.698: [bug] fixed dynamic linking problems on MacOS/NetBSD by removing version_filter. As a consequence, only recent versions of camlp4 are now supported (starting from 3.08.4, maybe earlier). [bug] fixed missing dependency on "common" for "pcre" and "str" targets in main Makefile 0.697: [bug] installation of executables now correctly follows $BINDIR or $PREFIX/bin [+ui] new FILTER macro which returns true or false [+ui] changed grammar entry level of macros (now "expr1" instead of "top"). Allows for less parentheses. [pkg] added dependency to the Unix library [+ui] added filename globbing in the Micmatch library [+ui] added experimental support for views 0.696: [pkg] removed micmatch_pcre.godiva which is a big source of trouble 0.695: [pkg] minor changes for GODI [bug] added .PHONY targets in Makefile 0.694: [+ui] int and float predefined regexps are not experimental anymore [bug] fixed null-character related bugs (Pcre.quote does not escape them) [bug] fixed bug with ocaml 3.08.1 (and probably earlier): Not_found was raised during initialization of the preprocessing library (deletion of grammar rules which didn't exist) 0.693: [+ui] added support for regexps with arguments (gaps of the form @some_expr) which are evaluated at runtime. It uses a cache which stores the most recently used compiled regexps for different sets of arguments. [opt] optionally shared data structures are now created only where necessary (just reduces the code size a little) 0.692: [bug, +ui] added "nan" and "inf" (caseless, optional sign) to the set of strings recognized by the "float" pattern. 0.691: [+ui] experimental addition of predefined regexps "int" and "float" 0.690: [+ui] added support for global shortcut bindings: let / ... / = ... ;; 0.689: [+ui] - added support for shortcuts: let RE ... = ... in ... let / ... / = ... in ... - official support for / ... / as an equivalent of RE ... (patterns) - general support for: let try ... = ... in ... with ... [bug] code with misplaced RE patterns cannot compile anymore 0.688: [+ui] added support for automatic type conversions: (... as x : int) (... as x := int_of_string) (... as x = Zero) [bug] fixed bug which caused some alternative patterns to be ignored. Simplest example that caused the bug: match "c" with (RE "a") | (RE "b") | (RE "c") -> ();; 0.687: [doc] added warning against inaccessible named subgroups or positional markers such as in ("abc" as local !local)* [+ui] added option -do for command-line programs (like sh -c or perl -e) 0.686: [bug] fixed bugs related to patterns matching empty substrings (avoiding infinite loops in such cases; for instance (SPLIT "") and (SPLIT "") ~full:true both work). 0.685: [bug] Fixed the following bugs: - was counting special "any" as 0 characters instead of 1 (lookbehind assertions) - "Not_found" problems with recent versions of Pcre due to the 2004-04-29 change in Pcre.get_substring - silent removal of bindings which don't make sense in assertions (BTW I don't know how to print a precise warning instead) 0.684: [+ui] added support for lookaround assertions (PCRE only): < lookbehind . lookahead > or < lookahead > 0.683: [+ui] added support for positional markers in regexps (e.g. %pos) 0.682: [bug] added compatibility with OCaml 3.09+dev14 0.681: [bug] fixed bug in REPLACE_FIRST 0.680: [bug] replaced the regexp-pp package with the newest version for compatibility with OCaml 3.09+dev6 0.679: [+ui] added ~share option to MATCH and SEARCH_FIRST [doc] updated and commented example/shootout.ml 0.678: [+ui] added experimental / ... / syntax in patterns 0.677: [pkg] fixed bugs in META files [+ui] added "save" and "save_lines" functions to the library [bug] now bos, eos, bol and eol assertions work (micmatch_pcre) 0.676: [pkg] added version ID in the name of the archives [bug] - fixed bug which prevented the use of the Camlp4 syntax extension for stream parsers (was due to a wrong LEVEL) - fixed fatal bug in micmatch_str which was accidentally introduced in the last version (was due to an inexisting LEVEL) [+ui] added a short Micmatch.Fixed module for handling text with fixed-width columns. 0.675: [bug] several bugfixes in the installation procedure (correct clean, any name for gmake OK, camlmix not required) [pkg] separate installation of micmatch_pcre (default) and micmatch_str (now optional) 0.674: [+ui] - added full support for PCRE-OCaml, with many additional macros and specifications - POSIX characters classes are now predefined for both micmatch_str and micmatch_pcre - micmatch and micmatch_str are now binary executables so that micmatch scripts can be made self-executable on Unix-like systems [doc] updated the reference manual and the web page [-ui] deprecated use of {123-}. Use {123+} instead. 0.673: [bug] added support for OCaml 3.08.1 (replaced regexp-pp package) 0.672: [+ui,bug] - added checks for unbalanced or redundant local bindings - added support for local backreferences 0.671: [+ui] - added support for backreferences (!ident) - tries alternatives from left to right, and is greedy by default for optional matches (? operator). Not official, since these properties are not specified in the Str library. - function keyword replaced by fun in the revised syntax (still not tested though) 0.670: [+ui] addition of the ~ operator for ignoring case (uses the OCaml definition of case, i.e. the latin1 charset). [doc] created the reference manual 0.669: [opt] tail-recursivity is now preserved (options -direct/-tailrec) [-ui] regexpr{-12} is not valid anymore because it looks strange and ambiguous. regexpr{0-12} should be used instead. regexpr{12-} is still valid. 0.668: [bug] - fixed abusive simplification (missing try ... with) - added missing binding in multithreaded mode - fixed the buggy Makefile on the web page 0.667: [opt] alternatives between charsets handled like unions of charsets 0.666: [*] initial public release (2004-08-02) mikmatch-1.0.7/INSTALL000066400000000000000000000023531227060510600143470ustar00rootroot00000000000000Installation of Mikmatch version 1.0.0 or higher ================================================ The recommended way of installing mikmatch and a bunch of other packages is from GODI (godi_console) available from http://godi.camlcity.org/ or from your favorite packaging system. Otherwise, there is the manual install: Prerequisites: ------------- You must have installed: - OCaml including Camlp4 (versions 3.10.2 and 3.11.0 should be OK; versions up to 3.09.3 will not work). - PCRE-OCaml (http://www.ocaml.info/home/ocaml_sources.html) Everything will run smoothly if you have: - Findlib, i.e. ocamlfind (http://projects.camlcity.org/projects/findlib.html) - Gnu make - an Sh-compatible shell Compilation: ----------- make Installation: ------------ make install Uninstallation: -------------- make uninstall Options: ------- By default, only mikmatch_pcre is built and installed. Mikmatch_str can be built and installed with the following commands: make str # compilation make install-str # installation make uninstall-str # uninstallation Problems? -------- Questions, comments and bug reports should be sent to: http://groups.google.com/group/micmatch (no subscription required) mikmatch-1.0.7/LICENSE000066400000000000000000000025651227060510600143300ustar00rootroot00000000000000Copyright (c) 2004-2008 Martin Jambon All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. mikmatch-1.0.7/Makefile000066400000000000000000000044631227060510600147620ustar00rootroot00000000000000.PHONY: default install uninstall reinstall \ all opt mikmatch-pcre mikmatch-str \ common install-str install-pcre uninstall-str uninstall-pcre \ backup clean archive pcre str ifndef PREFIX BINDIR = $(shell dirname `which ocaml`) PREFIX = $(shell dirname $(BINDIR)) else BINDIR = $(PREFIX)/bin endif export PREFIX export BINDIR default: mikmatch-pcre install: install-pcre uninstall: uninstall-pcre reinstall: $(MAKE) uninstall $(MAKE) install ## GODIVA/GODI targets all: common cd pcre && $(MAKE) all-bc opt: common cd pcre && $(MAKE) all-nc ## end of GODIVA targets mikmatch-pcre: common pcre mikmatch-str: common str common: cd common && $(MAKE) str: common cd str && $(MAKE) pcre: common cd pcre && $(MAKE) install-str: cd str && $(MAKE) install install-pcre: cd pcre && $(MAKE) install uninstall-str: cd str && $(MAKE) uninstall uninstall-pcre: cd pcre && $(MAKE) uninstall backup: scp -r . $$BACKUP_DIR/mikmatch/ clean:: cd doc && $(MAKE) clean cd common && $(MAKE) clean cd str && $(MAKE) clean cd pcre && $(MAKE) clean VERSION = $(shell ./VERSION) export VERSION # Only for developers; requires camlmix, hevea, pdflatex # and maybe other things. archive: @echo "Making archive for version $(VERSION)" cd str && $(MAKE) version cd pcre && $(MAKE) version cd doc && $(MAKE) rm -rf /tmp/mikmatch /tmp/mikmatch-$(VERSION) && \ cp -r . /tmp/mikmatch && \ cd /tmp/mikmatch && \ $(MAKE) clean && \ rm -rf *~ mikmatch*.tar* `find . -name .svn` && \ cd /tmp && cp -r mikmatch mikmatch-$(VERSION) && \ tar czf mikmatch.tar.gz mikmatch && \ tar cjf mikmatch.tar.bz2 mikmatch && \ tar czf mikmatch-$(VERSION).tar.gz mikmatch-$(VERSION) && \ tar cjf mikmatch-$(VERSION).tar.bz2 mikmatch-$(VERSION) mv /tmp/mikmatch.tar.gz /tmp/mikmatch.tar.bz2 . mv /tmp/mikmatch-$(VERSION).tar.gz /tmp/mikmatch-$(VERSION).tar.bz2 . cp mikmatch.tar.gz mikmatch.tar.bz2 $$WWW/ cp mikmatch-$(VERSION).tar.gz mikmatch-$(VERSION).tar.bz2 $$WWW/ cp LICENSE $$WWW/mikmatch-license.txt cp VERSION $$WWW/mikmatch-version cp Changes $$WWW/mikmatch-changes.txt $(MAKE) install-www-doc install-www-doc: cp doc/mikmatch-manual.pdf $$WWW cp doc/mikmatch-manual.html $$WWW/mikmatch-manual-nocounter.html cp doc/mikmatch-ocamldoc/* $$WWW/mikmatch-ocamldoc touch -c $$WWW/mikmatch.html.mlx mikmatch-1.0.7/OCamlMakefile000066400000000000000000000712141227060510600156740ustar00rootroot00000000000000########################################################################### # OCamlMakefile # Copyright (C) 1999-2004 Markus Mottl # # For updates see: # http://www.oefai.at/~markus/ocaml_sources # # $Id$ # ########################################################################### # Modified by damien for .glade.ml compilation # Set these variables to the names of the sources to be processed and # the result variable. Order matters during linkage! ifndef SOURCES SOURCES := foo.ml endif export SOURCES ifndef RES_CLIB_SUF RES_CLIB_SUF := _stubs endif export RES_CLIB_SUF ifndef RESULT RESULT := foo endif export RESULT export LIB_PACK_NAME ifndef DOC_FILES DOC_FILES := $(filter %.mli, $(SOURCES)) endif export DOC_FILES export BCSUFFIX export NCSUFFIX ifndef TOPSUFFIX TOPSUFFIX := .top endif export TOPSUFFIX # Eventually set include- and library-paths, libraries to link, # additional compilation-, link- and ocamlyacc-flags # Path- and library information needs not be written with "-I" and such... # Define THREADS if you need it, otherwise leave it unset (same for # USE_CAMLP4)! export THREADS export VMTHREADS export ANNOTATE export USE_CAMLP4 export INCDIRS export LIBDIRS export EXTLIBDIRS export RESULTDEPS export OCAML_DEFAULT_DIRS export LIBS export CLIBS export OCAMLFLAGS export OCAMLNCFLAGS export OCAMLBCFLAGS export OCAMLLDFLAGS export OCAMLNLDFLAGS export OCAMLBLDFLAGS ifndef OCAMLCPFLAGS OCAMLCPFLAGS := a endif export OCAMLCPFLAGS export PPFLAGS export YFLAGS export IDLFLAGS export OCAMLDOCFLAGS export OCAMLFIND_INSTFLAGS export DVIPSFLAGS export STATIC # Add a list of optional trash files that should be deleted by "make clean" export TRASH #################### variables depending on your OCaml-installation ifdef MINGW export MINGW WIN32 := 1 CFLAGS_WIN32 := -mno-cygwin endif ifdef MSVC export MSVC WIN32 := 1 ifndef STATIC CFLAGS_WIN32 := -DCAML_DLL endif CFLAGS_WIN32 += -nologo EXT_OBJ := obj EXT_LIB := lib ifeq ($(CC),gcc) # work around GNU Make default value ifdef THREADS CC := cl -MT else CC := cl endif endif ifeq ($(CXX),g++) # work around GNU Make default value CXX := $(CC) endif CFLAG_O := -Fo endif ifdef WIN32 EXT_CXX := cpp EXE := .exe endif ifndef EXT_OBJ EXT_OBJ := o endif ifndef EXT_LIB EXT_LIB := a endif ifndef EXT_CXX EXT_CXX := cc endif ifndef EXE EXE := # empty endif ifndef CFLAG_O CFLAG_O := -o # do not delete this comment (preserves trailing whitespace)! endif export CC export CXX export CFLAGS export CXXFLAGS export LDFLAGS ifndef RPATH_FLAG RPATH_FLAG := -R endif export RPATH_FLAG ifndef MSVC ifndef PIC_FLAGS PIC_FLAGS := -fPIC -DPIC endif endif export PIC_FLAGS BCRESULT := $(addsuffix $(BCSUFFIX), $(RESULT)) NCRESULT := $(addsuffix $(NCSUFFIX), $(RESULT)) TOPRESULT := $(addsuffix $(TOPSUFFIX), $(RESULT)) ifndef OCAMLFIND OCAMLFIND := ocamlfind endif export OCAMLFIND ifndef OCAMLC OCAMLC := ocamlc endif export OCAMLC ifndef OCAMLOPT OCAMLOPT := ocamlopt endif export OCAMLOPT ifndef OCAMLMKTOP OCAMLMKTOP := ocamlmktop endif export OCAMLMKTOP ifndef OCAMLCP OCAMLCP := ocamlcp endif export OCAMLCP ifndef OCAMLDEP OCAMLDEP := ocamldep endif export OCAMLDEP ifndef OCAMLLEX OCAMLLEX := ocamllex endif export OCAMLLEX ifndef OCAMLYACC OCAMLYACC := ocamlyacc endif export OCAMLYACC ifndef OCAMLMKLIB OCAMLMKLIB := ocamlmklib endif export OCAMLMKLIB ifndef OCAML_GLADECC OCAML_GLADECC := lablgladecc2 endif export OCAML_GLADECC ifndef OCAML_GLADECC_FLAGS OCAML_GLADECC_FLAGS := endif export OCAML_GLADECC_FLAGS ifndef CAMELEON_REPORT CAMELEON_REPORT := report endif export CAMELEON_REPORT ifndef CAMELEON_REPORT_FLAGS CAMELEON_REPORT_FLAGS := endif export CAMELEON_REPORT_FLAGS ifndef CAMELEON_ZOGGY CAMELEON_ZOGGY := camlp4o pa_zog.cma pr_o.cmo endif export CAMELEON_ZOGGY ifndef CAMELEON_ZOGGY_FLAGS CAMELEON_ZOGGY_FLAGS := endif export CAMELEON_ZOGGY_FLAGS ifndef OXRIDL OXRIDL := oxridl endif export OXRIDL ifndef CAMLIDL CAMLIDL := camlidl endif export CAMLIDL ifndef CAMLIDLDLL CAMLIDLDLL := camlidldll endif export CAMLIDLDLL ifndef NOIDLHEADER MAYBE_IDL_HEADER := -header endif export NOIDLHEADER export NO_CUSTOM ifndef CAMLP4 CAMLP4 := camlp4 endif export CAMLP4 ifdef PACKS ifndef CREATE_LIB PACKS += threads endif empty := space := $(empty) $(empty) comma := , ifdef PREDS PRE_OCAML_FIND_PREDICATES := $(subst $(space),$(comma),$(PREDS)) PRE_OCAML_FIND_PACKAGES := $(subst $(space),$(comma),$(PACKS)) OCAML_FIND_PREDICATES := -predicates $(PRE_OCAML_FIND_PREDICATES) # OCAML_DEP_PREDICATES := -syntax $(PRE_OCAML_FIND_PREDICATES) OCAML_FIND_PACKAGES := $(OCAML_FIND_PREDICATES) -package $(PRE_OCAML_FIND_PACKAGES) OCAML_DEP_PACKAGES := $(OCAML_DEP_PREDICATES) -package $(PRE_OCAML_FIND_PACKAGES) else OCAML_FIND_PACKAGES := -package $(subst $(space),$(comma),$(PACKS)) OCAML_DEP_PACKAGES := endif OCAML_FIND_LINKPKG := -linkpkg REAL_OCAMLFIND := $(OCAMLFIND) endif export OCAML_FIND_PACKAGES export OCAML_DEP_PACKAGES export OCAML_FIND_LINKPKG export REAL_OCAMLFIND ifndef OCAMLDOC OCAMLDOC := ocamldoc endif export OCAMLDOC ifndef LATEX LATEX := latex endif export LATEX ifndef DVIPS DVIPS := dvips endif export DVIPS ifndef PS2PDF PS2PDF := ps2pdf endif export PS2PDF ifndef OCAMLMAKEFILE OCAMLMAKEFILE := OCamlMakefile endif export OCAMLMAKEFILE ifndef OCAMLLIBPATH OCAMLLIBPATH := \ $(shell $(OCAMLC) 2>/dev/null -where || echo /usr/local/lib/ocaml) endif export OCAMLLIBPATH ifndef OCAML_LIB_INSTALL OCAML_LIB_INSTALL := $(OCAMLLIBPATH)/contrib endif export OCAML_LIB_INSTALL ########################################################################### #################### change following sections only if #################### you know what you are doing! # delete target files when a build command fails .PHONY: .DELETE_ON_ERROR .DELETE_ON_ERROR: # for pedants using "--warn-undefined-variables" export MAYBE_IDL export REAL_RESULT export CAMLIDLFLAGS export THREAD_FLAG export RES_CLIB export MAKEDLL export ANNOT_FLAG export C_OXRIDL export SUBPROJS export CFLAGS_WIN32 INCFLAGS := SHELL := /bin/sh MLDEPDIR := ._d BCDIDIR := ._bcdi NCDIDIR := ._ncdi FILTER_EXTNS := %.mli %.ml %.mll %.mly %.idl %.oxridl %.c %.$(EXT_CXX) %.rep %.zog %.glade FILTERED := $(filter $(FILTER_EXTNS), $(SOURCES)) SOURCE_DIRS := $(filter-out ./, $(sort $(dir $(FILTERED)))) FILTERED_REP := $(filter %.rep, $(FILTERED)) DEP_REP := $(FILTERED_REP:%.rep=$(MLDEPDIR)/%.d) AUTO_REP := $(FILTERED_REP:.rep=.ml) FILTERED_ZOG := $(filter %.zog, $(FILTERED)) DEP_ZOG := $(FILTERED_ZOG:%.zog=$(MLDEPDIR)/%.d) AUTO_ZOG := $(FILTERED_ZOG:.zog=.ml) FILTERED_GLADE := $(filter %.glade, $(FILTERED)) DEP_GLADE := $(FILTERED_GLADE:%.glade=$(MLDEPDIR)/%.d) AUTO_GLADE := $(FILTERED_GLADE:.glade=.ml) FILTERED_ML := $(filter %.ml, $(FILTERED)) DEP_ML := $(FILTERED_ML:%.ml=$(MLDEPDIR)/%.d) FILTERED_MLI := $(filter %.mli, $(FILTERED)) DEP_MLI := $(FILTERED_MLI:.mli=.di) FILTERED_MLL := $(filter %.mll, $(FILTERED)) DEP_MLL := $(FILTERED_MLL:%.mll=$(MLDEPDIR)/%.d) AUTO_MLL := $(FILTERED_MLL:.mll=.ml) FILTERED_MLY := $(filter %.mly, $(FILTERED)) DEP_MLY := $(FILTERED_MLY:%.mly=$(MLDEPDIR)/%.d) $(FILTERED_MLY:.mly=.di) AUTO_MLY := $(FILTERED_MLY:.mly=.mli) $(FILTERED_MLY:.mly=.ml) FILTERED_IDL := $(filter %.idl, $(FILTERED)) DEP_IDL := $(FILTERED_IDL:%.idl=$(MLDEPDIR)/%.d) $(FILTERED_IDL:.idl=.di) C_IDL := $(FILTERED_IDL:%.idl=%_stubs.c) ifndef NOIDLHEADER C_IDL += $(FILTERED_IDL:.idl=.h) endif OBJ_C_IDL := $(FILTERED_IDL:%.idl=%_stubs.$(EXT_OBJ)) AUTO_IDL := $(FILTERED_IDL:.idl=.mli) $(FILTERED_IDL:.idl=.ml) $(C_IDL) FILTERED_OXRIDL := $(filter %.oxridl, $(FILTERED)) DEP_OXRIDL := $(FILTERED_OXRIDL:%.oxridl=$(MLDEPDIR)/%.d) $(FILTERED_OXRIDL:.oxridl=.di) AUTO_OXRIDL := $(FILTERED_OXRIDL:.oxridl=.mli) $(FILTERED_OXRIDL:.oxridl=.ml) $(C_OXRIDL) FILTERED_C_CXX := $(filter %.c %.$(EXT_CXX), $(FILTERED)) OBJ_C_CXX := $(FILTERED_C_CXX:.c=.$(EXT_OBJ)) OBJ_C_CXX := $(OBJ_C_CXX:.$(EXT_CXX)=.$(EXT_OBJ)) PRE_TARGETS += $(AUTO_MLL) $(AUTO_MLY) $(AUTO_IDL) $(AUTO_OXRIDL) $(AUTO_ZOG) $(AUTO_REP) $(AUTO_GLADE) ALL_DEPS := $(DEP_ML) $(DEP_MLI) $(DEP_MLL) $(DEP_MLY) $(DEP_IDL) $(DEP_OXRIDL) $(DEP_ZOG) $(DEP_REP) $(DEP_GLADE) MLDEPS := $(filter %.d, $(ALL_DEPS)) MLIDEPS := $(filter %.di, $(ALL_DEPS)) BCDEPIS := $(MLIDEPS:%.di=$(BCDIDIR)/%.di) NCDEPIS := $(MLIDEPS:%.di=$(NCDIDIR)/%.di) ALLML := $(filter %.mli %.ml %.mll %.mly %.idl %.oxridl %.rep %.zog %.glade, $(FILTERED)) IMPLO_INTF := $(ALLML:%.mli=%.mli.__) IMPLO_INTF := $(foreach file, $(IMPLO_INTF), \ $(basename $(file)).cmi $(basename $(file)).cmo) IMPLO_INTF := $(filter-out %.mli.cmo, $(IMPLO_INTF)) IMPLO_INTF := $(IMPLO_INTF:%.mli.cmi=%.cmi) IMPLX_INTF := $(IMPLO_INTF:.cmo=.cmx) INTF := $(filter %.cmi, $(IMPLO_INTF)) IMPL_CMO := $(filter %.cmo, $(IMPLO_INTF)) IMPL_CMX := $(IMPL_CMO:.cmo=.cmx) IMPL_ASM := $(IMPL_CMO:.cmo=.asm) IMPL_S := $(IMPL_CMO:.cmo=.s) OBJ_LINK := $(OBJ_C_IDL) $(OBJ_C_CXX) OBJ_FILES := $(IMPL_CMO:.cmo=.$(EXT_OBJ)) $(OBJ_LINK) EXECS := $(addsuffix $(EXE), \ $(sort $(TOPRESULT) $(BCRESULT) $(NCRESULT))) ifdef WIN32 EXECS += $(BCRESULT).dll $(NCRESULT).dll endif CLIB_BASE := $(RESULT)$(RES_CLIB_SUF) ifneq ($(strip $(OBJ_LINK)),) RES_CLIB := lib$(CLIB_BASE).$(EXT_LIB) endif ifdef WIN32 DLLSONAME := $(CLIB_BASE).dll else DLLSONAME := dll$(CLIB_BASE).so endif NONEXECS := $(INTF) $(IMPL_CMO) $(IMPL_CMX) $(IMPL_ASM) $(IMPL_S) \ $(OBJ_FILES) $(PRE_TARGETS) $(BCRESULT).cma $(NCRESULT).cmxa \ $(NCRESULT).$(EXT_LIB) $(BCRESULT).cmi $(BCRESULT).cmo \ $(NCRESULT).cmi $(NCRESULT).cmx $(NCRESULT).o \ $(RES_CLIB) $(IMPL_CMO:.cmo=.annot) \ $(LIB_PACK_NAME).cmi $(LIB_PACK_NAME).cmo $(LIB_PACK_NAME).cmx $(LIB_PACK_NAME).o ifndef STATIC NONEXECS += $(DLLSONAME) endif ifndef LIBINSTALL_FILES LIBINSTALL_FILES := $(RESULT).mli $(RESULT).cmi $(RESULT).cma \ $(RESULT).cmxa $(RESULT).$(EXT_LIB) $(RES_CLIB) ifndef STATIC ifneq ($(strip $(OBJ_LINK)),) LIBINSTALL_FILES += $(DLLSONAME) endif endif endif export LIBINSTALL_FILES ifdef WIN32 # some extra stuff is created while linking DLLs NONEXECS += $(BCRESULT).$(EXT_LIB) $(BCRESULT).exp $(NCRESULT).exp $(CLIB_BASE).exp $(CLIB_BASE).lib endif TARGETS := $(EXECS) $(NONEXECS) # If there are IDL-files ifneq ($(strip $(FILTERED_IDL)),) MAYBE_IDL := -cclib -lcamlidl endif ifdef USE_CAMLP4 CAMLP4PATH := \ $(shell $(CAMLP4) -where 2>/dev/null || echo /usr/local/lib/camlp4) INCFLAGS := -I $(CAMLP4PATH) CINCFLAGS := -I$(CAMLP4PATH) endif DINCFLAGS := $(INCFLAGS) $(SOURCE_DIRS:%=-I %) $(OCAML_DEFAULT_DIRS:%=-I %) INCFLAGS := $(DINCFLAGS) $(INCDIRS:%=-I %) CINCFLAGS += $(SOURCE_DIRS:%=-I%) $(INCDIRS:%=-I%) $(OCAML_DEFAULT_DIRS:%=-I%) ifndef MSVC CLIBFLAGS += $(SOURCE_DIRS:%=-L%) $(LIBDIRS:%=-L%) \ $(EXTLIBDIRS:%=-L%) $(EXTLIBDIRS:%=-Wl,$(RPATH_FLAG)%) \ $(OCAML_DEFAULT_DIRS:%=-L%) endif ifndef PROFILING INTF_OCAMLC := $(OCAMLC) else ifndef THREADS INTF_OCAMLC := $(OCAMLCP) -p $(OCAMLCPFLAGS) else # OCaml does not support profiling byte code # with threads (yet), therefore we force an error. ifndef REAL_OCAMLC $(error Profiling of multithreaded byte code not yet supported by OCaml) endif INTF_OCAMLC := $(OCAMLC) endif endif ifndef MSVC COMMON_LDFLAGS := $(LDFLAGS:%=-ccopt %) $(SOURCE_DIRS:%=-ccopt -L%) \ $(LIBDIRS:%=-ccopt -L%) $(EXTLIBDIRS:%=-ccopt -L%) \ $(EXTLIBDIRS:%=-ccopt -Wl,$(RPATH_FLAG)%) \ $(OCAML_DEFAULT_DIRS:%=-ccopt -L%) else COMMON_LDFLAGS := -ccopt "/link -NODEFAULTLIB:LIBC $(LDFLAGS:%=%) $(SOURCE_DIRS:%=-LIBPATH:%) \ $(LIBDIRS:%=-LIBPATH:%) $(EXTLIBDIRS:%=-LIBPATH:%) \ $(OCAML_DEFAULT_DIRS:%=-LIBPATH:%) " endif CLIBS_OPTS := $(CLIBS:%=-cclib -l%) ifdef MSVC ifndef STATIC # MSVC libraries do not have 'lib' prefix CLIBS_OPTS := $(CLIBS:%=-cclib %.lib) endif endif ifneq ($(strip $(OBJ_LINK)),) ifdef CREATE_LIB OBJS_LIBS := -cclib -l$(CLIB_BASE) $(CLIBS_OPTS) $(MAYBE_IDL) else OBJS_LIBS := $(OBJ_LINK) $(CLIBS_OPTS) $(MAYBE_IDL) endif else OBJS_LIBS := $(CLIBS_OPTS) $(MAYBE_IDL) endif # If we have to make byte-code ifndef REAL_OCAMLC BYTE_OCAML := y # EXTRADEPS is added dependencies we have to insert for all # executable files we generate. Ideally it should be all of the # libraries we use, but it's hard to find the ones that get searched on # the path since I don't know the paths built into the compiler, so # just include the ones with slashes in their names. EXTRADEPS := $(addsuffix .cma,$(foreach i,$(LIBS),$(if $(findstring /,$(i)),$(i)))) SPECIAL_OCAMLFLAGS := $(OCAMLBCFLAGS) REAL_OCAMLC := $(INTF_OCAMLC) REAL_IMPL := $(IMPL_CMO) REAL_IMPL_INTF := $(IMPLO_INTF) IMPL_SUF := .cmo DEPFLAGS := MAKE_DEPS := $(MLDEPS) $(BCDEPIS) ifdef CREATE_LIB CFLAGS := $(PIC_FLAGS) $(CFLAGS) ifndef STATIC ifneq ($(strip $(OBJ_LINK)),) MAKEDLL := $(DLLSONAME) ALL_LDFLAGS := -dllib $(DLLSONAME) endif endif endif ifndef NO_CUSTOM ifneq "$(strip $(OBJ_LINK) $(THREADS) $(MAYBE_IDL) $(CLIBS))" "" ALL_LDFLAGS += -custom endif endif ALL_LDFLAGS += $(INCFLAGS) $(OCAMLLDFLAGS) $(OCAMLBLDFLAGS) \ $(COMMON_LDFLAGS) $(LIBS:%=%.cma) CAMLIDLDLLFLAGS := ifdef THREADS ifdef VMTHREADS THREAD_FLAG := -vmthread else THREAD_FLAG := -thread endif ALL_LDFLAGS := $(THREAD_FLAG) $(ALL_LDFLAGS) ifndef CREATE_LIB ifndef REAL_OCAMLFIND ALL_LDFLAGS := unix.cma threads.cma $(ALL_LDFLAGS) endif endif endif # we have to make native-code else EXTRADEPS := $(addsuffix .cmxa,$(foreach i,$(LIBS),$(if $(findstring /,$(i)),$(i)))) ifndef PROFILING SPECIAL_OCAMLFLAGS := $(OCAMLNCFLAGS) PLDFLAGS := else SPECIAL_OCAMLFLAGS := -p $(OCAMLNCFLAGS) PLDFLAGS := -p endif REAL_IMPL := $(IMPL_CMX) REAL_IMPL_INTF := $(IMPLX_INTF) IMPL_SUF := .cmx CFLAGS := -DNATIVE_CODE $(CFLAGS) DEPFLAGS := -native MAKE_DEPS := $(MLDEPS) $(NCDEPIS) ALL_LDFLAGS := $(PLDFLAGS) $(INCFLAGS) $(OCAMLLDFLAGS) \ $(OCAMLNLDFLAGS) $(COMMON_LDFLAGS) CAMLIDLDLLFLAGS := -opt ifndef CREATE_LIB ALL_LDFLAGS += $(LIBS:%=%.cmxa) else CFLAGS := $(PIC_FLAGS) $(CFLAGS) endif ifdef THREADS THREAD_FLAG := -thread ALL_LDFLAGS := $(THREAD_FLAG) $(ALL_LDFLAGS) ifndef CREATE_LIB ifndef REAL_OCAMLFIND ALL_LDFLAGS := unix.cmxa threads.cmxa $(ALL_LDFLAGS) endif endif endif endif export MAKE_DEPS ifdef ANNOTATE ANNOT_FLAG := -dtypes else endif ALL_OCAMLCFLAGS := $(THREAD_FLAG) $(ANNOT_FLAG) $(OCAMLFLAGS) \ $(INCFLAGS) $(SPECIAL_OCAMLFLAGS) ifdef make_deps -include $(MAKE_DEPS) PRE_TARGETS := endif ########################################################################### # USER RULES # Call "OCamlMakefile QUIET=" to get rid of all of the @'s. QUIET=@ # generates byte-code (default) byte-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(BCRESULT) \ REAL_RESULT="$(BCRESULT)" make_deps=yes bc: byte-code byte-code-nolink: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) nolink \ REAL_RESULT="$(BCRESULT)" make_deps=yes bcnl: byte-code-nolink top: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(TOPRESULT) \ REAL_RESULT="$(BCRESULT)" make_deps=yes # generates native-code native-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(NCRESULT) \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ make_deps=yes nc: native-code native-code-nolink: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) nolink \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ make_deps=yes ncnl: native-code-nolink # generates byte-code libraries byte-code-library: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(BCRESULT).cma \ REAL_RESULT="$(BCRESULT)" \ CREATE_LIB=yes \ make_deps=yes bcl: byte-code-library # generates native-code libraries native-code-library: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(NCRESULT).cmxa \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ CREATE_LIB=yes \ make_deps=yes ncl: native-code-library ifdef WIN32 # generates byte-code dll byte-code-dll: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(BCRESULT).dll \ REAL_RESULT="$(BCRESULT)" \ make_deps=yes bcd: byte-code-dll # generates native-code dll native-code-dll: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(NCRESULT).dll \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ make_deps=yes ncd: native-code-dll endif # generates byte-code with debugging information debug-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(BCRESULT) \ REAL_RESULT="$(BCRESULT)" make_deps=yes \ OCAMLFLAGS="-g $(OCAMLFLAGS)" \ OCAMLLDFLAGS="-g $(OCAMLLDFLAGS)" dc: debug-code debug-code-nolink: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) nolink \ REAL_RESULT="$(BCRESULT)" make_deps=yes \ OCAMLFLAGS="-g $(OCAMLFLAGS)" \ OCAMLLDFLAGS="-g $(OCAMLLDFLAGS)" dcnl: debug-code-nolink # generates byte-code libraries with debugging information debug-code-library: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(BCRESULT).cma \ REAL_RESULT="$(BCRESULT)" make_deps=yes \ CREATE_LIB=yes \ OCAMLFLAGS="-g $(OCAMLFLAGS)" \ OCAMLLDFLAGS="-g $(OCAMLLDFLAGS)" dcl: debug-code-library # generates byte-code for profiling profiling-byte-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(BCRESULT) \ REAL_RESULT="$(BCRESULT)" PROFILING="y" \ make_deps=yes pbc: profiling-byte-code # generates native-code profiling-native-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(NCRESULT) \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ PROFILING="y" \ make_deps=yes pnc: profiling-native-code # generates byte-code libraries profiling-byte-code-library: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(BCRESULT).cma \ REAL_RESULT="$(BCRESULT)" PROFILING="y" \ CREATE_LIB=yes \ make_deps=yes pbcl: profiling-byte-code-library # generates native-code libraries profiling-native-code-library: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(RES_CLIB) $(NCRESULT).cmxa \ REAL_RESULT="$(NCRESULT)" PROFILING="y" \ REAL_OCAMLC="$(OCAMLOPT)" \ CREATE_LIB=yes \ make_deps=yes pncl: profiling-native-code-library # packs byte-code objects pack-byte-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) $(BCRESULT).cmo \ REAL_RESULT="$(BCRESULT)" \ PACK_LIB=yes make_deps=yes pabc: pack-byte-code # packs native-code objects pack-native-code: $(PRE_TARGETS) $(QUIET)$(MAKE) -r -f $(OCAMLMAKEFILE) \ $(NCRESULT).cmx $(NCRESULT).o \ REAL_RESULT="$(NCRESULT)" \ REAL_OCAMLC="$(OCAMLOPT)" \ PACK_LIB=yes make_deps=yes panc: pack-native-code # generates HTML-documentation htdoc: doc/$(RESULT)/html # generates Latex-documentation ladoc: doc/$(RESULT)/latex # generates PostScript-documentation psdoc: doc/$(RESULT)/latex/doc.ps # generates PDF-documentation pdfdoc: doc/$(RESULT)/latex/doc.pdf # generates all supported forms of documentation doc: htdoc ladoc psdoc pdfdoc ########################################################################### # LOW LEVEL RULES $(REAL_RESULT): $(REAL_IMPL_INTF) $(OBJ_LINK) $(EXTRADEPS) $(RESULTDEPS) $(REAL_OCAMLFIND) $(REAL_OCAMLC) \ $(OCAML_FIND_PACKAGES) $(OCAML_FIND_LINKPKG) \ $(ALL_LDFLAGS) $(OBJS_LIBS) -o $@$(EXE) \ $(REAL_IMPL) nolink: $(REAL_IMPL_INTF) $(OBJ_LINK) ifdef WIN32 $(REAL_RESULT).dll: $(REAL_IMPL_INTF) $(OBJ_LINK) $(CAMLIDLDLL) $(CAMLIDLDLLFLAGS) $(OBJ_LINK) $(CLIBS) \ -o $@ $(REAL_IMPL) endif %$(TOPSUFFIX): $(REAL_IMPL_INTF) $(OBJ_LINK) $(EXTRADEPS) $(REAL_OCAMLFIND) $(OCAMLMKTOP) \ $(OCAML_FIND_PACKAGES) $(OCAML_FIND_LINKPKG) \ $(ALL_LDFLAGS) $(OBJS_LIBS) -o $@$(EXE) \ $(REAL_IMPL) .SUFFIXES: .mli .ml .cmi .cmo .cmx .cma .cmxa .$(EXT_OBJ) \ .mly .di .d .$(EXT_LIB) .idl %.oxridl .c .$(EXT_CXX) .h .so \ .rep .zog .glade ifndef STATIC ifdef MINGW $(DLLSONAME): $(OBJ_LINK) $(CC) $(CFLAGS) $(CFLAGS_WIN32) $(OBJ_LINK) -shared -o $@ \ -Wl,--whole-archive $(wildcard $(foreach dir,$(LIBDIRS),$(CLIBS:%=$(dir)/lib%.a))) \ $(OCAMLLIBPATH)/ocamlrun.a \ -Wl,--export-all-symbols \ -Wl,--no-whole-archive else ifdef MSVC $(DLLSONAME): $(OBJ_LINK) link /NOLOGO /DLL /OUT:$@ $(OBJ_LINK) \ $(wildcard $(foreach dir,$(LIBDIRS),$(CLIBS:%=$(dir)/%.lib))) \ $(OCAMLLIBPATH)/ocamlrun.lib else $(DLLSONAME): $(OBJ_LINK) $(OCAMLMKLIB) $(INCFLAGS) $(CLIBFLAGS) \ -o $(CLIB_BASE) $(OBJ_LINK) $(CLIBS:%=-l%) \ $(OCAMLMKLIB_FLAGS) endif endif endif ifndef LIB_PACK_NAME $(RESULT).cma: $(REAL_IMPL_INTF) $(MAKEDLL) $(EXTRADEPS) $(RESULTDEPS) $(REAL_OCAMLFIND) $(REAL_OCAMLC) -a $(ALL_LDFLAGS) \ $(OBJS_LIBS) -o $@ $(OCAMLBLDFLAGS) $(REAL_IMPL) $(RESULT).cmxa $(RESULT).$(EXT_LIB): $(REAL_IMPL_INTF) $(EXTRADEPS) $(RESULTDEPS) $(REAL_OCAMLFIND) $(OCAMLOPT) -a $(ALL_LDFLAGS) $(OBJS_LIBS) \ $(OCAMLNLDFLAGS) -o $@ $(REAL_IMPL) else ifdef BYTE_OCAML $(LIB_PACK_NAME).cmi $(LIB_PACK_NAME).cmo: $(REAL_IMPL_INTF) $(REAL_OCAMLFIND) $(REAL_OCAMLC) -pack -o $(LIB_PACK_NAME).cmo $(REAL_IMPL) else $(LIB_PACK_NAME).cmi $(LIB_PACK_NAME).cmx: $(REAL_IMPL_INTF) $(REAL_OCAMLFIND) $(REAL_OCAMLC) -pack -o $(LIB_PACK_NAME).cmx $(REAL_IMPL) endif $(RESULT).cma: $(LIB_PACK_NAME).cmi $(LIB_PACK_NAME).cmo $(MAKEDLL) $(EXTRADEPS) $(RESULTDEPS) $(REAL_OCAMLFIND) $(REAL_OCAMLC) -a $(ALL_LDFLAGS) \ $(OBJS_LIBS) -o $@ $(OCAMLBLDFLAGS) $(LIB_PACK_NAME).cmo $(RESULT).cmxa $(RESULT).$(EXT_LIB): $(LIB_PACK_NAME).cmi $(LIB_PACK_NAME).cmx $(EXTRADEPS) $(RESULTDEPS) $(REAL_OCAMLFIND) $(OCAMLOPT) -a $(ALL_LDFLAGS) $(OBJS_LIBS) \ $(OCAMLNLDFLAGS) -o $@ $(LIB_PACK_NAME).cmx endif $(RES_CLIB): $(OBJ_LINK) ifndef MSVC ifneq ($(strip $(OBJ_LINK)),) $(AR) rcs $@ $(OBJ_LINK) endif else ifneq ($(strip $(OBJ_LINK)),) lib -nologo -debugtype:cv -out:$(RES_CLIB) $(OBJ_LINK) endif endif .mli.cmi: $(EXTRADEPS) $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ echo $(REAL_OCAMLFIND) $(INTF_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c $(THREAD_FLAG) $(ANNOT_FLAG) \ $(OCAMLFLAGS) $(INCFLAGS) $<; \ $(REAL_OCAMLFIND) $(INTF_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c $(THREAD_FLAG) $(ANNOT_FLAG) \ $(OCAMLFLAGS) $(INCFLAGS) $<; \ else \ echo $(REAL_OCAMLFIND) $(INTF_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c -pp \"$$pp $(PPFLAGS)\" $(THREAD_FLAG) $(ANNOT_FLAG) \ $(OCAMLFLAGS) $(INCFLAGS) $<; \ $(REAL_OCAMLFIND) $(INTF_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c -pp "$$pp $(PPFLAGS)" $(THREAD_FLAG) $(ANNOT_FLAG) \ $(OCAMLFLAGS) $(INCFLAGS) $<; \ fi .ml.cmi .ml.$(EXT_OBJ) .ml.cmx .ml.cmo: $(EXTRADEPS) $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ echo $(REAL_OCAMLFIND) $(REAL_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c $(ALL_OCAMLCFLAGS) $<; \ $(REAL_OCAMLFIND) $(REAL_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c $(ALL_OCAMLCFLAGS) $<; \ else \ echo $(REAL_OCAMLFIND) $(REAL_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c -pp \"$$pp $(PPFLAGS)\" $(ALL_OCAMLCFLAGS) $<; \ $(REAL_OCAMLFIND) $(REAL_OCAMLC) $(OCAML_FIND_PACKAGES) \ -c -pp "$$pp $(PPFLAGS)" $(ALL_OCAMLCFLAGS) $<; \ fi ifdef PACK_LIB $(REAL_RESULT).cmo $(REAL_RESULT).cmx $(REAL_RESULT).o: $(REAL_IMPL_INTF) $(OBJ_LINK) $(EXTRADEPS) $(REAL_OCAMLFIND) $(REAL_OCAMLC) -pack $(ALL_LDFLAGS) \ $(OBJS_LIBS) -o $@ $(REAL_IMPL) endif .PRECIOUS: %.ml %.ml: %.mll $(OCAMLLEX) $< .PRECIOUS: %.ml %.mli %.ml %.mli: %.mly $(OCAMLYACC) $(YFLAGS) $< .PRECIOUS: %.ml %.ml: %.rep $(CAMELEON_REPORT) $(CAMELEON_REPORT_FLAGS) -gen $< .PRECIOUS: %.ml %.ml: %.zog $(CAMELEON_ZOGGY) $(CAMELEON_ZOGGY_FLAGS) -impl $< > $@ .PRECIOUS: %.ml %.ml: %.glade $(OCAML_GLADECC) $(OCAML_GLADECC_FLAGS) $< > $@ .PRECIOUS: %.ml %.mli %.ml %.mli: %.oxridl $(OXRIDL) $< .PRECIOUS: %.ml %.mli %_stubs.c %.h %.ml %.mli %_stubs.c %.h: %.idl $(CAMLIDL) $(MAYBE_IDL_HEADER) $(IDLFLAGS) \ $(CAMLIDLFLAGS) $< $(QUIET)if [ $(NOIDLHEADER) ]; then touch $*.h; fi .c.$(EXT_OBJ): $(OCAMLC) -c -cc "$(CC)" -ccopt "$(CFLAGS) \ $(CFLAGS_WIN32) $(CINCFLAGS) $(CFLAG_O)$@ " $< .$(EXT_CXX).$(EXT_OBJ): $(CXX) -c $(CXXFLAGS) $(CINCFLAGS) -I'$(OCAMLLIBPATH)' \ $< $(CFLAG_O)$@ $(MLDEPDIR)/%.d: %.ml $(QUIET)echo making $@ from $< $(QUIET)if [ ! -d $(@D) ]; then mkdir -p $(@D); fi $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ $(REAL_OCAMLFIND) $(OCAMLDEP) $(OCAML_DEP_PACKAGES) \ $(DINCFLAGS) $< > $@; \ else \ $(REAL_OCAMLFIND) $(OCAMLDEP) $(OCAML_DEP_PACKAGES) \ -pp "$$pp $(PPFLAGS)" $(DINCFLAGS) $< > $@; \ fi $(BCDIDIR)/%.di $(NCDIDIR)/%.di: %.mli $(QUIET)echo making $@ from $< $(QUIET)if [ ! -d $(@D) ]; then mkdir -p $(@D); fi $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ $(REAL_OCAMLFIND) $(OCAMLDEP) $(DEPFLAGS) $(DINCFLAGS) $< > $@; \ else \ $(REAL_OCAMLFIND) $(OCAMLDEP) $(DEPFLAGS) \ -pp "$$pp $(PPFLAGS)" $(DINCFLAGS) $< > $@; \ fi doc/$(RESULT)/html: $(DOC_FILES) rm -rf $@ mkdir -p $@ $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ echo $(OCAMLDOC) -html -d $@ $(OCAMLDOCFLAGS) $(INCFLAGS) $(DOC_FILES); \ $(OCAMLDOC) -html -d $@ $(OCAMLDOCFLAGS) $(INCFLAGS) $(DOC_FILES); \ else \ echo $(OCAMLDOC) -pp \"$$pp $(PPFLAGS)\" -html -d $@ $(OCAMLDOCFLAGS) \ $(INCFLAGS) $(DOC_FILES); \ $(OCAMLDOC) -pp "$$pp $(PPFLAGS)" -html -d $@ $(OCAMLDOCFLAGS) \ $(INCFLAGS) $(DOC_FILES); \ fi doc/$(RESULT)/latex: $(DOC_FILES) rm -rf $@ mkdir -p $@ $(QUIET)pp=`sed -n -e '/^#/d' -e 's/(\*pp \([^*]*\) \*)/\1/p;q' $<`; \ if [ -z "$$pp" ]; then \ echo $(OCAMLDOC) -latex $(OCAMLDOCFLAGS) $(INCFLAGS) \ $(DOC_FILES) -o $@/doc.tex; \ $(OCAMLDOC) -latex $(OCAMLDOCFLAGS) $(INCFLAGS) $(DOC_FILES) \ -o $@/doc.tex; \ else \ echo $(OCAMLDOC) -pp \"$$pp $(PPFLAGS)\" -latex $(OCAMLDOCFLAGS) \ $(INCFLAGS) $(DOC_FILES) -o $@/doc.tex; \ $(OCAMLDOC) -pp "$$pp $(PPFLAGS)" -latex $(OCAMLDOCFLAGS) \ $(INCFLAGS) $(DOC_FILES) -o $@/doc.tex; \ fi doc/$(RESULT)/latex/doc.ps: doc/$(RESULT)/latex cd doc/$(RESULT)/latex && \ $(LATEX) doc.tex && \ $(LATEX) doc.tex && \ $(DVIPS) $(DVIPSFLAGS) doc.dvi -o $(@F) doc/$(RESULT)/latex/doc.pdf: doc/$(RESULT)/latex/doc.ps cd doc/$(RESULT)/latex && $(PS2PDF) $( mikmatch-1.0.7/TODO000066400000000000000000000007451227060510600140110ustar00rootroot00000000000000Documentation: - mention all the rules which are overridden, since it may cause incompatibilities with other extensions which do the same Compatibility: - (maybe) add an option for using another keyword than "match", and avoid to have to delete existing rules Features: - port to Str the macros which already exist for PCRE (if possible) Possible optimizations: - computation of only the substrings that are effectively used (or at least which name appears in some expression) mikmatch-1.0.7/VERSION000077500000000000000000000000751227060510600143700ustar00rootroot00000000000000#!/bin/sh mikmatch_version="1.0.7" echo -n $mikmatch_version mikmatch-1.0.7/common/000077500000000000000000000000001227060510600146035ustar00rootroot00000000000000mikmatch-1.0.7/common/Makefile000066400000000000000000000003721227060510600162450ustar00rootroot00000000000000SOURCES = \ global_def.ml \ messages.ml charset.ml \ constants.ml \ regexp_ast.ml \ select_lib.ml OCAMLFLAGS = -dtypes USE_CAMLP4 = yes .PHONY: default default: bcnl TRASH = *~ OCAMLMAKEFILE = ../OCamlMakefile include $(OCAMLMAKEFILE) mikmatch-1.0.7/common/charset.ml000066400000000000000000000032161227060510600165700ustar00rootroot00000000000000module C = Set.Make (Char) type t = C.t let empty = C.empty let add = C.add let singleton = C.singleton let union = C.union let diff = C.diff let add_range first last set = let r = ref set in for i = Char.code first to Char.code last do r := add (Char.chr i) !r done; !r let range c1 c2 = add_range c1 c2 empty let irange i j = range (Char.chr i) (Char.chr j) let full = range '\000' '\255' let full_for_C = C.remove '\000' full let of_string s = let accu = ref C.empty in String.iter (fun c -> accu := C.add c !accu) s; !accu let complement set = C.diff full set let list = C.elements let nocase set = C.fold (fun c set -> let c1 = Char.lowercase c and c2 = Char.uppercase c in let set1 = C.add c1 set in if c1 <> c2 then C.add c2 set1 else set1) set C.empty module Posix = struct let lower = range 'a' 'z' let upper = range 'A' 'Z' let ascii = range '\x00' '\x7F' let alpha = union lower upper let digit = range '0' '9' let alnum = union alpha digit let punct = of_string "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" let graph = union alnum punct let print = union (singleton ' ') graph let blank = of_string " \t" let cntrl = union (range '\x00' '\x1F') (singleton '\x7F') let xdigit = of_string "0123456789abcdefABCDEF" let space = of_string " \t\n\x0B\x0C\r" let all = [ "lower", lower; "upper", upper; "ascii", ascii; "alpha", alpha; "digit", digit; "alnum", alnum; "punct", punct; "graph", graph; "print", print; "blank", blank; "cntrl", cntrl; "xdigit", xdigit; "space", space; ] end mikmatch-1.0.7/common/constants.ml000066400000000000000000000027401227060510600171540ustar00rootroot00000000000000(*pp camlp4orf *) open Camlp4.PreCast let dummy_loc = Loc.ghost let debug_mode = ref false let reserved_prefix = if !debug_mode then "_" else "__mikmatch_" let uppercase_prefix = "C" ^ reserved_prefix let typevar_prefix = "a" ^ reserved_prefix let mod_runtime = ref "" let mod_runtime_mt = ref "" let exn_exit = "Mikmatch_exit" let any_exn = reserved_prefix ^ "any_exn" let any_target = reserved_prefix ^ "any_target" let any_result = reserved_prefix ^ "any_result" let expr_exit _loc = <:expr< $uid: !mod_runtime$.$uid:exn_exit$ >> let raise_exit _loc = <:expr< raise $expr_exit _loc$ >> let patt_exit _loc = <:patt< $uid: !mod_runtime$.$uid:exn_exit$ >> let shared re_name = re_name ^ "shared" let subgroups2 re_name = re_name ^ "subgroups2" let shared_ovector re_name = re_name ^ "shared_ovector" let regexp_prefix = reserved_prefix ^ "regexp_" let view_prefix = reserved_prefix ^ "view_" let new_regexp = let r = ref 0 in fun () -> incr r; (!r, regexp_prefix ^ string_of_int !r) let new_view = let r = ref 0 in fun () -> incr r; (!r, view_prefix ^ string_of_int !r) let new_target = let r = ref 0 in fun () -> incr r; reserved_prefix ^ "match_target_" ^ string_of_int !r let new_subpatt = let r = ref 0 in fun () -> incr r; reserved_prefix ^ "subpatt_" ^ string_of_int !r let new_var = let r = ref 0 in fun () -> incr r; reserved_prefix ^ "var_" ^ string_of_int !r let new_type_var = let r = ref 0 in fun () -> incr r; typevar_prefix ^ string_of_int !r mikmatch-1.0.7/common/global_def.ml000066400000000000000000000022411227060510600172120ustar00rootroot00000000000000(*pp camlp4orf *) (* $Id$ *) open Camlp4.PreCast let init get = let rec rewrite si0 = let _loc = Ast.loc_of_str_item si0 in let tbl = Hashtbl.create 10 in let accu = ref <:str_item< >> in let map = (object inherit Ast.map as super method expr e = let _loc = Ast.loc_of_expr e in (match super#expr e with | <:expr< $lid:id$ >> -> (match get _loc id with Some si -> if not (Hashtbl.mem tbl id) then (Hashtbl.add tbl id (); accu := <:str_item< $!accu$ ; $si$ >>) | None -> ()) | _ -> ()); e method str_item si0 = let si = super#str_item si0 in let pending = !accu in accu := <:str_item< >>; match pending with <:str_item< >> -> (* Special workaround. Otherwise directives are not recognized and are skipped. *) si | _ -> <:str_item< $pending$ ; $si$ >> end) in map # str_item si0 in AstFilters.register_str_item_filter rewrite; AstFilters.register_topphrase_filter rewrite let init_from_table tbl = let get _loc id = try Some (Hashtbl.find tbl id) with Not_found -> None in init get mikmatch-1.0.7/common/global_def.mli000066400000000000000000000016651227060510600173740ustar00rootroot00000000000000(* $Id$ *) open Camlp4.PreCast val init : (Ast.loc -> string -> Ast.str_item option) -> unit (* [init get] registers a new filter that will operate after the parsing phase. The [get] function will be called to look up all expression lowercase identifiers. It returns the structure items that are required by this expression. They will be inserted just before the current structure item, only once. Example: In your files that will extend the camlp4 preprocessor, define the following: let get _loc id = match id with "pi" -> Some <:str_item< value pi = acos (-1.) >> | _ -> None ;; let () = init get;; The preprocessed file: let x = 0.5 *. pi will be expanded into: let pi = acos (-1.) let x = 0.5 *. pi *) val init_from_table : (string, Ast.str_item) Hashtbl.t -> unit (* Same as [init], but uses the given hash table for its lookups. *) mikmatch-1.0.7/common/match.ml000066400000000000000000001001371227060510600162330ustar00rootroot00000000000000(*pp camlp4orf *) (* $Id$ *) open Printf open Camlp4.PreCast open Mm_util open Regexp_ast open Constants open Select_lib (* Regular expression to match *) type regexp_info = { re_loc : Ast.loc; re_name : string; (* identifier of the compiled regexp *) re_num : int; re_args : regexp_args; re_source : regexp_source; (* string representation of the regexp, with possible gaps *) re_groups : named_groups (* names of substrings *) * named_groups (* names of positions *); re_postbindings : (string * Ast.expr) list; re_anchored : bool } (* Recorded view pattern *) type view_info = { view_loc : Ast.loc; view_unique_name : string; view_name_loc : Ast.loc; (* location of full name *) view_name : string; (* base name *) view_path : string list; (* module path w/o base name *) view_num : int; (* number of this view pattern *) view_arg : Ast.patt option } type special_info = [ `Regexp of regexp_info | `View of view_info ] (* Sets of strings with location *) let compare_loc_string (_, s1) (_, s2) = String.compare s1 s2 module Names = Set.Make (struct type t = Ast.loc * string let compare = compare_loc_string end) let add_new _loc set x = if Names.mem x set then Messages.multiple_binding _loc [(snd x)]; Names.add x set let add_if_needed _loc set x = Names.add x set let output_match_ref = ref (fun _ -> assert false) let unloc l = List.map snd l let pattify _loc l = debug "pattify"; match List.map (fun (_loc, s) -> <:patt< $lid:s$ >>) l with [] -> <:patt< () >> | [p] -> p | pl -> let tup = List.fold_left (fun tup p -> <:patt< $tup$ , $p$ >> ) <:patt< >> pl in <:patt< ( $tup:tup$ ) >> let exprify _loc l = debug "exprify"; match List.map (fun (_loc, s) -> <:expr< $lid:s$ >>) l with [] -> <:expr< () >> | [e] -> e | el -> let tup = List.fold_left (fun tup e -> <:expr< $tup$ , $e$ >> ) <:expr< >> el in <:expr< ( $tup:tup$ ) >> let check_same_ids _loc e1 e2 = if not (Names.equal e1 e2) then let diff_elements = unloc (Names.elements (Names.diff (Names.union e1 e2) (Names.inter e1 e2))) in Messages.unbalanced_bindings _loc diff_elements let check_different_ids _loc e1 e2 = let inter = Names.inter e1 e2 in if not (Names.equal inter Names.empty) then Messages.multiple_binding _loc (unloc (Names.elements inter)) let get_all_names re = let _loc = re.re_loc in let get_names x = Named_groups.fold (fun group_name _ accu -> add_new _loc accu (_loc, group_name)) x Names.empty in let (groups, positions) = re.re_groups in let group_names = get_names groups and position_names = get_names positions in check_different_ids _loc group_names position_names; Names.union group_names position_names let list_all_names re = Names.elements (get_all_names re) (* Subpatterns with info about their variables *) type sub_patt = { sub_patt : Ast.patt; sub_names : Names.t; sub_specials : special_info list; sub_alternatives : (string (* name of the target *) * sub_patt) list } (* Options *) let lib = ref Select_lib.dummy let tailrec = ref true let has_prefix ~prefix s = String.length s >= String.length prefix && String.sub s 0 (String.length prefix) = prefix let match_suffix ~suffix s = let len = String.length s in let slen = String.length suffix in if len >= slen && String.sub s (len - slen) slen = suffix then String.sub s 0 (len - slen) else invalid_arg "match_suffix" let is_reserved s = has_prefix ~prefix:reserved_prefix s let classify_id s = if has_prefix ~prefix:reserved_prefix s then if has_prefix ~prefix:regexp_prefix s then `Regexp else if has_prefix ~prefix:view_prefix s then `View else assert false else `Other let regexp_of_var var_name = match_suffix ~suffix:"_target" var_name let view_of_var var_name = match_suffix ~suffix:"_target" var_name let var_of_regexp re_name = re_name ^ "_target" let var_of_view unique_view_name = unique_view_name ^ "_target" let forbidden_type = reserved_prefix ^ "syntax_error" let posix_regexps = List.map (fun (name, set) -> (name, Characters (dummy_loc, set))) Charset.Posix.all module Mikmatch_regexps = struct let _loc = dummy_loc let set x = Characters (_loc, x) let cset s = set (Charset.of_string s) let opt x = Repetition (_loc, (Option, true), x) let plus x = Repetition (_loc, (Plus, true), x) let star x = Repetition (_loc, (Star, true), x) let lr f l = match List.rev l with [] -> Epsilon _loc | last :: rl -> List.fold_left (fun accu x -> f x accu) last rl let seq l = lr (fun x y -> Sequence (_loc, x, y)) l let alt l = lr (fun x y -> Alternative (_loc, x, y, S.empty, S.empty)) l let explode s = let l = ref [] in for i = String.length s - 1 downto 0 do l := set (Charset.singleton s.[i]) :: !l done; !l let string s = seq (explode s) open Charset.Posix let opt_sign = opt (cset "-+") let d = set digit (* RE int = ["-+"]? ( "0" ( ["xX"] xdigit+ | ["oO"] ['0'-'7']+ | ["bB"] ["01"]+ ) | digit+ ) *) let int = seq [ opt_sign; (alt [ seq [ (cset "0"); (alt [ seq [ (cset "xX"); (plus (set xdigit)) ]; seq [ (cset "oO"); (plus (set (Charset.range '0' '7'))) ]; seq [ (cset "bB"); (plus (cset "01")) ]; ]); ]; plus d; ]) ] (* RE float = ["-+"]? ( ( digit+ ("." digit* )? | "." digit+ ) (["eE"] ["+-"]? digit+ )? | "nan"~ | "inf"~ ) *) let float = seq [ opt_sign; alt [ seq [ alt [ seq [ plus d; opt (seq [ cset "."; star d ]) ]; seq [ cset "."; plus d ]; ]; opt (seq [ cset "eE"; opt (cset "+-"); plus d; ]) ]; nocase (string "nan"); nocase (string "inf"); ] ] let all = [ "int", int; "float", float ] end let named_regexps = (Hashtbl.create 13 : (string, Regexp_ast.ast) Hashtbl.t) let fill_tbl tbl l = List.iter (fun (key, data) -> Hashtbl.add tbl key data) l let init_named_regexps () = fill_tbl named_regexps posix_regexps; fill_tbl named_regexps Mikmatch_regexps.all; fill_tbl named_regexps !(lib).predefined_regexps let reset_named_regexps () = Hashtbl.clear named_regexps; fill_tbl named_regexps posix_regexps; fill_tbl named_regexps Mikmatch_regexps.all; fill_tbl named_regexps !(lib).predefined_regexps let select_lib x = lib := x; reset_named_regexps () let views = Hashtbl.create 100 (* Global table where we store information about regexps for later insertion or reuse. *) let regexp_table = Hashtbl.create 100 (* Function that will fetch the str_item that should be inserted before the current str_item when it contains a reference to a regexp. *) let get_regexp_str_item _loc name = let opt_re = try Some (Hashtbl.find regexp_table name) with Not_found -> None in match opt_re with None -> None | Some re -> let _loc = dummy_loc in let expr = let compile = if re.re_anchored then (!lib).compile_regexp_match else (!lib).compile_regexp_search in compile _loc re.re_args re.re_source in let regexp_def = <:str_item< value $lid:name$ = $expr$ >> in let str_item = List.fold_left ( fun si (ident, e) -> <:str_item< $si$ ; value $lid:ident$ = $e$ >> ) regexp_def re.re_postbindings in Some str_item (* Registration *) let _ = Global_def.init get_regexp_str_item let add_compiled_regexp ~anchored postbindings _loc name num re_args re_source named_groups = Hashtbl.add regexp_table name { re_loc = _loc; re_num = num; re_name = name; re_source = re_source; re_args = re_args; re_groups = named_groups; re_anchored = anchored; re_postbindings = postbindings } let add_view _loc unique_name num (name_loc, name, full_name) arg = Hashtbl.add views unique_name { view_loc = _loc; view_num = num; view_unique_name = unique_name; view_name = name; view_name_loc = name_loc; view_path = full_name; view_arg = arg } let find_compiled_regexp _loc name = try Hashtbl.find regexp_table name with Not_found -> Loc.raise _loc (Invalid_argument ("find_compiled_regexp: " ^ name)) let find_view _loc name = try Hashtbl.find views name with Not_found -> Loc.raise _loc (Invalid_argument ("find_view: " ^ name)) let bind_target ?(force_string = false) _loc target = match target with <:expr< ( $tup:tup$ ) >> -> let el = list_of_comma_expr tup in let ids = List.map (fun _ -> new_target ()) el in let idl = List.map (fun s -> <:expr< $lid:s$ >>) ids in let target = <:expr< ( $tup:comma_expr_of_list _loc idl$ ) >> in let make_target e = List.fold_right2 (fun x id e -> <:expr< let $lid:id$ = $x$ in $e$ >>) el ids e in (make_target, target) | x -> let id = new_target () in let target = <:expr< $lid:id$ >> in let make_target = if force_string then fun e -> <:expr< let $lid:id$ = ($x$ : string) in $e$ >> else fun e -> <:expr< let $lid:id$ = $x$ in $e$ >> in (make_target, target) let match_failure _loc = <:expr< match () with [] >> let keep patt = let subpatt = { sub_patt = patt; sub_names = Names.empty; sub_specials = []; sub_alternatives = [] } in (Names.empty, false, `Normal, subpatt) let sum_kind kind1 kind2 = if kind1 = `Special || kind2 = `Special then `Special else `Normal let names_from_list names _loc l get = List.fold_right (fun x (set, has_re, kind, spatts, subpatts, res) -> match get x with Some p -> let (local_set, local_has_re, local_kind, subpatt) = names p in check_different_ids _loc local_set set; (Names.union local_set set, (has_re || local_has_re), sum_kind kind local_kind, subpatt.sub_patt :: spatts, subpatt.sub_alternatives @ subpatts, subpatt.sub_specials @ res) | None -> (set, has_re, kind, <:patt< __dummy >> :: spatts, subpatts, res) ) l (Names.empty, false, `Normal, [], [], []) let rec names patt = debug "Match.names"; let _loc = Ast.loc_of_patt patt in match patt with | <:patt< ( $p$ : $lid:s$ ) >> when s = forbidden_type -> recons_patt1 _loc p (fun p -> p) | <:patt< ( $p$ : $t$ ) >> -> recons_patt1 _loc p (fun p -> <:patt< ( $p$ : $t$ ) >>) | <:patt< $lid:id$ >> -> (match classify_id id with `Regexp -> let re_name = regexp_of_var id in let re = find_compiled_regexp _loc re_name in let set = get_all_names re in let subpatt = { sub_patt = patt; sub_names = set; sub_specials = [`Regexp re]; sub_alternatives = [] } in (set, true, `Special, subpatt) | `View -> let view_name = view_of_var id in let view = find_view _loc view_name in let set = match view.view_arg with None -> Names.empty | Some p -> let (set, _, _, _) = names p in set in let subpatt = { sub_patt = patt; sub_names = set; sub_specials = [`View view]; sub_alternatives = [] } in (set, true, `Special, subpatt) | `Other -> let set = Names.singleton (_loc, id) in let subpatt = { sub_patt = patt; sub_names = set; sub_specials = []; sub_alternatives = [] } in (set, false, `Normal, subpatt)) | <:patt< $id: _ $ >> -> (* Should be something like A.x or A.B.x (at least one period) *) keep patt | <:patt< $anti:s$ >> -> Messages.failure _loc ("don't know what to do with antiquotation " ^ s) | <:patt< $p1$ .. $p2$ >> -> (recons_patt2 _loc p1 p2 (fun p1 p2 -> <:patt< $p1$ .. $p2$ >>)) | <:patt< $p1$ $p2$ >> -> (recons_patt2 _loc p1 p2 (fun p1 p2 -> <:patt< $p1$ $p2$ >>)) | <:patt< lazy $p$ >> -> (recons_patt1 _loc p (fun p -> <:patt< lazy $p$ >>)) | <:patt< ( $p1$ as $p2$ ) >> -> (recons_patt2 _loc p1 p2 (fun p1 p2 -> <:patt< ( $p1$ as $p2$ ) >>)) | <:patt< $p1$ | $p2$ >> -> let (set1, has_re1, kind1, subpatt1) = names p1 in let (set2, has_re2, kind2, subpatt2) = names p2 in check_same_ids _loc set1 set2; let subpatt = if kind1 = `Special || kind2 = `Special then let var_name = new_subpatt () in let spatt = <:patt< $lid:var_name$ >> in { sub_patt = spatt; sub_names = set1; sub_specials = []; sub_alternatives = [ var_name, subpatt1; var_name, subpatt2 ] } else { sub_patt = <:patt< $subpatt1.sub_patt$ | $subpatt2.sub_patt$ >>; sub_names = subpatt1.sub_names; sub_specials = []; sub_alternatives = subpatt1.sub_alternatives @ subpatt2.sub_alternatives } in (set1, (has_re1 || has_re2), `Special, subpatt) | <:patt< { $p$ } >> -> let ppl = list_of_record p in let (set, has_re, kind, spatts, l, res) = names_from_list names _loc ppl (function `Normal (_, x) -> Some x | `Other _ -> None) in let fields = List.map2 ( fun field spatt -> match field with | `Normal (p1, p2) -> `Normal (p1, spatt) | `Other p -> `Other p ) ppl spatts in let record_body = record_of_list _loc fields in let subpatt = { sub_patt = <:patt< { $record_body$ } >>; sub_names = set; sub_specials = res; sub_alternatives = l } in (set, has_re, kind, subpatt) | <:patt< [| $p$ |] >> -> let pl = list_of_semicolon_patt p in let (set, has_re, kind, spatts, l, res) = names_from_list names _loc pl (fun x -> Some x) in let subpatt = { sub_patt = <:patt< [| $list:spatts$ |] >>; sub_names = set; sub_specials = res; sub_alternatives = l } in (set, has_re, kind, subpatt) | <:patt< ( $tup:p$ ) >> -> let pl = list_of_comma_patt p in let (set, has_re, kind, spatts, l, res) = names_from_list names _loc pl (fun x -> Some x) in let tuple_body = comma_patt_of_list _loc spatts in let subpatt = { sub_patt = <:patt< ( $tup:tuple_body$ ) >>; sub_names = set; sub_specials = res; sub_alternatives = l } in (set, has_re, kind, subpatt) | <:patt< $chr:_$ >> | <:patt< $int:_$ >> | <:patt< $str:_$ >> | <:patt< $flo:_$ >> | <:patt< _ >> -> keep patt | Ast.PaNativeInt (_, _) | Ast.PaInt64 (_, _) | Ast.PaInt32 (_, _) -> keep patt | Ast.PaVrn (_, _) | Ast.PaTyp (_, _) -> keep patt | <:patt< $_$ , $_$ >> (* tuple items *) | <:patt< $_$ ; $_$ >> (* record or array items *) | <:patt< $_$ = $_$ >> (* record field *) | <:patt< >> | Ast.PaMod _ (* (module M) *) | Ast.PaOlb _ (* ?s or ?s:(p) *) | Ast.PaOlbi _ (* ?s:(p = e) or ?(p = e) *) | Ast.PaLab _ -> (* label *) Messages.invalid_pattern _loc and recons_patt1 _loc p1 f = let (set1, has_re1, kind1, { sub_patt = spatt1; sub_specials = res1; sub_alternatives = l1 }) = names p1 in let spatt = f spatt1 in let subpatt = { sub_patt = spatt; sub_names = set1; sub_specials = res1; sub_alternatives = l1 } in (set1, has_re1, kind1, subpatt) and recons_patt2 _loc p1 p2 f = let (set1, has_re1, kind1, { sub_patt = spatt1; sub_specials = res1; sub_alternatives = l1 }) = names p1 in let (set2, has_re2, kind2, { sub_patt = spatt2; sub_specials = res2; sub_alternatives = l2 }) = names p2 in check_different_ids _loc set1 set2; let kind = sum_kind kind1 kind2 in let spatt = f spatt1 spatt2 in let set = Names.union set1 set2 in let subpatt = { sub_patt = spatt; sub_names = set; sub_specials = res1 @ res2; sub_alternatives = l1 @ l2 } in (set, (has_re1 || has_re2), kind, subpatt) let id_list set = List.sort (fun (_, a) (_, b) -> String.compare a b) (Names.elements set) let expr_id_list l = List.map (fun (_loc, s) -> <:expr< $lid:s$ >>) l let patt_id_list l = List.map (fun (_loc, s) -> <:patt< $lid:s$ >>) l let expr_is_lid = function <:expr< $lid:_$ >> -> true | _ -> false let patt_is_lid = function <:patt< $lid:_$ >> -> true | _ -> false let match_case_of_pwe _loc p w e = let we = match w with None -> <:expr< >> | Some e -> e in <:match_case< $p$ when $we$ -> $e$ >> let match_case_of_tuple _loc (p, w, e) = match_case_of_pwe _loc p w e let match_case_of_tuple4 (_loc, p, w, e) = match_case_of_pwe _loc p w e let simplify_match _loc target l default = (* target is an expr that should not compute anything *) (* user-defined but unused expressions are kept for the type checker *) (* default normally raises a match_failure or reraises an exception *) match l with [] -> default | [ <:patt< _ >>, (None | Some <:expr< True >>), e ] -> e | [ <:patt< $lid:id$ >>, (None | Some <:expr< True >>), e ] when expr_is_lid target -> <:expr< let $lid:id$ = $target$ in $e$ >> | _ -> let l' = l @ [ <:patt< _ >>, None, default ] in let match_cases = List.map (match_case_of_tuple _loc) l' in <:expr< match $target$ with [ $list: match_cases$ ] >> let rec patt_succeeds patt = (* always?, catch_all? *) match patt with <:patt< _ >> -> (true, true) | <:patt< $lid:_$ >> -> (true, false) | <:patt< ( $tup:tup$ ) >> -> let l = list_of_comma_patt tup in List.fold_left (fun (works_always, is_catch_all) p -> let always, catch_all = patt_succeeds p in (works_always && always, is_catch_all && catch_all)) (false, true) l | _ -> (false, false) (* actually we could test if it is catch_all *) let match_one_case _loc target patt success failure = let always_succeeds, is_catch_all = patt_succeeds patt in if is_catch_all then (success, false) else if always_succeeds then (<:expr< let $patt$ = $target$ in $success$ >>, false) else (<:expr< match $target$ with [ $patt$ when True -> $success$ | _ -> $failure$ ] >>, true) let make_get_re _loc re_name re_args = List.fold_left (fun e (_, arg) -> let _loc = Ast.loc_of_expr arg in <:expr< $e$ $arg$ >>) <:expr< $lid:re_name$ >> re_args (* let re_match re_list (expr, expr_may_fail) = let result = List.fold_right (fun { re_loc = _loc; re_name = re_name; re_args = re_args; re_groups = named_groups } success -> let var_name = var_of_regexp re_name in let target = <:expr< $lid:var_name$ >> in let failure = raise_exit _loc in let get_re = make_get_re _loc re_name re_args in (!lib).match_and_bind _loc re_name get_re target named_groups success failure) re_list expr in let may_fail = expr_may_fail || re_list <> [] in (result, may_fail) *) let re_match re success = let { re_loc = _loc; re_name = re_name; re_args = re_args; re_groups = named_groups } = re in let var_name = var_of_regexp re_name in let target = <:expr< $lid:var_name$ >> in let failure = raise_exit _loc in let get_re = make_get_re _loc re_name re_args in (!lib).match_and_bind _loc re_name get_re target named_groups success failure let view_match x success = let { view_loc = _loc; view_unique_name = view_unique_name; view_name = view_name; view_name_loc = vloc; view_path = view_path; view_arg = arg } = x in let view_fun = let _loc = vloc in let base = <:ident< $lid:"view_" ^ view_name$ >> in let id = List.fold_right (fun s r -> <:ident< $uid:s$ . $r$ >>) view_path base in <:expr< $id:id$ >> in let var_name = var_of_view view_unique_name in let failure = raise_exit _loc in match arg with None -> (let target = let _loc = vloc in let alpha = new_type_var () in <:expr< ( $view_fun$ : '$alpha$ -> bool ) $lid:var_name$ >> in <:expr< if $target$ then $success$ else $failure$ >>) | Some patt -> let target = let _loc = vloc in let alpha = new_type_var () in let beta = new_type_var () in <:expr< ( $view_fun$ : '$alpha$ -> option '$beta$ ) $lid:var_name$ >> in let cases = [ (_loc, <:patt< Some $patt$ >>, None, success); (_loc, <:patt< _ >>, None, failure) ] in !output_match_ref _loc target cases let special_match specials (expr, expr_may_fail) = let result = List.fold_right (fun special success -> match special with `Regexp re -> re_match re success | `View x -> view_match x success) specials expr in let may_fail = expr_may_fail || specials <> [] in (result, may_fail) let expand_subpatt _loc l after_success = match l with [] -> (* This case never raises Exit *) (after_success, false) | (_, { sub_names = set }) :: _ -> let vars = id_list set in let subresult = match vars with [] -> <:expr< () >> | [_loc,s] -> <:expr< $id: <:ident< $lid:s$ >> $ >> | _ -> let tup = comma_expr_of_list _loc (expr_id_list vars) in <:expr< ( $tup:tup$ ) >> in let rec expand _loc l = match l with [] -> subresult, false | _ -> match List.fold_right (fun (var, subpatt) rest -> let patt = subpatt.sub_patt in let l = subpatt.sub_alternatives in let re_list = subpatt.sub_specials in let (success, may_fail1) = special_match re_list (expand _loc l) in let failure = <:expr< raise $expr_exit _loc$ >> in let target = <:expr< $lid:var$ >> in let (match_expr, may_fail2) = match_one_case _loc target patt success failure in let may_fail_here = may_fail1 || may_fail2 in match rest with None -> (* succeed or raise Exit, which is a failure *) Some (match_expr, may_fail_here) | Some (real_rest, rest_may_fail) -> let may_fail = may_fail_here || rest_may_fail in (* catch Exit and try the next alternative *) if may_fail then (* CHECK: was may_fail_here *) Some (<:expr< try $match_expr$ with [ $patt_exit _loc$ -> $real_rest$ ] >>, rest_may_fail) else (* can't fail, other tests are ignored *) Some (match_expr, false)) l None with None -> (subresult, false) | Some (e, may_fail) -> (e, may_fail) in let result_expr, may_fail = match vars with [] -> let expanded_expr, may_fail = expand _loc l in <:expr< do { $expanded_expr$; $after_success$ } >>, may_fail | _ -> let p = match vars with [_loc,s] -> <:patt< $lid:s$ >> | _ -> let tup = comma_patt_of_list _loc (patt_id_list vars) in <:patt< ( $tup:tup$ ) >> in let expanded_expr, may_fail = expand _loc l in <:expr< let $p$ = $expanded_expr$ in $after_success$ >>, may_fail in (* This whole expression raises Exit if the test fails *) (result_expr, may_fail) (* was: true *) let extract_re cases = List.fold_right (fun (_loc, patt, w, e) (has_re1, accu) -> let (all_names, has_re2, kind, subpatt) = names patt in let more_cases = accu <> [] in ((has_re1 || has_re2), (_loc, subpatt, w, e, has_re2, more_cases) :: accu)) cases (false, []) let force_when patt when_opt = let _loc = Ast.loc_of_patt patt in match when_opt with None -> Some <:expr< True >> | _ -> when_opt let output_special_match _loc target_expr cases_with_re default_action = debug "output_special_match"; let wrap_all = !(lib).wrap_match in let wrap = !(lib).wrap_user_case in let really_wrap = !(lib).really_wrap_user_case in let (clo, app) = if !tailrec then (fun e -> let _loc = Ast.loc_of_expr e in <:expr< fun [ () -> $e$ ] >>), (fun e -> let _loc = Ast.loc_of_expr e in <:expr< $e$ () >>) else (fun e -> e), (fun e -> e) in let (make_target, target) = bind_target _loc target_expr in let raise_exit = <:expr< raise $expr_exit _loc$ >> in let (cases_without_regexp, final_expr) = List.fold_right (fun (_, subpatt, when_opt, user_expr, has_re, more_cases) (cases_without_regexp, match_next) -> let patt = subpatt.sub_patt in let subpatts = subpatt.sub_alternatives in match has_re, when_opt, subpatts with false, (None | Some <:expr< True >>), [] -> (((patt, force_when patt when_opt, wrap (clo user_expr)) :: cases_without_regexp), match_next) | false, Some guard, [] when not really_wrap -> (((patt, force_when patt when_opt, clo user_expr) :: cases_without_regexp), match_next) | _ -> let e4, e4_may_fail = match when_opt with None -> clo user_expr, false | Some cond -> <:expr< if $cond$ then $clo user_expr$ else $raise_exit$ >>, true in let e3 = (wrap e4, e4_may_fail) in let (e2, may_fail1) = special_match subpatt.sub_specials e3 in let (e1, subpatt_may_fail) = expand_subpatt _loc subpatts e2 in let success = e1 in let failure = raise_exit in let (this_match, may_fail2) = match_one_case _loc target patt success failure in let rematch = simplify_match _loc target cases_without_regexp match_next in let e = if may_fail1 || may_fail2 || subpatt_may_fail then <:expr< try $this_match$ with [ $patt_exit _loc$ -> $rematch$ ] >> else this_match in ([], e)) cases_with_re ([], (wrap (clo default_action))) in let full_expr = simplify_match _loc target cases_without_regexp final_expr in make_target (app (wrap_all full_expr)) let output_match _loc target_expr cases = match extract_re cases with false, _ -> (* change nothing *) let match_cases = List.map match_case_of_tuple4 cases in <:expr< match $target_expr$ with [ $list:match_cases$ ] >> | true, cases_with_re -> output_special_match _loc target_expr cases_with_re (match_failure _loc) let _ = output_match_ref := output_match let output_try _loc e cases = match extract_re cases with false, _ -> (* change nothing *) let match_cases = List.map match_case_of_tuple4 cases in <:expr< try $e$ with [ $list:match_cases$ ] >> | true, cases_with_re -> let exn = <:expr< $lid:any_exn$ >> in let default_action = <:expr< raise $exn$ >> in <:expr< try $e$ with [ $lid:any_exn$ -> $output_special_match _loc exn cases_with_re default_action$ ] >> let output_function _loc cases = debug "output_function"; match extract_re cases with false, _ -> (* change nothing *) let match_cases = List.map match_case_of_tuple4 cases in <:expr< fun [ $list:match_cases$ ] >> | true, cases_with_re -> let target = <:expr< $lid:any_target$ >> in <:expr< fun $lid:any_target$ -> $output_special_match _loc target cases_with_re (match_failure _loc)$ >> let pp_named_groups _loc (groups, positions) = debug "pp_named_groups"; let l1 = Named_groups.list groups in let l2 = Named_groups.list positions in let l = l1 @ l2 in List.fold_right (fun (name, il) e -> let el = List.fold_right (fun (_loc, i, conv) e -> <:expr< [ $int:string_of_int i$ :: $e$ ] >>) il <:expr< [ ] >> in <:expr< [ ( $str:String.escaped name$, $el$ ) :: $e$ ] >>) l <:expr< [ ] >> let find_named_regexp _loc name = try Hashtbl.find named_regexps name with Not_found -> Messages.failure _loc ("Unbound regular expression " ^ name) let handle_regexp_patt _loc re = warnings re; let (num, re_name) = new_regexp () in let var_name = var_of_regexp re_name in let (re_args, re_source, named_groups, postbindings) = (!lib).process_regexp _loc ~sharing:false re re_name in add_compiled_regexp ~anchored:true postbindings _loc re_name num re_args re_source named_groups; <:patt< ( $lid:var_name$ : $lid: forbidden_type$ ) >> let handle_view_patt _loc x = let (num, unique_view_name) = new_view () in let (name, arg) = x in let var_name = var_of_view unique_view_name in add_view _loc unique_view_name num name arg; <:patt< ( $lid:var_name$ : $lid: forbidden_type$ ) >> let handle_special_patt _loc = function `Regexp re -> handle_regexp_patt _loc re | `View x -> handle_view_patt _loc x let gen_handle_let_bindings ?in_expr _loc is_rec l = let rec check_one_patt = function <:patt< ( $lid:s$ : $lid: t$ ) >> as p when is_reserved s || t = forbidden_type -> Messages.misplaced_pattern p | <:patt< ( $tup:tup$ ) >> -> let pl = list_of_comma_patt tup in check_patts pl | _ -> () and check_patts l = List.iter check_one_patt l in match l with [] -> (match in_expr with None -> `Str_item <:str_item< >> | Some e -> `Expr e) | [ (<:patt< ( $lid:s$ : $lid: t$ ) >> as p, e) ] when not is_rec && (is_reserved s || t = forbidden_type) -> (match classify_id s with `View -> Messages.failure (Ast.loc_of_patt p) "Views are not supported in this kind of pattern" | `Other -> assert false | `Regexp -> ()); let names = lazy (list_all_names (find_compiled_regexp _loc (regexp_of_var s))) in let e2 = match in_expr with None -> exprify _loc (Lazy.force names) | Some e2 -> e2 in let cases = [ (_loc, p, None, e2) ] in let expr = output_match _loc e cases in if in_expr = None then let p = pattify _loc (Lazy.force names) in `Str_item <:str_item< value $p$ = $expr$ >> else `Expr expr | l -> (* RE patterns in function arguments are not checked! *) check_patts (List.map fst l); match in_expr with None -> let bindings = List.map (binding_of_pair _loc) l in `Str_item <:str_item< value $rec: rec_flag is_rec$ $list:bindings$ >> | Some e2 -> let bindings = List.map (binding_of_pair _loc) l in `Expr <:expr< let $rec: rec_flag is_rec$ $list:bindings$ in $e2$ >> let handle_let_bindings _loc is_rec l e2 = match gen_handle_let_bindings ~in_expr:e2 _loc is_rec l with `Expr e -> e | `Str_item _ -> assert false let handle_value_bindings _loc is_rec l = match gen_handle_let_bindings _loc is_rec l with `Expr _ -> assert false | `Str_item x -> x let let_try_in _loc o l e2 pwel = let f = <:expr< fun () -> $e2$ >> in let e = handle_let_bindings _loc (o <> None) l f in <:expr< (try $e$ with [ $list:pwel$ ]) () >> let get_re_source ~quote_expr ~nocasify accu = let rec compact args strings pieces = function `String s :: l -> compact args (if s <> "" then s :: strings else strings) pieces l | `Var (e, nocase) :: l -> let _loc = Ast.loc_of_expr e in let name = Constants.new_var () in let tail = if strings = [] then pieces else `String (String.concat "" (List.rev strings)) :: pieces in let gap_expr = let e' = <:expr< $lid:name$ >> in if nocase then nocasify e' else <:expr< $quote_expr$ $e'$ >> in let new_args = (name, e) :: args in compact new_args [] (`Expr gap_expr :: tail) l | [] -> let final_pieces = if strings = [] then pieces else let s = String.concat "" (List.rev strings) in if s <> "" then `String s :: pieces else pieces in (List.rev args, List.rev final_pieces) in compact [] [] [] (List.rev !accu) let compute_re_string _loc re_source = let get_expr = function `String s -> <:expr< $str: String.escaped s$ >> | `Expr e -> e in match re_source with [] -> <:expr< "" >> | [x] -> get_expr x | [first; second] -> <:expr< $get_expr first$ ^ $get_expr second$ >> | _ -> let l = List.fold_right (fun x tail -> <:expr< [ $get_expr x$ :: $tail$ ] >>) re_source <:expr< [] >> in <:expr< String.concat "" $l$ >> let protect mt e = if mt then let _loc = Ast.loc_of_expr e in <:expr< do { Mutex.lock mutex; try let result = $e$ in do { Mutex.unlock mutex; result } with exn -> do { Mutex.unlock mutex; raise exn } } >> else e let get_re_fragments _loc re_source = match re_source with [] -> <:expr< "" >> | [`String s] -> <:expr< $str:String.escaped s$ >> | _ -> List.fold_right (fun x e -> match x with `String s -> <:expr< [ $str:String.escaped s$ :: $e$ ] >> | _ -> e) re_source <:expr< [] >> mikmatch-1.0.7/common/messages.ml000066400000000000000000000036641227060510600167550ustar00rootroot00000000000000(* $Id$ *) open Printf open Camlp4.PreCast let warning _loc s = (* Format.err_formatter _loc;*) let label = if !Sys.interactive then "" else "Warning: " in Format.eprintf "%a:@.%s%s@." Loc.print _loc label s let failure _loc s = (* does it print the error like Stdpp.raise_with_loc? *) Loc.raise _loc (Failure s) let list = function [] -> "" | [s] -> s | l -> let l' = List.rev l in String.concat ", " (List.rev (List.tl l')) ^ " and " ^ List.hd l' let invalid_backref _loc name = failure _loc (sprintf "Invalid backreference %s" name) let unbalanced_bindings _loc l = failure _loc (sprintf "Variable%s %s must occur on both sides of this | pattern" (if List.length l > 1 then "s" else "") (list l)) let multiple_binding _loc l = let s, are = if List.length l > 1 then "s", "are" else "", "is" in failure _loc (sprintf "Variable%s %s %s bound several times in this matching" s (list l) are) let invalid_range _loc = failure _loc "Invalid range" let invalid_pattern _loc = failure _loc "Invalid pattern" let invalid_lookbehind _loc kind adjective = failure _loc (sprintf "%s are disabled in %slookbehind assertions" kind adjective) let not_visible _loc who where = let s, are = if List.length who > 1 then "s", "are" else "", "is" in warning _loc (sprintf "identifier%s %s %s not visible \ out of this %s" s (list who) are where) let invalid_converter _loc name = failure _loc (sprintf "%s is not a valid converter" name) let reserved_identifier _loc prefix name = failure _loc (sprintf "%s is a reserved identifier: use another prefix than %s" name prefix) let misplaced_pattern p = failure (Ast.loc_of_patt p) ("patterns of this kind cannot appear in this context. \ Use match ... with if you are unsure.") let cannot_delete_rule name = eprintf "Warning: Cannot delete rule %s\n%!" name mikmatch-1.0.7/common/mikmatch.ml000066400000000000000000000160711227060510600167370ustar00rootroot00000000000000module Text = struct exception Internal_exit let iter_lines_of_channel f ic = try while true do let line = try input_line ic with End_of_file -> raise Internal_exit in f line done with Internal_exit -> () let iter_lines_of_file f file = let ic = open_in file in try iter_lines_of_channel f ic; close_in ic with exn -> close_in_noerr ic; raise exn let lines_of_channel ic = let l = ref [] in iter_lines_of_channel (fun line -> l := line :: !l) ic; List.rev !l let lines_of_file file = let l = ref [] in iter_lines_of_file (fun line -> l := line :: !l) file; List.rev !l let channel_contents ic = let len = 2048 in let buf = String.create len in let rec loop size accu = match input ic buf 0 len with 0 -> (accu, size) | n when n = len -> loop (size + n) (String.copy buf :: accu) | n -> loop (size + n) (String.sub buf 0 n :: accu) in let accu, size = loop 0 [] in let result = String.create size in let rec loop2 last_pos = function [] -> assert (last_pos = 0) | s :: rest -> let len = String.length s in let pos = last_pos - len in String.blit s 0 result pos len; loop2 pos rest in loop2 size accu; result let file_contents ?(bin = false) file = let ic = open_in file in let s = try channel_contents ic with exn -> close_in_noerr ic; raise exn in close_in ic; s let save file data = let oc = open_out_bin file in (try output_string oc data; with exn -> close_out_noerr oc; raise exn); close_out oc let save_lines file lines = let oc = open_out_bin file in (try List.iter (fun s -> output_string oc s; output_char oc '\n') lines; with exn -> close_out_noerr oc; raise exn); close_out oc exception Skip let rev_map f l = let rec loop f accu = function [] -> accu | hd :: tl -> let accu' = try f hd :: accu with Skip -> accu in loop f accu' tl in loop f [] l let map f l = List.rev (rev_map f l) let rec fold_left f accu l = match l with [] -> accu | hd :: tl -> let accu' = try f accu hd with Skip -> accu in fold_left f accu' tl let rec rev_fold_right f l accu = match l with [] -> accu | hd :: tl -> let accu' = try f hd accu with Skip -> accu in rev_fold_right f tl accu' let rec fold_right f l accu = rev_fold_right f (List.rev l) accu let map_lines_of_channel f ic = let l = ref [] in iter_lines_of_channel (fun line -> try l := f line :: !l with Skip -> ()) ic; List.rev !l let map_lines_of_file f file = let l = ref [] in iter_lines_of_file (fun line -> try l := f line :: !l with Skip -> ()) file; List.rev !l end module Fixed = struct let chop_spaces str = let len = String.length str in let rec getfirst n = if n = len then len else if String.unsafe_get str n = ' ' then getfirst (n+1) else n and getlast n = if String.unsafe_get str n = ' ' then getlast (n-1) else n in let first = getfirst 0 in if first = len then "" else let last = getlast (len - 1) in String.sub str first (last-first+1) let int s = int_of_string (chop_spaces s) let float s = float_of_string (chop_spaces s) end module Directory = struct let list ?(absolute = false) ?path dir = let names = Sys.readdir dir in Array.sort String.compare names; let make_path, path_maker = match absolute, path with false, None | false, Some false -> false, (fun s -> s) | false, Some true -> true, Filename.concat dir | true, Some true | true, None -> let f = if Filename.is_relative dir then let cwd = Sys.getcwd () in Filename.concat (Filename.concat cwd dir) else Filename.concat dir in true, f | true, Some false -> invalid_arg "Directory.list" in let paths = if make_path then Array.map path_maker names else names in Array.to_list paths let is_dir ?(nofollow = false) x = let stat = if nofollow then Unix.lstat else Unix.stat in try (stat x).Unix.st_kind = Unix.S_DIR with Unix.Unix_error (Unix.ENOENT, _, _) -> false (* may be a bad symbolic link if nofollow is false *) end module Glob = struct (* Filename globbing utility *) (* Examples of use with mikmatch: let ml_files = list [FILTER _* ".ml" "i"? eos] let trash_files = list [ FILTER ""; FILTER _* (".cm" ("i"|"o"|"x"|"a"|"xa") | ".o" | ".a") eos ] *) let filter_array f a = Array.fold_right (fun x l -> if f x then x :: l else l) a [] let rec scan_gen ~cons ~real_dir ~dir ?nofollow action path_filter = let real_real_dir = if real_dir = "" then Filename.current_dir_name else real_dir in match path_filter with [] -> () | [f] -> List.iter (fun name -> action (cons dir name)) (filter_array f (Sys.readdir real_real_dir)) | f :: subpath_filter -> let filtered_files = filter_array f (Sys.readdir real_real_dir) in List.iter (fun name -> let subdir = cons dir name in let real_subdir = Filename.concat real_dir name in if Directory.is_dir ?nofollow real_subdir then scan_gen ~cons ~real_dir:real_subdir ~dir:subdir ?nofollow action subpath_filter) filtered_files let get_dir ~getcwd ~concat ~is_relative ~fun_name ~absolute ~path ~relative_root root = match absolute, path with false, None | false, Some false -> relative_root | false, Some true -> root | true, Some true | true, None -> if is_relative root then let cwd = getcwd () in concat cwd root else root | true, Some false -> invalid_arg fun_name let scan ?(absolute = false) ?path ?(root = "") ?nofollow action path_filter = let getcwd = Sys.getcwd in let cons = Filename.concat in let concat = Filename.concat in let is_relative = Filename.is_relative in let fun_name = "Glob.scan" in let relative_root = "" in let dir = get_dir ~getcwd ~concat ~is_relative ~fun_name ~absolute ~path ~relative_root root in scan_gen ~cons ~real_dir:root ~dir ?nofollow action path_filter let lscan ?(rev = false) ?(absolute = false) ?path ?(root = []) ?nofollow action path_filter = let getcwd () = [Sys.getcwd ()] in let cons = (fun l s -> s :: l) in let concat = (fun root rel_path -> rel_path @ root) in let rec is_relative = function [] -> true | [x] -> Filename.is_relative x | x :: l -> is_relative l in let fun_name = "Glob.lscan" in let relative_root = [] in let rev_root = if rev then root else List.rev root in let rev_dir = get_dir ~getcwd ~concat ~is_relative ~fun_name ~absolute ~path ~relative_root rev_root in let real_dir = List.fold_left Filename.concat "" (List.rev rev_dir) in let new_action = if rev then action else (fun l -> action (List.rev l)) in scan_gen ~cons ~real_dir ~dir:rev_dir ?nofollow new_action path_filter let list_gen scan ?absolute ?path ?root ?nofollow ?(sort = true) path_filter = let l = ref [] in scan ?absolute ?path ?root ?nofollow (fun x -> l := x :: !l) path_filter; if sort then List.sort compare !l else !l let list = list_gen scan let llist ?rev = list_gen (lscan ?rev) end mikmatch-1.0.7/common/mikmatch.mli000066400000000000000000000203231227060510600171030ustar00rootroot00000000000000(** A small text-oriented library *) (** The [Mikmatch] module provides a submodule named [Text]. A normal usage is to place [open Mikmatch] at the beginning of user code that uses it. This module is part of the runtime environment of Mikmatch (the library run_mikmatch_pcre.cma or equivalent). *) module Text : sig (** This module provides some general functions which are especially useful for manipulating text and text files. *) val iter_lines_of_channel : (string -> unit) -> in_channel -> unit (** [iter_lines_of_channel f ic] reads input channel [ic] and applies successively the given function [f] to each line until the end of file is reached. *) val iter_lines_of_file : (string -> unit) -> string -> unit (** [iter_lines_of_file f file] reads file [file] and applies successively the given function [f] to each line until the end of file is reached. *) val lines_of_channel : in_channel -> string list (** [lines_of_channel ic] returns the list of the lines that can be read from input channel [ic]. *) val lines_of_file : string -> string list (** [lines_of_file file] returns the list of the lines that can be read from file [file]. *) val channel_contents : in_channel -> string (** [channel_contents ic] returns the string containing the bytes that can be read from the given input channel [ic]. *) val file_contents : ?bin:bool -> string -> string (** [file_contents file] returns the string containing the bytes that can be read from the given file. Option [bin] specifies if [Pervasives.open_in_bin] should be used instead of [Pervasives.open_in] to open the file. Default is [false]. *) val save : string -> string -> unit (** [save file data] stores the string [data] in [file]. If the file already exists, its contents is discarded silently. *) val save_lines : string -> string list -> unit (** [save_lines file l] saves the given list [l] of strings in [file] and adds a newline characters (['\n']) after each of them. If the file already exists, its contents is discarded silently. *) exception Skip (** This exception can be used to skip an element of a list being processed with [rev_map], [map], [fold_left], and [fold_right]. *) val map : ('a -> 'b) -> 'a list -> 'b list (** Like [List.map] but it is guaranteed that the elements of the input list are processed from left to right. Moreover the [Skip] exception can be used to skip an element of the list. This function runs in constant stack space. *) val rev_map : ('a -> 'b) -> 'a list -> 'b list (** Like [List.rev_map], but it is guaranteed that the elements of the input list are processed from left to right. Moreover the [Skip] exception can be used to skip an element of the list. This function runs in constant stack space and is slightly faster then [map]. *) val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a (** Like [List.fold_left] but the [Skip] exception can be used to skip an element of the list. This function runs in constant stack space. *) val fold_right : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b (** Like [List.fold_right] but the [Skip] exception can be used to skip an element of the list. This function runs in constant stack space. *) val map_lines_of_channel : (string -> 'a) -> in_channel -> 'a list (** [map_lines_of_channel f ic] is equivalent to [map f (lines_of_channel ic)] but faster. *) val map_lines_of_file : (string -> 'a) -> string -> 'a list (** [map_lines_of_file f file] is equivalent to [map f (lines_of_file file)] but faster. *) end module Fixed : sig (** This module provides some functions which are useful for manipulating files with fields of fixed width. *) val chop_spaces : string -> string (** [chop_spaces s] returns a string where the leading and trailing spaces are removed. *) val int : string -> int (** [int s] reads an int from a string where leading and trailing spaces are allowed. Equivalent to [Pervasives.int_of_string (chop_spaces s)]. *) val float : string -> float (** [float s] reads an float from a string where leading and trailing spaces are allowed. Equivalent to [Pervasives.float_of_string (chop_spaces s)]. *) end module Directory : sig (** Basic operations on directories *) val list : ?absolute:bool -> ?path:bool -> string -> string list (** [list dir] returns the alphabetically sorted list of the names of the files contained in directory [dir]. The special names that refer to the parent directory (e.g. [..]) and the directory itself (e.g. [.]) are ignored. If the option [absolute] is set to [true], the result is a list of absolute file paths, i.e. that do not depend on the current directory which is associated to the process (default is false; implies [path = true]). If the option [path] is set to [true], the result is a list of paths instead of just the file names (default is [false] except if [absolute] is explicitely set to [true]). Exception [Invalid_argument "Directory.list"] is raised if there is an incompatibility between the options. Unspecified exceptions will be raised if the given directory does not exist or is not readable. *) val is_dir : ?nofollow:bool -> string -> bool (** [is_dir dir] returns true if [dir] is a directory, false otherwise. The [nofollow] option is false by default, but if true, a symbolic link will not be followed. In that case false is returned even if the link points to a valid directory. *) end module Glob : sig (** A generic file path matching utility *) val scan : ?absolute:bool -> ?path:bool -> ?root:string -> ?nofollow:bool -> (string -> unit) -> (string -> bool) list -> unit (** [scan action path_filter] returns all the file paths having a name that matches [path_filter]. [path_filter] is a list of filters that test whether a directory name or a file name should be selected. The path search starts from the current directory by default, or from the directory specified by the [root] option. The file names are examined in an undefined order. When a file path matches, [action] is applied to the string representing the path. Options [absolute] and [path] have the same meaning and the same default values as in {!Mikmatch.Directory.list}. [nofollow] can be used to prevent from considering symbolic links as directories. It is false by default. See also {!Mikmatch.Directory.is_dir}. *) val lscan : ?rev:bool -> ?absolute:bool -> ?path:bool -> ?root:string list -> ?nofollow:bool -> (string list -> unit) -> (string -> bool) list -> unit (** Same as {!Mikmatch.Glob.scan} but file paths are kept as a list of strings that form a valid path when concatenated using [Filename.concat]. Option [rev] can be set if the lists representing paths are in reversed order, i.e. the root comes last. In [lscan action path_filter], options [rev], [absolute], and [path] take their default values which are all false. In this situation, it is guaranteed that the paths that are passed to [action] have the same length as [path_filter]. *) val list : ?absolute:bool -> ?path:bool -> ?root:string -> ?nofollow:bool -> ?sort:bool -> (string -> bool) list -> string list (** [list path_filter] works like {!Mikmatch.Glob.scan} but returns a list of all file paths that match [path_filter]. An example in Mikmatch syntax is [list [FILTER _* ".ml" eos]]. It returns the list of ".ml" files in the current directory. It could have been written as [list [ fun s -> Filename.check_suffix s ".ml"]] and is equivalent to [*.ml] in shell syntax. *) val llist : ?rev:bool -> ?absolute:bool -> ?path:bool -> ?root:string list -> ?nofollow:bool -> ?sort:bool -> (string -> bool) list -> string list list (** [llist path_filter] works like {!Mikmatch.Glob.lscan} but returns a list of all file paths that match [path_filter]. *) end mikmatch-1.0.7/common/mm_util.ml000066400000000000000000000053031227060510600166040ustar00rootroot00000000000000(*pp camlp4orf *) (* $Id$ *) open Camlp4.PreCast open Messages (* General Camlp4 utilities *) let debug s = if !Constants.debug_mode then Printf.eprintf "[debug] %s\n%!" s let list_of_comma_expr e = let rec aux e l = match e with <:expr< $e1$ , $e2$ >> -> aux e1 (aux e2 l) | <:expr< >> -> l | e -> e :: l in aux e [] let list_of_comma_patt p = let rec aux p l = match p with <:patt< $p1$ , $p2$ >> -> aux p1 (aux p2 l) | <:patt< >> -> l | p -> p :: l in aux p [] let list_of_semicolon_patt p = let rec aux p l = match p with <:patt< $p1$ ; $p2$ >> -> aux p1 (aux p2 l) | <:patt< >> -> l | p -> p :: l in aux p [] let list_of_record p = List.map ( fun field -> match field with <:patt< $p1$ = $p2$ >> -> `Normal (p1, p2) | p -> `Other p | _ -> assert false ) (list_of_semicolon_patt p) let comma_expr_of_list _loc = function hd :: tl -> debug "comma_expr_of_list"; List.fold_left ( fun accu e -> <:expr< $accu$ , $e$ >> ) hd tl | [] -> assert false let comma_patt_of_list _loc = function hd :: tl -> debug "comma_patt_of_list"; List.fold_left ( fun accu p -> <:patt< $accu$ , $p$ >> ) hd tl | [] -> assert false let semicolon_patt_of_list _loc = function hd :: tl -> debug "semicolon_patt_of_list"; List.fold_left ( fun accu p -> <:patt< $accu$ ; $p$ >> ) hd tl | [] -> assert false let record_of_list _loc l = debug "record_of_list"; semicolon_patt_of_list _loc (List.map ( function | `Normal (p1, p2) -> <:patt< $p1$ = $p2$ >> | `Other p -> p ) l) let meta_bool = function true -> Ast.BTrue | false -> Ast.BFalse let binding_of_pair _loc (p, e) = debug "binding_of_pair"; <:binding< $p$ = $e$ >> let pair_of_binding = function <:binding< $p$ = $e$ >> -> (p, e) | b -> let _loc = Ast.loc_of_binding b in failure _loc "Failed assertion in Mm_util.pair_of_binding" let list_of_binding b = let rec aux b l = match b with <:binding< $b1$ and $b2$ >> -> aux b1 (aux b2 l) | <:binding< >> -> l | <:binding< $p$ = $e$ >> -> (p, e) :: l | <:binding< $anti: _ $ >> -> failure (Ast.loc_of_binding b) "Antiquotations for let bindings are not supported by mikmatch" in aux b [] let match_case_of_tuple _loc (p, w, e) = debug "match_case_of_tuple"; match w with None -> <:match_case< $p$ -> $e$ >> | Some cond -> <:match_case< $p$ when $cond$ -> $e$ >> let eval_string s = Camlp4.Struct.Token.Eval.string ~strict:() s let eval_char s = Camlp4.Struct.Token.Eval.char s let rec_flag = function true -> Ast.ReRecursive | false -> Ast.ReNil mikmatch-1.0.7/common/regexp_ast.ml000066400000000000000000000163761227060510600173130ustar00rootroot00000000000000(* $Id$ *) (* Abstract syntax tree for regular expressions *) open Camlp4.PreCast type converter = [ `Int | `Float | `Option | `Custom of Ast.expr | `Value of Ast.expr ] module S = Set.Make (String) let list_named_groups set = List.sort String.compare (S.elements set) module Named_groups = struct module M = Map.Make (String) include M let list m = List.sort (fun (a, _) (b, _) -> String.compare a b) (fold (fun key data accu -> let positions = List.sort (fun (loc, i, conv1) (loc, j, conv2) -> Pervasives.compare i j) data in (key, positions) :: accu) m []) let list_keys m = List.sort String.compare (fold (fun key data accu -> key :: accu) m []) let keys m = fold (fun key data accu -> S.add key accu) m S.empty let equal m1 m2 = S.equal (keys m1) (keys m2) let inter m1 m2 = S.inter (keys m1) (keys m2) let union m1 m2 = S.union (keys m1) (keys m2) let diff m1 m2 = S.diff (keys m1) (keys m2) end type named_groups = (Ast.loc * int * converter option) list Named_groups.t let add_new loc name conv group_num set = if Named_groups.mem name set then Messages.multiple_binding loc [name]; Named_groups.add name [loc, group_num, conv] set let add_new_group loc name conv group_num (groups, positions) = (add_new loc name conv group_num groups, positions) let add_new_pos loc name group_num (groups, positions) = (groups, add_new loc name None group_num positions) let merge_lists l1 l2 = let tbl = Hashtbl.create (List.length l1 + List.length l2) in let add l = List.iter (fun ((_, n, conv) as x) -> Hashtbl.replace tbl n x) l in add l1; add l2; let l = Hashtbl.fold (fun _ x l -> x :: l) tbl [] in let cmp (_, x, _) (_, y, _) = compare x y in List.sort cmp l let really_add name l2 set = try let l1 = Named_groups.find name set in Named_groups.add name (merge_lists l1 l2) (Named_groups.remove name set) with Not_found -> Named_groups.add name l2 set let merge set1 set2 = Named_groups.fold really_add set1 set2 type repetition_kind = Star | Option | Plus | Range of (int * int option option) type greediness = bool type ast = Epsilon of Ast.loc | Characters of Ast.loc * Charset.t | Sequence of Ast.loc * ast * ast | Alternative of Ast.loc * ast (* choice 1 *) * ast (* choice 2 *) * S.t (* group names *) * S.t (* position names *) | Repetition of Ast.loc * (repetition_kind * greediness) * ast | Possessive of Ast.loc * ast | Bind of Ast.loc * ast * string * converter option | Bind_pos of Ast.loc * string | Backref of Ast.loc * string | Variable of Ast.loc * Ast.expr | Nocase_variable of Ast.loc * Ast.expr | Special of Ast.loc * string * (string * int option) | Lookahead of Ast.loc * bool * ast | Lookbehind of Ast.loc * bool * ast | Closed of ast let rec loc_of_regexp = function Epsilon loc | Characters (loc, _) | Sequence (loc, _, _) | Alternative (loc, _, _, _, _) | Repetition (loc, _, _) | Possessive (loc, _) | Bind (loc, _, _, _) | Bind_pos (loc, _) | Backref (loc, _) | Variable (loc, _) | Nocase_variable (loc, _) | Special (loc, _, _) | Lookahead (loc, _, _) | Lookbehind (loc, _, _) -> loc | Closed ast -> loc_of_regexp ast let rec bindings : ast -> S.t = function Bind (loc, e, s, conv) -> S.add s (bindings e) | Bind_pos _ | Epsilon _ | Characters _ | Backref _ | Variable _ | Nocase_variable _ | Special _ -> S.empty | Sequence (loc, e1, e2) -> S.union (bindings e1) (bindings e2) | Alternative (loc, e1, e2, set, pos_set) -> set | Repetition (loc, kind, e) -> bindings e | Possessive (loc, e) | Lookahead (loc, _, e) | Lookbehind (loc, _, e) -> bindings e | Closed e -> S.empty let rec pos_bindings : ast -> S.t = function Bind_pos (loc, s) -> S.singleton s | Bind _ | Epsilon _ | Characters _ | Backref _ | Variable _ | Nocase_variable _ | Special _ -> S.empty | Sequence (loc, e1, e2) -> S.union (pos_bindings e1) (pos_bindings e2) | Alternative (loc, e1, e2, set, pos_set) -> pos_set | Repetition (loc, kind, e) -> pos_bindings e | Possessive (loc, e) | Lookahead (loc, _, e) | Lookbehind (loc, _, e) -> pos_bindings e | Closed _ -> S.empty let alternative loc e1 e2 = match e1, e2 with Characters (loc1, s1), Characters (loc2, s2) -> Characters (loc, Charset.union s1 s2) | _ -> let b1 = bindings e1 and b2 = bindings e2 in let pb1 = pos_bindings e1 and pb2 = pos_bindings e2 in Alternative (loc, e1, e2, S.union b1 b2, S.union pb1 pb2) let rec repeat loc e (mini, maxoptopt) = if mini < 0 then Messages.invalid_range loc else match maxoptopt with None -> (match mini with 0 -> Epsilon loc | n -> let rec loop i = if i > 1 then Sequence (loc, e, loop (i-1)) else e in loop n) | Some (Some maxi) -> let diff = maxi - mini in if diff < 0 then Messages.invalid_range loc else if diff = 0 then e else let rec loop i = alternative loc (Epsilon loc) (if i > 1 then (Sequence (loc, e, loop (i-1))) else e) in Sequence (loc, (repeat loc e (mini, None)), loop diff) | Some None -> Sequence (loc, repeat loc e (mini, None), Repetition (loc, (Star, true), e)) let rec nocase = function Bind (loc, e, s, conv) -> Bind (loc, nocase e, s, conv) | Bind_pos _ | Epsilon _ | Backref _ | Nocase_variable _ | Special _ as e -> e | Characters (loc, charset) -> Characters (loc, Charset.nocase charset) | Sequence (loc, e1, e2) -> Sequence (loc, nocase e1, nocase e2) | Alternative (loc, e1, e2, ids, pos_ids) -> Alternative (loc, nocase e1, nocase e2, ids, pos_ids) | Repetition (loc, kind, e) -> Repetition (loc, kind, nocase e) | Possessive (loc, e) -> Possessive (loc, nocase e) | Lookahead (loc, b, e) -> Lookahead (loc, b, nocase e) | Lookbehind (loc, b, e) -> Lookbehind (loc, b, nocase e) | Variable (loc, e) -> Nocase_variable (loc, e) | Closed ast -> Closed (nocase ast) (* Miscellaneous functions *) let explode s = let l = ref [] in for i = String.length s - 1 downto 0 do l := s.[i] :: !l done; !l let of_string loc s = let l = explode s in match l with [c] -> Characters (loc, Charset.singleton c) | _ -> List.fold_right (fun c re -> Sequence (loc, (Characters (loc, Charset.singleton c)), re)) l (Epsilon loc) let as_charset _loc msg = function Characters (_loc, set) -> set | _ -> Messages.failure _loc msg let rec warn_bindings w = function Bind (loc, e, s, conv) -> if w then Messages.not_visible loc [s] "context"; warn_bindings w e | Bind_pos (loc, s) -> if w then Messages.not_visible loc [s] "context" | Epsilon _ | Characters _ | Backref _ | Variable _ | Nocase_variable _ | Special _ -> () | Sequence (loc, e1, e2) -> warn_bindings w e1; warn_bindings w e2 | Alternative (loc, e1, e2, set, pos_set) -> if w then (match list_named_groups (S.union set pos_set) with [] -> () | ignored -> Messages.not_visible loc ignored "context") | Repetition (loc, kind, e) -> warn_bindings w e | Possessive (loc, e) | Lookahead (loc, _, e) | Lookbehind (loc, _, e) -> warn_bindings w e | Closed e -> warn_bindings true e let warnings re = warn_bindings false re mikmatch-1.0.7/common/select_lib.ml000066400000000000000000000025131227060510600172430ustar00rootroot00000000000000(* $Id$ *) open Camlp4.PreCast open Ast let not_implemented _loc = Messages.failure _loc "not implemented" let fail _ = failwith "not implemented" type regexp_args = (string * Ast.expr) list type regexp_source = [ `String of string | `Expr of Ast.expr ] list type regexp_lib = { predefined_regexps : (string * Regexp_ast.ast) list; unfold_range : bool; process_regexp : loc -> sharing:bool -> Regexp_ast.ast -> string -> regexp_args * regexp_source * (Regexp_ast.named_groups * Regexp_ast.named_groups) * (string * expr) list; compile_regexp_match : loc -> regexp_args -> regexp_source -> expr; compile_regexp_search : loc -> regexp_args -> regexp_source -> expr; match_and_bind : loc -> string -> expr -> expr -> (Regexp_ast.named_groups * Regexp_ast.named_groups) -> expr -> expr -> expr; wrap_match : expr -> expr; wrap_user_case : expr -> expr; really_wrap_match : bool; really_wrap_user_case : bool } let dummy = { predefined_regexps = []; unfold_range = false; process_regexp = not_implemented; compile_regexp_match = not_implemented; compile_regexp_search = not_implemented; match_and_bind = not_implemented; wrap_match = fail; wrap_user_case = fail; really_wrap_match = false; really_wrap_user_case = false } mikmatch-1.0.7/common/syntax_common.ml000066400000000000000000000155041227060510600200400ustar00rootroot00000000000000(*pp camlp4orf *) open Printf open Camlp4.PreCast open Syntax open Mm_util open Regexp_ast open Select_lib open Match let regexp = Gram.Entry.mk "regexp";; let regexp_match_case = Gram.Entry.mk "regexp_match_case";; let range = Gram.Entry.mk "range";; let seq _loc e = match e with <:expr< $_$ ; $_$ >> -> <:expr< do { $e$ } >> | _ -> e let extend_common () = (try DELETE_RULE Gram patt: LIDENT END with _rule_not_found -> ()); (try DELETE_RULE Gram expr: "let"; opt_rec; binding; "in"; expr LEVEL ";" END with _rule_not_found -> Messages.cannot_delete_rule "(1)"); (try DELETE_RULE Gram str_item: "let"; opt_rec; binding; "in"; expr END with _rule_not_found -> Messages.cannot_delete_rule "(2)"); (try DELETE_RULE Gram str_item: "let"; opt_rec; binding END with _rule_not_found -> Messages.cannot_delete_rule "(3)"); EXTEND Gram GLOBAL: str_item patt expr regexp regexp_match_case range; str_item: [ [ "RE"; name = LIDENT; "="; re = regexp -> warnings re; Hashtbl.add named_regexps name re; <:str_item< >> ] ]; special_patt: [ [ "RE"; re = regexp -> `Regexp re | "/"; re = regexp; "/" -> `Regexp re | "%"; name = uid_path; arg = OPT patt LEVEL "simple" -> `View (name, arg) ] ]; uid_path: [ [ l = LIST1 [ x = UIDENT -> x ] SEP "." -> match List.rev l with basename :: modname -> (_loc, basename, List.rev modname) | _ -> assert false ] ]; patt: LEVEL "simple" [ [ x = special_patt -> handle_special_patt _loc x ] ]; expr: LEVEL "top" [ [ "let"; o = OPT "rec"; b = binding; "in"; e2 = sequence -> handle_let_bindings _loc (o <> None) (list_of_binding b) (seq _loc e2) ] ]; expr: LEVEL "top" [ [ "let"; LIDENT "view"; name = UIDENT; "="; e1 = expr; "in"; e2 = sequence -> <:expr< let $lid:"view_" ^ name$ = $e1$ in $seq _loc e2$ >> ] ]; expr: LEVEL "top" [ [ "let"; "try"; o = OPT "rec"; l = LIST1 let_binding SEP "and"; "in"; e2 = sequence; "with"; pwel = LIST1 lettry_case SEP "|" -> let_try_in _loc o (List.map pair_of_binding l) (seq _loc e2) pwel ] ]; str_item: LEVEL "top" [ [ "let"; o = OPT "rec"; b = binding -> handle_value_bindings _loc (o <> None) (list_of_binding b) | "let"; o = OPT "rec"; b = binding; "in"; e2 = sequence -> let e = handle_let_bindings _loc (o <> None) (list_of_binding b) (seq _loc e2) in <:str_item< $exp:e$ >> | "let"; LIDENT "view"; name = UIDENT; "="; e1 = expr -> <:str_item< value $lid:"view_" ^ name$ = $e1$ >> | "let"; "try"; o = OPT "rec"; b = binding; "in"; e2 = sequence; "with"; pwel = LIST1 lettry_case SEP "|" -> let e = let_try_in _loc o (list_of_binding b) (seq _loc e2) pwel in <:str_item< $exp:e$ >> ] ]; lettry_case: [ [ p = patt; w = OPT [ "when"; e = expr -> e ]; "->"; e = sequence -> match_case_of_tuple _loc (p, w, <:expr< fun () -> $seq _loc e$ >>) ] ]; regexp_match_case: [ [ x1 = patt; w = OPT [ "when"; e = expr -> e ]; "->"; x2 = sequence -> (_loc, x1, w, seq _loc x2) ] ]; regexp: [ [ r = regexp; "as"; i = LIDENT; conv = OPT [ ":"; s = LIDENT -> (match s with "int" -> `Int | "float" -> `Float | "option" -> `Option | s -> Messages.invalid_converter _loc s) | ":="; e = expr -> `Custom e | "="; e = expr -> `Value e ] -> Bind (_loc, r, i, conv) ] | [ r1 = regexp; "|"; r2 = regexp -> alternative _loc r1 r2 ] | [ r1 = regexp; r2 = regexp -> Sequence (_loc, r1, r2) ] | "postop" NONA [ r = regexp; "*" -> Repetition (_loc, (Star, true), Closed r) | r = regexp; "+" -> Repetition (_loc, (Plus, true), Closed r) | r = regexp; "?" -> Repetition (_loc, (Option, true), Closed r) | r = regexp; "~" -> nocase r | r = regexp; "{"; (rng, rng_loc) = range; "}" -> if !(lib).unfold_range then repeat rng_loc (Closed r) rng else Repetition (_loc, (Range rng, true), Closed r) ] | "binop" LEFTA [ r1 = regexp; "#"; r2 = regexp -> let msg = " term is not a set of characters" in let set1 = Regexp_ast.as_charset _loc ("left" ^ msg) r1 in let set2 = Regexp_ast.as_charset _loc ("right" ^ msg) r2 in Characters (_loc, Charset.diff set1 set2) ] | "preop" NONA [ "!"; ident = LIDENT -> Backref (_loc, ident) | "@"; e = expr LEVEL "." -> Variable (_loc, e) ] | "simple" NONA [ "["; set = charset; "]" -> Characters (_loc, set) | s = string -> s | name = LIDENT -> find_named_regexp _loc name | "%"; name = LIDENT -> Bind_pos (_loc, name) | "("; r = regexp; ")" -> r ] ]; string: [ [ s = STRING -> Regexp_ast.of_string _loc (eval_string s) ] | [ c = CHAR -> Characters (_loc, Charset.singleton (eval_char c)) ] ]; charset: [ [ "^"; x = charset -> Charset.complement x ] | [ c1 = CHAR; "-"; c2 = CHAR -> Charset.range (eval_char c1) (eval_char c2) | c = CHAR -> Charset.singleton (eval_char c) | s = STRING -> Charset.of_string (eval_string s) | name = LIDENT -> Regexp_ast.as_charset _loc "not a set of characters" (find_named_regexp _loc name) | set1 = charset; set2 = charset -> Charset.union set1 set2 ] ]; range: [ [ mini = INT; maxi = OPT ["-"; maxi = OPT [ maxi = INT -> int_of_string maxi ] -> maxi] -> let mini = int_of_string mini in (mini, maxi), _loc | mini = INT; "+" -> (int_of_string mini, Some None), _loc ] ]; (* Reserved identifiers in patterns *) patt: LEVEL "simple" [ [ s = LIDENT -> if Match.is_reserved s then Messages.reserved_identifier _loc Constants.reserved_prefix s else <:patt< $lid:s$ >> ] ]; END;; let extend_regular () = extend_common (); (try DELETE_RULE Gram expr: "function"; match_case END with _rule_not_found -> Messages.cannot_delete_rule "(reg 1)"); (try DELETE_RULE Gram expr: "match"; sequence; "with"; match_case END with _rule_not_found -> Messages.cannot_delete_rule "(reg 2)"); EXTEND Gram expr: LEVEL "top" [ [ "match"; target = sequence; "with"; OPT "|"; cases = LIST1 regexp_match_case SEP "|" -> output_match _loc (seq _loc target) cases | "try"; e = expr; "with"; OPT "|"; cases = LIST1 regexp_match_case SEP "|" -> output_try _loc e cases | "function"; OPT "|"; cases = LIST1 regexp_match_case SEP "|" -> output_function _loc cases ] ]; END let () = init_named_regexps (); Camlp4.Options.add "-tailrec" (Arg.Set tailrec) " produce code that preserves tail-recursivity (default)"; Camlp4.Options.add "-direct" (Arg.Clear tailrec) " produce code that does not try to preserve tail-recursivity"; (* How to test if the current syntax is the regular or revised one? *) extend_regular () mikmatch-1.0.7/common/top_declaration.ml000066400000000000000000000005561227060510600203120ustar00rootroot00000000000000let add, flush = let accu = ref [] in let add str_item = accu := str_item :: !accu in let flush () = let result = <:str_item< $ List.rev !accu $ >> in accu := []; result in add, flush let install_syntax () = EXTEND Camlp4.PreCast.Gram GLOBAL: str_item; str_item: FIRST [ [ si = NEXT -> add si; flush () ] ]; END mikmatch-1.0.7/doc/000077500000000000000000000000001227060510600140605ustar00rootroot00000000000000mikmatch-1.0.7/doc/Makefile000066400000000000000000000007621227060510600155250ustar00rootroot00000000000000all: mikmatch-ocamldoc/index.html camlmix -o mikmatch-manual.tex -clean mikmatch-manual.tex.mlx pdflatex -interaction=nonstopmode mikmatch-manual pdflatex -interaction=nonstopmode mikmatch-manual hevea -fix macros.hva mikmatch-manual mikmatch-ocamldoc/index.html: ../common/mikmatch.mli ocamldoc -d mikmatch-ocamldoc -html $< ocamldoc $< -o mikmatch-ocamldoc.tex -latex -noheader -notoc -notrailer clean: rm -f *~ *.haux *.log *.aux *.toc *.pdf *.html *.out *.htoc \ mikmatch-ocamldoc/* mikmatch-1.0.7/doc/macros.hva000066400000000000000000000004351227060510600160460ustar00rootroot00000000000000\let\oldmeta=\@meta \renewcommand{\@meta}{% \oldmeta \begin{rawhtml} \end{rawhtml}} mikmatch-1.0.7/doc/mikmatch-manual.tex.mlx000066400000000000000000000726551227060510600204700ustar00rootroot00000000000000% -*- mode: latex -*- \documentclass[a4paper,12pt]{article} \usepackage{ae} \usepackage{hyperref} \usepackage{hevea} \usepackage{alltt} \usepackage[latin1]{inputenc} \usepackage[T1]{fontenc} \usepackage{fullpage} \usepackage{url} \usepackage{ocamldoc} \title{Mikmatch Version~## Sys.command "../VERSION" ##\\ Reference Manual} \author{Martin Jambon} \date{October 21, 2011} \setcounter{secnumdepth}{5} \setcounter{tocdepth}{4} \newcommand{\toplevelwarning}[0]{% {\footnotesize [toplevel support not available in Camlp4 3.10]}% } \begin{document} \maketitle This manual is available online as a single HTML file at\\ \url{http://mjambon.com/mikmatch-manual.html}\\ and as a PDF document at\\ \url{http://mjambon.com/mikmatch-manual.pdf}.\\ The home page of Mikmatch is:\\ \url{http://mjambon.com/micmatch.html} \tableofcontents \section{Introduction} Mikmatch is an extension of the syntax of the Objective Caml programming language (OCaml). Its purpose it to make the use of regular expressions easier and more generally to provide a set of tools for using OCaml as a powerful scripting language. Mikmatch believes that regular expressions are just like any other program and deserve better than a cryptic sequence of symbols placed in a string of a master program. Mikmatch currently supports two different libraries that implement regular expressions: Str which comes with the original distribution of OCaml and PCRE-OCaml which is an interface to PCRE (Perl Compatible Regular Expressions) for OCaml. These two flavors will be referred as Mikmatch\_str and Mikmatch\_pcre. They share a large number of syntaxic features, but Mikmatch\_pcre provides several macros that cannot be implemented safely in Mikmatch\_str. Therefore, it is recommended to use Mikmatch\_pcre. \section{Language} \subsection{Regular expressions} \subsubsection{Grammar of the regular expressions} Regular expressions support the syntax of Ocamllex regular expressions as of version 3.08.1 of the Objective Caml system (\url{http://caml.inria.fr/pub/docs/manual-ocaml/}), and several additional features. A regular expression (\textit{regexp}) is defined by the grammar that follows. The associativity rules are given by priority levels. 0 is the strongest priority. \begin{itemize} \item \underline{\textit{char-literal}} Match the given character (priority 0). \item \underline{\textbf{\_}} (underscore) Match any character (priority 0). \item \underline{\textit{string-literal}} Match the given sequence of characters (priority 0). \item \underline{\textbf{$[$}\textit{set-of-characters}\textbf{$]$}} Match one of the characters given by \textit{set-of-characters} (priority 0). The grammar for \textit{set-of-characters} is the following: \begin{itemize} \item \underline{\textit{char-literal}\textbf{$-$}\textit{char-literal}} defines a range of characters according to the iso-8859-1 encoding (includes ASCII). \item \underline{\textit{char-literal}} defines a singleton (a set containing just this character). \item \underline{\textit{string-literal}} defines a set that contains all the characters present in the given string. \item \underline{\textit{lowercase-identifier}} is replaced by the corresponding predefined regular expression; this regular expression must be exactly of length 1 and therefore represents a set of characters. \item \underline{\textit{set-of-characters} \textit{set-of-characters}} defines the union of two sets of characters. \end{itemize} \item \underline{\textit{regexp} \textbf{\#} \textit{regexp}} Match any of the characters given by the first regular expression except those which are given by the second one. Both regular expressions must be of length 1 and thus stand for a set of characters (priority 0). \item \underline{\textbf{$[$\^{}}\textit{set-of-characters}\textbf{$]$}} Same as \textbf{\_ \#} \textbf{$[$}\textit{set-of-characters}\textbf{$]$} (priority 0). \item \underline{\textit{regexp} \textbf{*}} Match the pattern given by \textit{regexp} 0 time or more (priority 0). \item \underline{\textit{regexp} \textbf{+}} Match the pattern given by \textit{regexp} 1 time or more (priority 0). \item \underline{\textit{regexp} \textbf{?}} Match the pattern given by \textit{regexp} at most once (priority 0). \item \underline{\textit{regexp}\textbf{\{}\textit{m\textbf{$-$}n}\textbf{\}}} Match \textit{regexp} at least \textit{m}~times and up to \textit{n}~times. \textit{m} and~\textit{n} must be integer literals (priority 0). \item \underline{\textit{regexp}\textbf{\{}\textit{n}\textbf{\}}} Same as \textit{regexp}\textbf{\{}\textit{n\textbf{$-$}n}\textbf{\}} (priority 0). \item \underline{\textit{regexp}\textbf{\{}\textit{n}\textbf{$+$\}}} Same as \textit{regexp}\textbf{\{}\textit{n}\textbf{\}}\textit{regexp}\textbf{$*$} (priority 0). \item \underline{\textit{regexp}\textbf{\{}\textit{n}\textbf{$-$\}}} Deprecated. Same as \textit{regexp}\textbf{\{}\textit{n}\textbf{$+$\}} (priority 0). \item \underline{\textbf{(} \textit{regexp} \textbf{)}} Match \textit{regexp} (priority 0). \item \underline{\textit{regexp} \textbf{\~{}}} Case insensitive match of the given regular expression \textit{regexp} according to the conventions of Objective Caml, i.e. according to the representation of characters in the iso-8859-1 standard (latin1) (priority 0). \item \underline{\textit{regexp} \textit{regexp}} Match the first regular expressions and then the second one (priority 1). \item \underline{\textit{regexp} \textbf{|} \textit{regexp}} Match one of these two regular expressions (priority 2). \item \underline{\textit{regexp} \textbf{as} \textit{lowercase-identifier}} Give a name to the substring that will be matched by the given pattern. This string becomes available under this name (priority 3). In-place conversions of the matched substring can be performed using one these three mechanisms: \begin{itemize} \item \underline{\textit{regexp} \textbf{as} \textit{lowercase-identifier} \textbf{:} \textit{built-in-converter}} where \textit{built-in-converter} is one of \texttt{int}, \texttt{float} or \texttt{option}. \texttt{int} behaves as \texttt{int\_of\_string}, \texttt{float} behaves as \texttt{float\_of\_string}, and \texttt{option} encapsulate the substring in an object of type \texttt{string option} using an equivalent of \texttt{function "" -> None | s -> Some s} \item \underline{\textit{regexp} \textbf{as} \textit{lowercase-identifier} \textbf{:=} \textit{converter}} where \textit{converter} is any function which converts a string into something else. \item \underline{\textit{regexp} \textbf{as} \textit{lowercase-identifier} \textbf{=} \textit{expr}} where \textit{expr} is any OCaml expression, usually a constant, which assigns a value to \textit{lowercase-identifier} without knowing which substring it matches. \end{itemize} \item \underline{\textbf{\%} \textit{lowercase-identifier}} Give a name to the position in the string that is being matched. This position becomes available as an int under this name. \item \underline{\textbf{@} \textit{expr}} Match the string given by \textit{expr}. \textit{expr} can be any OCaml expression of type string. Parentheses will be needed around \textit{expr} if it is a function application, or any construct of equivalent or lower precedence (see the Objective Caml manual, chapter ``The Objective Caml language'', section ``Expressions''). Matching such patterns is not thread-safe in any of the current implementations. Expressions that contain @ patterns should be protected against concurrent accesses. \end{itemize} \subsubsection{Named regular expressions} Naming regular expressions is possible using the following toplevel construct:\\ \underline{\textbf{RE} \textit{ident} \textbf{=} \textit{regexp}}\\ where \textit{ident} is a lowercase identifier. Regular expressions share their own namespace. For instance, we can define a phone number as a sequence of 3~digits followed by a dash and followed by 4~digits: \begin{verbatim} RE digit = ['0'-'9'] RE phone = digit{3} '-' digit{4} \end{verbatim} \subsubsection{Predefined sets of characters} The POSIX character classes (sets of characters) are available as predefined regular expressions of length 1. Their definition is given in table~\ref{posix-classes}. \begin{table} \caption{\label{posix-classes} POSIX character classes and their definition in the Mikmatch syntax} \tt \begin{tabular}{l} RE lower = \verb!['a'-'z']!\\ RE upper = \verb!['A'-'Z']!\\ RE alpha = lower | upper\\ RE digit = \verb!['0'-'9']!\\ RE alnum = alpha | digit\\ RE punct = \verb=["!\"#$%&'()*+,-./:;<=\verb!=>?@[\\]^_`{|}~"]! \\ %$ RE graph = alnum | punct\\ RE print = graph | ' '\\ RE blank = \verb!' ' | '\t'!\\ RE cntrl = \verb!['\x00'-'\x1F' '\x7F']!\\ RE xdigit = \verb![digit 'a'-'f' 'A'-'F']!\\ RE space = \verb![blank "\n\x0B\x0C\r"]!\\ \end{tabular} \end{table} \subsubsection{More predefined patterns} Some named regexps are predefined and available in every implementation of Mikmatch. These are the following: \begin{itemize} \item \texttt{int}: matches an integer (see table~\ref{predefined-regexps}). It accepts a superset of the integer literals that are produced with the OCaml standard function \texttt{string\_of\_int}. \item \texttt{float}: matches a floating-point number (see table~\ref{predefined-regexps}). It accepts a superset of the float literals that are produced with the OCaml standard function \texttt{string\_of\_float}. \end{itemize} \begin{table} \caption{\label{predefined-regexps} Predefined regexps in Mikmatch} \tt \begin{verbatim} RE int = ["-+"]? ( "0" ( ["xX"] xdigit+ | ["oO"] ['0'-'7']+ | ["bB"] ["01"]+ ) | digit+ ) RE float = ["-+"]? ( ( digit+ ("." digit* )? | "." digit+ ) (["eE"] ["+-"]? digit+ )? | "nan"~ | "inf"~ ) \end{verbatim} \end{table} \subsection{General pattern matching} \subsubsection{Regexps and match/function/try constructs} In Mikmatch, regular expressions can be used to match strings instead of the regular patterns. In this case, the regular expression must be preceded by the \textbf{RE} keyword, or placed between slashes (\textbf{/}\dots\textbf{/}). Both notations are equivalent. Only the following constructs support patterns that contain regular expressions: \begin{itemize} \item \textbf{match} \dots\ \textbf{with} \textit{pattern} \textbf{->} \dots \item \textbf{function} \textit{pattern} \textbf{->} \dots \item \textbf{try} \dots\ \textbf{with} \textit{pattern} \textbf{->} \dots \end{itemize} Examples: \begin{verbatim} let is_num = function RE ['0'-'9']+ -> true | _ -> false let get_option () = match Sys.argv with [| _ |] -> None | [| _; RE (['a'-'z']+ as key) "=" (_* as data) |] -> Some (key, data) | _ -> failwith "Usage: myprog [key=value]" let option = try get_option () with Failure RE "usage"~ -> None \end{verbatim} If alternatives are used in a pattern, then both alternatives must define the same set of identifiers. In the following example, the string \texttt{code} can either come from the normal pattern matching or be a fresh substring which was extracted using the regular expression: \begin{verbatim} match option, s with Some code, _ | None, RE _* "=" (['A'-'Z']['0'-'9'] as code) -> print_endline code | _ -> () \end{verbatim} In the general case, it is not possible to check in advance if the pattern-matching cases are complete if at least one of the patterns is a regular expression. In this case, no warnings against missing cases are displayed, thus it is safer to either add a catch-all case like in the previous examples or to catch the \texttt{Match\_failure} exception that can be raised unexpectedly. \subsubsection{Views (experimental feature)} Views are a general form of symbolic patterns other than those authorized by the concrete structure of data. For example, \texttt{Positive} could be a view for positive ints. View patterns can also bind variables and a useful example in OCaml is pattern-matching over lazy values. Here we propose simple views, as suggested by Simon Peyton Jones for Haskell:\\ \url{http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns}. We propose a different syntax, but note that the syntax that we have chosen here is experimental and may change slightly in future releases. \paragraph{View patterns} A view pattern has one of these two forms: \begin{enumerate} \item \underline{\textbf{\%} \textit{view-name}}: a view without an argument. It is a simple check over the subject data. \item \underline{\textbf{\%} \textit{view-name} \textit{pattern}}: a view with an argument, the pattern. If the view function matches successfully, its result is matched against the given pattern. \end{enumerate} where a \textit{view-name} is a capitalized alphanumeric identifier, possibly preceded by a module path specification, e.g. \texttt{Name} or \texttt{Module.Name}. \paragraph{Definition of a view} Views without arguments are defined as functions of type \texttt{'a -> bool}, while views with arguments are defined as functions of type \texttt{'a -> 'b option}. The syntax for defining a view is: \begin{itemize} \item \underline{\textbf{let} \textbf{view} \textit{uppercase-identifier} \textbf{=} \textit{expression}} \item \underline{\textbf{let} \textbf{view} \textit{uppercase-identifier} \textbf{=} \textit{expression} \textbf{in} \textit{expression}} \end{itemize} Using the syntax above is however not strictly needed, since it just defines a function named after the name of the view, and prefixed by \texttt{view\_}. For instance \texttt{let view X = f} can be written as \texttt{let view\_X = f} in regular OCaml. Therefore, some library modules can export view definitions without using any syntax extension themselves. \paragraph{Example} \begin{verbatim} (* The type of lazy lists *) type 'a lazy_list = Nil | Cons of ('a * 'a lazy_list lazy_t) (* Definition of a view without argument for the empty list *) let view Nil = fun l -> try Lazy.force l = Nil with _ -> false (* Independent definition of a view with an argument, the head and tail of the list *) let view Cons = fun l -> try match Lazy.force l with Cons x -> Some x | Nil -> None with _ -> None (* Test *) let _ = let l = lazy (Cons (1, lazy (Cons (2, lazy Nil)))) in match l with %Nil | %Cons (_, %Nil) -> assert false | %Cons (x1, %Cons (x2, %Nil)) -> assert (x1 = 1); assert (x2 = 2); Printf.printf "Passed view test\n%!" | _ -> assert false \end{verbatim} \paragraph{Limitations} Each time a value is tested against a view pattern, the corresponding function is called. There is no optimization that would avoid calling the view function twice on the same argument. Redundant or missing cases cannot be checked, just like when there is a regexp in a pattern. This is due both to our definition of views and to the implementation that we get using Camlp4. \subsection{Shortcut for one-case regexp matching} A shortcut notation can be used to extract substrings from a string that match a pattern which is known in advance:\\ \textbf{let /}\textit{regexp}\textbf{/} \textbf{=} \textit{expr} \textbf{in} \textit{expr} Global declarations also support this shortcut:\\ \textbf{let /}\textit{regexp}\textbf{/} \textbf{=} \textit{expr}\\ Example \toplevelwarning: \begin{verbatim} # Sys.ocaml_version;; - : string = "3.08.3" # RE int = digit+;; # let /(int as major : int) "." (int as minor : int) ("." (int as patchlevel) | ("" as patchlevel)) ("+" (_* as additional_info) | ("" as additional_info))/ = Sys.ocaml_version ;; val additional_info : string = "" val major : int = 3 val minor : int = 8 val patchlevel : string = "3" \end{verbatim} The notation does not allow simultaneous definitions using the \textbf{and} keyword nor recursive definitions using \textbf{rec}. As usual, the \texttt{Match\_failure} exception is raised if the string fails to match the pattern. The let-try-in-with construct described in the next section also supports regexp patterns, with the same restrictions. \subsection{The let-try-in-with construct} A general notation for catching exceptions that are raised during the definition of bindings is provided:\\ \textbf{let} \textbf{try} [\textbf{rec}] \textit{let-binding} \{\textbf{and} \textit{let-binding}\} \textbf{in}\\ \verb! !\textit{expr}\\ \textbf{with} \textit{pattern-matching} It has the same meaning as:\\ \textbf{try} \textbf{let} [\textbf{rec}] \textit{let-binding} \{\textbf{and} \textit{let-binding}\} \textbf{in}\\ \verb! !\textit{expr}\\ \textbf{with} \textit{pattern-matching}\\ except that in the former case only the exceptions raised by the \textit{let-binding}s are handled by the exception handler introduced by \textbf{with}. \subsection{Implementation-dependent features} These features depend on which library is actually used internally for manipulating regular expressions. Currently two libraries are supported: the Str library from the official OCaml distribution and the PCRE-OCaml library. Support for other libraries might be added in the future. \subsubsection{Backreferences} \label{backref} Previously matched substrings can be matched again using backreferences. \underline{\textbf{!}\textit{ident}} is a backreference to the named group \textit{ident} that is defined previously in the sequence. During the matching process, it is not possible that a backreference refers to a named group which is not matched. In the following example, we extract the repeated pattern \texttt{abc} from \texttt{abcabc} \toplevelwarning: \begin{verbatim} # match "abcabc" with RE _* as x !x -> x;; - : string = "abc" \end{verbatim} \subsubsection{Specificities of Mikmatch\_str} Backreferences as described previously (section~\ref{backref}) are supported. In addition to the POSIX character classes, a set of predefined patterns is available: \begin{itemize} \item \underline{bol} matches at beginning of line (either at the beginning of the matched string, or just after a newline character). \item \underline{eol} matches at end of line (either at the end of the matched string, or just before a newline character). \item \underline{any} matches any character except newline. \item \underline{bnd} matches word boundaries. \end{itemize} \subsubsection{Specificities of Mikmatch\_pcre} This is currently the version which is used by the \verb$mikmatch$ command. \paragraph{Matching order} Alternatives (\textit{regexp1}|\textit{regexp2}) are tried from left to right. The quantifiers (\verb$*$, \verb$+$, \verb$?$ and \verb${$\dots\verb$}$) are greedy except if specified otherwise (see next paragraph). The regular expressions are matched from left to right, and the repeated patterns are matched as many times as possible before trying to match the rest of the regular expression and either succeed or give up one repetition before retrying (backtracking). \paragraph{Greediness and laziness} Normally, quantifiers (\verb$*$, \verb$+$, \verb$?$ and \verb${$\dots\verb$}$) are greedy, i.e. they perform the longest match in terms of number of repetitions before matching the rest of the regular expression or backtracking. The opposite behavior is laziness: in that case, the number of repetitions is made minimal before trying to match the rest of the regular expression and either succeed or continue with one more repetition. The lazy behavior is turned on by placing the keyword \verb$Lazy$ after the quantifier. This is the equivalent of Perl's quantifiers \verb$*?$, \verb$+?$, \verb$??$ and \verb${$\dots\verb$}?$. For instance, compare the following behaviors \toplevelwarning: \begin{verbatim} # match "" with RE "<" (_* as contents) ">" -> contents;; - : string = "hello>" with RE "<" (_* Lazy as contents) ">" -> contents;; - : string = "hello" \end{verbatim} \paragraph{Possessiveness or atomic grouping} Sometimes it can be useful to prevent backtracking. This is achieved by placing the \verb$Possessive$ keyword after a given group. For instance, compare the following \toplevelwarning: \begin{verbatim} # match "abc" with RE _* _ -> true | _ -> false;; - : bool = true # match "abc" with RE _* Possessive _ -> true | _ -> false;; - : bool = false \end{verbatim} This operator has the strongest associativity priority (0), just like the quantifiers. \paragraph{Backreferences} Backreferences are supported as described in section~\ref{backref}. \paragraph{Predefined patterns} The following predefined patterns are available in addition to the POSIX character classes: \begin{itemize} \item \underline{bos} matches at beginning of the matched string. \item \underline{eos} matches at the end of the matched string. \item \underline{bol} matches at beginning of line (either at the beginning of the matched string, or just after a newline character). \item \underline{eol} matches at end of line (either at the end of the matched string, or just before a newline character). \item \underline{any} matches any character except newline. \end{itemize} \paragraph{Lookaround assertions} A lookaround assertion is a pattern that has to be matched but doesn't consume characters in the string being matched. Lookahead assertions are checked after the current position in the string, and lookbehind assertions are matched before the current point. The general syntax for an assertion is the following: \\ \fbox{\texttt{<} \textit{lookbehind} \texttt{.} \textit{lookahead} \texttt{>}}\\ \fbox{\texttt{<} \textit{lookahead} \texttt{>}}\\ The central dot symbolizes the current position. The \textit{lookbehind} assertion is a test over the characters at the left of the current point, while the \textit{lookahead} is a test over the characters at the right of the current point in the string. \textit{lookbehind} or \textit{lookahead} are either empty or a regular expression, optionally preceded by \texttt{Not}. An assertion starting with \texttt{Not} is called negative and means that the given regular expression can not match here. There are no restrictions on the contents of lookahead regular expressions. Lookbehind regular expressions are restricted to those that match substrings of length that can be predetermined. Besides this, backreferences are not supported in lookbehind expressions. \paragraph{Macros} This implementation provides a set of macros that follow this syntax:\\ \fbox{\textit{MACRO-NAME} \textit{regexp} \texttt{->} \textit{expr}}\\ where \textit{expr} is the expression that will be computed every time the pattern given by \textit{regexp} is matched. Only the \verb$SPLIT$ and \verb$FILTER$ macros follows a simplified syntax:\\ \fbox{\textit{MACRO-NAME} \textit{regexp}} These constructs build a function which accepts some optional arguments and the string to match. For instance, \\ \verb$(REPLACE "," -> ";") "a,b,c"$\\ returns \verb$"a;b;c"$ whereas\\ \verb$(REPLACE "," -> ";") ~pos:2 "a,b,c"$\\ returns \verb$"a,b;c"$ The possible options are the following: \begin{itemize} \item \verb$pos$ has type \verb$int$ and indicates that matching or searching must start from this position in the string. Its default value is always 0 (beginning of the string). \item \verb$full$ is a boolean that defines whether split operations must ignore empty fragments before the first matched pattern or the last matched pattern in the string. The default value is \verb$true$ for \verb$MAP$ and \verb$false$ for \verb$SPLIT$. \item \verb$share$ is a potentially unsafe option which allows the reuse of some mutable data which are associated to a given regular expression. This may make the program slightly faster, but should generally not be used in multi-threaded programs or in libraries. \end{itemize} \fbox{\texttt{MATCH} \textit{regexp} \texttt{->} \textit{expr}}\\ tries to match the pattern \textit{regexp} at the beginning of the string or at the given position \verb$pos$ and returns \textit{expr} or raise \verb$Not_found$. Options: \verb$pos$ (0), \verb$share$ (false). When \verb$pos$ and \verb$share$ are not specified, it is equivalent to: \begin{alltt} function RE \textit{regexp} -> \textit{expr} | _ -> raise Not_found \end{alltt} \fbox{\texttt{REPLACE} \textit{regexp} \texttt{->} \textit{expr}}\\ returns a string in which every occurrence of the pattern is replaced by \textit{expr}. Options: \verb$pos$ (0). \fbox{\texttt{REPLACE\_FIRST} \textit{regexp} \texttt{->} \textit{expr}}\\ returns a string in which the first occurrence of the pattern is replaced by \textit{expr}. A copy of the input string is returned if the pattern is not found. Options: \verb$pos$ (0). \fbox{\texttt{SEARCH} \textit{regexp} \texttt{->} \textit{expr}}\\ simply evaluates \textit{expr} every time the pattern is matched. Options: \verb$pos$ (0). \fbox{\texttt{SEARCH\_FIRST} \textit{regexp} \texttt{->} \textit{expr}}\\ simply evaluates \textit{expr} the first time the pattern is matched and returns the result. Exception \texttt{Not\_Found} is raised if the pattern is not matched. Options: \verb$pos$ (0), \verb$share$ (false). \fbox{\texttt{COLLECT} \textit{regexp} \texttt{->} \textit{expr}}\\ evaluates \textit{expr} every time the pattern is matched and puts the result into a list. Options: \verb$pos$ (0). \fbox{\texttt{COLLECTOBJ} \textit{regexp}}\\ like \texttt{COLLECT}, but the elements of the returned list are automatically objects with methods that correspond to the subgroups captured with \texttt{as}. Options: \verb$pos$ (0). \fbox{\texttt{SPLIT} \textit{regexp}}\\ splits the given string using \textit{regexp} as a delimiter. Options: \verb$pos$ (0), \verb$full$ (false). \fbox{\texttt{FILTER} \textit{regexp}}\\ creates a predicate that returns true is the given string matches \textit{regexp} or false otherwise. Options: \verb$pos$ (0), \verb$share$ (false). \fbox{\texttt{CAPTURE} \textit{regexp}}\\ returns \texttt{Some o} where \texttt{o} is an object with methods that correspond to the captured subgroups, or \texttt{None} if the subject string doesn't match \textit{regexp}. Options: \verb$pos$ (0), \verb$share$ (false). \fbox{\texttt{MAP} \textit{regexp} \texttt{->} \textit{expr}}\\ splits the given string into fragments: the fragments that do not match the pattern are returned as \texttt{`Text s} where s is a string. Fragments that match the pattern are replaced by the result of \textit{expr}, which has to be a polymorphic variant. Options: \verb$pos$ (0), \verb$full$ (true). For instance, \\ \verb$(MAP ',' -> `Sep) "a,b,c,"$\\ returns the list\\ \verb$[`Text "a"; `Sep; `Text "b"; `Sep; `Text "c"; `Sep; `Text ""]$\\ whereas \\ \verb$(MAP ',' -> `Sep) ~full:false "a,b,c,"$\\ returns only\\ \verb$[`Text "a"; `Sep; `Text "b"; `Sep; `Text "c"; `Sep]$ \section{Tools} \subsection{Micmatch, Mikmatch, old Camlp4, new Camlp4, Camlp5} Camlp4/Camlp5 is the set of tools that allows to build and use syntax extensions of OCaml. We distinguish 3 major variants of Camlp4: \begin{itemize} \item The ``old Camlp4'' is Camlp4 as distributed with OCaml until version 3.09.3. \item Camlp5 is an independent branch of the old Camlp4, compatible with at least the 3.09 and 3.10 release lines of OCaml. It is close to 100\% compatible with the old Camlp4. \item The new Camlp4 or just Camlp4 shares the same goals as the old Camlp4 and Camlp5, but is largely incompatible with them. It is included in the core OCaml distribution starting from OCaml 3.10 and replaces the old Camlp4. \end{itemize} Micmatch is the name of the original implementation of Mikmatch for the old Camlp4: \begin{itemize} \item Micmatch < 1.0 requires the old Camlp4. \item Micmatch $\geq$ 1.0 requires Camlp5. \item Mikmatch requires the new Camlp4. \end{itemize} \subsection{The toplevel} \toplevelwarning \subsection{The libraries for the preprocessor} \subsubsection{Mikmatch\_str} The preprocessing library \texttt{pa\_mikmatch\_str.cma} must be loaded by the preprocessor (\texttt{camlp4o} or \texttt{camlp4r}). It is safe to use Mikmatch\_str in multithreaded programs without locks only if the patterns do not contain the @ keyword because it uses a shared cache of compiled regexps. \subsubsection{Mikmatch\_pcre} The preprocessing library \texttt{pa\_mikmatch\_pcre.cma} must be loaded by the preprocessor (\texttt{camlp4o} or \texttt{camlp4r}). It is safe to use Mikmatch\_str in multithreaded programs without locks only if the patterns do not contain the @ keyword because it uses a shared cache of compiled regexps. \subsection{The runtime libraries} Both variants depend on portable features of the Unix library. The executables must therefore be linked against \texttt{unix.cma} (bytecode) or \texttt{unix.cmxa} (native code) in addition to the specific libraries mentioned below. \subsubsection{Mikmatch\_str} In addition to the backend for the regular expressions engine (\texttt{str.cma} for bytecode or \texttt{str.cmxa} for native code), the OCaml code which is produced by the preprocessor needs to be linked against either \texttt{run\_mikmatch\_str.cma} (bytecode), \texttt{run\_mikmatch\_str.cmxa} (native code), \texttt{run\_mikmatch\_str\_mt.cma} (bytecode, threads) or \texttt{run\_mikmatch\_str\_mt.cmxa} (native code, threads). \subsubsection{Mikmatch\_pcre} In addition to the backend for the regular expressions engine (\texttt{pcre.cma} for bytecode or \texttt{pcre.cmxa} for native code), the OCaml code which is produced by the preprocessor needs to be linked against either \texttt{run\_mikmatch\_pcre.cma} (bytecode), \texttt{run\_mikmatch\_pcre.cmxa} (native code). Multithreaded programs are supported as well and do not require a specific library. \begin{latexonly} \input{mikmatch-ocamldoc.tex} \end{latexonly} \begin{htmlonly} \section{A small text-oriented library} \begin{rawhtml} Module Mikmatch \end{rawhtml} \end{htmlonly} \end{document} mikmatch-1.0.7/doc/ocamldoc.sty000066400000000000000000000037741227060510600164150ustar00rootroot00000000000000 %% Support macros for LaTeX documentation generated by ocamldoc. %% This file is in the public domain; do what you want with it. \NeedsTeXFormat{LaTeX2e} \ProvidesPackage{ocamldoc} [2001/12/04 v1.0 ocamldoc support] \newenvironment{ocamldoccode}{% \bgroup \leftskip\@totalleftmargin \rightskip\z@skip \parindent\z@ \parfillskip\@flushglue \parskip\z@skip %\noindent \@@par\smallskip \@tempswafalse \def\par{% \if@tempswa \leavevmode\null\@@par\penalty\interlinepenalty \else \@tempswatrue \ifhmode\@@par\penalty\interlinepenalty\fi \fi} \obeylines \verbatim@font \let\org@prime~% \@noligs \let\org@dospecials\dospecials \g@remfrom@specials{\\} \g@remfrom@specials{\{} \g@remfrom@specials{\}} \let\do\@makeother \dospecials \let\dospecials\org@dospecials \frenchspacing\@vobeyspaces \everypar \expandafter{\the\everypar \unpenalty}} {\egroup\par} \def\g@remfrom@specials#1{% \def\@new@specials{} \def\@remove##1{% \ifx##1#1\else \g@addto@macro\@new@specials{\do ##1}\fi} \let\do\@remove\dospecials \let\dospecials\@new@specials } \newenvironment{ocamldocdescription} {\list{}{\rightmargin0pt \topsep0pt}\raggedright\item\noindent\relax\ignorespaces} {\endlist\medskip} \newenvironment{ocamldoccomment} {\list{}{\leftmargin 2\leftmargini \rightmargin0pt \topsep0pt}\raggedright\item\noindent\relax} {\endlist} \let \ocamldocparagraph \paragraph \def \paragraph #1{\ocamldocparagraph {#1}\noindent} \let \ocamldocsubparagraph \subparagraph \def \subparagraph #1{\ocamldocsubparagraph {#1}\noindent} \let\ocamldocvspace\vspace \newenvironment{ocamldocindent}{\list{}{}\item\relax}{\endlist} \newenvironment{ocamldocsigend} {\noindent\quad\texttt{sig}\ocamldocindent} {\endocamldocindent\vskip -\lastskip \noindent\quad\texttt{end}\medskip} \newenvironment{ocamldocobjectend} {\noindent\quad\texttt{object}\ocamldocindent} {\endocamldocindent\vskip -\lastskip \noindent\quad\texttt{end}\medskip} \endinput mikmatch-1.0.7/pcre/000077500000000000000000000000001227060510600142445ustar00rootroot00000000000000mikmatch-1.0.7/pcre/META.template000066400000000000000000000007221227060510600165300ustar00rootroot00000000000000name = "mikmatch_pcre" version = "VERSION" description = "Pattern matching extended with regexps in Ocamllex syntax" requires = "camlp4 pcre unix" requires(toploop) += "tophide" archive(syntax,toploop) = "pa_mikmatch_pcre.cma run_mikmatch_pcre.cma" archive(syntax,create_toploop) = "pa_mikmatch_pcre.cma run_mikmatch_pcre.cma" archive(syntax,preprocessor) = "pa_mikmatch_pcre.cma" archive(byte) = "run_mikmatch_pcre.cma" archive(native) = "run_mikmatch_pcre.cmxa" mikmatch-1.0.7/pcre/Makefile000066400000000000000000000066421227060510600157140ustar00rootroot00000000000000PCRE_DIR = $(shell ocamlfind query pcre) # PCRE_DIR = something else ifndef BINDIR BINDIR = $(shell dirname `which ocaml`) endif SOURCES = \ mm_util.ml \ global_def.mli global_def.ml \ match.ml \ pcre_lib.ml \ syntax_common.ml \ syntax_pcre.ml RESULT = mikmatch_pcre OCAMLFLAGS = -dtypes OCAMLLDFLAGS = \ messages.cmo charset.cmo \ constants.cmo \ regexp_ast.cmo \ select_lib.cmo INCDIRS = ../common USE_CAMLP4 = yes COMMON_LIBINSTALL_FILES = \ pa_mikmatch_pcre.cma pa_mikmatch_pcre.cmo pa_mikmatch_pcre.cmi \ run_mikmatch_pcre.cmi mikmatch.cmi mikmatch.mli BC_LIBINSTALL_FILES = \ run_mikmatch_pcre.cma run_mikmatch_pcre.cmo NC_LIBINSTALL_FILES = \ run_mikmatch_pcre.cmxa run_mikmatch_pcre.cmx \ run_mikmatch_pcre.a run_mikmatch_pcre.o mikmatch.cmx ALL_LIBINSTALL_FILES = \ $(COMMON_LIBINSTALL_FILES) $(BC_LIBINSTALL_FILES) $(NC_LIBINSTALL_FILES) .PHONY: default all-bc all-nc all force links pa_lib install uninstall \ version topinstall misc-bc misc-nc default: version links pa_lib misc-bc misc-nc touch bytecode touch nativecode all-bc: links pa_lib misc-bc touch bytecode all-nc: links pa_lib misc-nc touch nativecode all: # needed by libinstall force: touch $(SOURCES) $(MAKE) links: mikmatch.mli mikmatch.ml match.ml syntax_common.ml mm_util.ml \ global_def.mli global_def.ml mikmatch.mli: ../common/mikmatch.mli ln -s $< $@ mikmatch.ml: ../common/mikmatch.ml ln -s $< $@ syntax_common.ml: ../common/syntax_common.ml ln -s $< $@ match.ml: ../common/match.ml ln -s $< $@ mm_util.ml: ../common/mm_util.ml ln -s $< $@ global_def.mli: ../common/global_def.mli ln -s $< $@ global_def.ml: ../common/global_def.ml ln -s $< $@ pa_lib: $(MAKE) RESULT=pa_mikmatch_pcre pabc bcl install: topinstall if test -f nativecode; \ then \ $(MAKE) "LIBINSTALL_FILES=$(ALL_LIBINSTALL_FILES)" libinstall;\ else \ $(MAKE) "LIBINSTALL_FILES=$(COMMON_LIBINSTALL_FILES) $(BC_LIBINSTALL_FILES)" libinstall; \ fi uninstall: $(MAKE) libuninstall version: sed -e "s/VERSION/$(VERSION)/" < META.template > META misc-bc: ocamlc -c mikmatch.mli ocamlc -a -o run_mikmatch_pcre.cma -I $(PCRE_DIR) \ mikmatch.ml run_mikmatch_pcre.ml ocamlmktop -o mikmatch_pcre.top \ -I +camlp4 -I $(PCRE_DIR) \ dynlink.cma \ camlp4o.cma \ pa_mikmatch_pcre.cma pcre.cma unix.cma run_mikmatch_pcre.cma misc-nc: ocamlc -c mikmatch.mli ocamlopt -a -o run_mikmatch_pcre.cmxa -I $(PCRE_DIR) \ mikmatch.ml run_mikmatch_pcre.ml .PHONY: test simple-test more-tests test-install test: simple-test more-tests # Toplevel (preinstall test) simple-test: camlp4o ./pa_mikmatch_pcre.cma -printer o test.ml > test.ppo ocamlmktop -o mikmatch_pcre.test -I +camlp4 -I . -I $(PCRE_DIR) \ dynlink.cma \ camlp4o.cma \ pa_mikmatch_pcre.cma pcre.cma unix.cma run_mikmatch_pcre.cma # Preinstall test more-tests: camlp4o ./pa_mikmatch_pcre.cma -printer o -direct test.ml > test.ppo ocamlopt \ -pp 'camlp4o ./pa_mikmatch_pcre.cma' \ -I $(PCRE_DIR) pcre.cmxa run_mikmatch_pcre.cmxa \ test.ml -o test ./test # Postinstall test test-install: ocamlfind ocamlopt \ -syntax camlp4o \ -package pcre,mikmatch_pcre\ -linkpkg \ test.ml -o test ./test # Debugging .PHONY: test2 test2: camlp4o ./pa_mikmatch_pcre.cma -printer o test2.ml > test2.ppo TRASH = \ *~ *.ppo *.cm[ioxa] *.cmxa *.o *.a *.top \ *.test test test.more mikmatch bytecode nativecode mikmatch_pcre.ml OCAMLMAKEFILE = ../OCamlMakefile include $(OCAMLMAKEFILE) mikmatch-1.0.7/pcre/pcre_lib.ml000066400000000000000000000375521227060510600163710ustar00rootroot00000000000000(*pp camlp4orf *) open Printf open Camlp4.PreCast open Mm_util open Regexp_ast let _ = Constants.mod_runtime := "Run_mikmatch_pcre"; Constants.mod_runtime_mt := "Run_mikmatch_pcre" (* output for PCRE: Perl Compatible Regular Expressions *) let special_regexps = let _loc = Constants.dummy_loc in [ "any", (* any character except newline *) Special (_loc, ".", ("any", Some 1)); "bol", (* beginning of line *) Special (_loc, "(?:^|(?<=\n))", ("bol", Some 0)); "eol", (* end of line *) Special (_loc, "(?:$|(?=\n))", ("eol", Some 0)); "bos", (* beginning of string *) Special (_loc, "^", ("bos", Some 0)); "eos", (* end of string *) Special (_loc, "$", ("eos", Some 0)); ] let string c = String.make 1 c let quote_char c = match c with '\\' | '^' | '$' | '.' | '[' | ']' | '|' | '(' | ')' | '?' | '*' | '+' | '{' | '}' -> let s = String.create 2 in s.[0] <- '\\'; s.[1] <- c; s | '\000' -> "\\000" | c -> string c let quote_char_in_class c = match c with '\\' -> "\\\\" | ']' -> "\\]" | '-' -> "\\-" | '^' -> "\\^" | '\000' -> "\\000" | c -> string c let reorder_charset l = if l = [] then invalid_arg "reorder_charset"; List.sort Char.compare l let compact l = let finish first last = match Char.code last - Char.code first with 0 -> quote_char_in_class first | 1 -> (quote_char_in_class first) ^ (quote_char_in_class last) | _ -> (quote_char_in_class first) ^ "-" ^ (quote_char_in_class last) in let rec extend first last = function [] -> [finish first last] | c :: rest -> if Char.code c = Char.code last + 1 then extend first c rest else finish first last :: extend c c rest in match l with [] -> [] | hd :: tl -> extend hd hd tl let compact_charset _loc l = let sorted = List.sort Char.compare l in let (zero, nozero) = match sorted with '\000' :: rest -> ("\\000", rest) | l -> ("", l) in String.concat "" (zero :: compact nozero) let rec rm_closed = function Closed ast -> rm_closed ast | ast -> ast let add_const accu s = accu := `String s :: !accu let add_var accu e = accu := `Var (e, false) :: !accu let add_var_nocase accu e = accu := `Var (e, true) :: !accu let rec to_string ?(top = false) ((last_group, named_groups) as groups) accu = function Epsilon _loc -> groups | Special (_loc, s, _) -> add_const accu s; groups | Characters (_loc, set) -> let l = Charset.list set in (match l with [] -> groups | [c] -> add_const accu (quote_char c); groups | _ -> add_const accu (sprintf "[%s]" (compact_charset _loc l)); groups) | Sequence (_loc, re1, re2) -> let groups = to_string groups accu re1 in to_string groups accu re2 | Alternative (_loc, re, Epsilon _, _, _) -> let must_group = not top && match rm_closed re with Characters _ | Special _ | Bind _ | Alternative _ -> false | _ -> true in if must_group then add_const accu "(?:"; let (last_group, named_groups) as groups = to_string (last_group, named_groups) accu re in if must_group then add_const accu ")"; add_const accu "?"; groups | Alternative (_loc, re1, re2, _, _) -> let must_group = not top in if must_group then add_const accu "(?:"; let (last_group, named_groups1) = to_string (last_group, named_groups) accu re1 in add_const accu "|"; let (last_group, named_groups2) = to_string (last_group, named_groups) accu re2 in if must_group then add_const accu ")"; let check_balance set1 set2 = if not (Named_groups.equal set1 set2) then (let missing = S.diff (Named_groups.union set1 set2) (Named_groups.inter set1 set2) in Messages.unbalanced_bindings _loc (list_named_groups missing)) in let (groups1, positions1) = named_groups1 and (groups2, positions2) = named_groups2 in check_balance groups1 groups2; check_balance positions1 positions2; (last_group, (merge groups1 groups2, merge positions1 positions2)) | Repetition (_loc, (kind, greedy), re) -> let must_group = not top && match rm_closed re with Characters _ | Special _ | Bind _ | Alternative _ | Possessive _ -> false | _ -> true in if must_group then add_const accu "(?:"; let groups = to_string (last_group, named_groups) accu re in if must_group then add_const accu ")"; let rec convert = function Star -> "*" | Plus -> "+" | Option -> "?" | Range (m, None) -> (match m with 1 -> "" | _ -> sprintf "{%i}" m) | Range (m, Some None) -> (match m with 0 -> "*" | 1 -> "+" | _ -> sprintf "{%i,}" m) | Range (m, Some (Some n)) -> if m = n then convert (Range (m, None)) else if m = 0 && n = 1 then "?" else sprintf "{%i,%i}" m n in let base_op = convert kind in add_const accu base_op; if not greedy && base_op <> "" then add_const accu "?"; groups | Possessive (_loc, re) -> add_const accu "(?>"; let groups = to_string groups accu re in add_const accu ")"; groups | Lookahead (_loc, positive, re) -> let start = if positive then "(?=" else "(?!" in add_const accu start; let groups = to_string groups accu re in add_const accu ")"; groups | Lookbehind (_loc, positive, re) -> let start = if positive then "(?<=" else "(? let last_group = succ last_group in let named_groups = add_new_group _loc name conv last_group named_groups in add_const accu "("; let groups = to_string (last_group, named_groups) accu re in add_const accu ")"; groups | Bind_pos (_loc, name) -> let last_group = succ last_group in let named_groups = add_new_pos _loc name last_group named_groups in add_const accu "()"; (last_group, named_groups) | Backref (_loc, name) -> (try match Named_groups.find name (fst named_groups) with [] -> Messages.invalid_backref _loc name | [(_, n, conv)] -> add_const accu (sprintf "\\%i" n); groups | l -> add_const accu (sprintf "(?:%s)" (String.concat "|" (List.map (fun (_, n, conv) -> sprintf "\\%d" n) l))); groups with Not_found -> Messages.invalid_backref _loc name) | Variable (_loc, e) -> add_var accu e; groups | Nocase_variable (_loc, e) -> add_var_nocase accu e; groups | Closed ast -> let saved_named_groups = named_groups in let (last_group, named_groups) = to_string groups accu ast in (last_group, saved_named_groups) (* Syntax expanders *) open Constants let nocasify e = let _loc = Ast.loc_of_expr e in <:expr< $uid: !mod_runtime$.nocase $e$ >> let make_get_re_noargs _loc re_name re_args = let empty = <:expr< "" >> in let empty_args = List.map (fun (name, arg) -> (name, empty)) re_args in Match.make_get_re _loc re_name empty_args let process_regexp _loc ~sharing re re_name = let accu = ref [] in let (last_group, named_groups) = to_string ~top:true (0, (Named_groups.empty, Named_groups.empty)) accu re in let re_args, re_source = Match.get_re_source ~quote_expr: <:expr< $uid: !mod_runtime$.quote_string >> ~nocasify accu in let shared_id = shared re_name in let get_re_noargs = make_get_re_noargs _loc re_name re_args in let postbindings = if sharing then [ shared_id, <:expr< Pcre.make_ovector $get_re_noargs$ >>; subgroups2 re_name, <:expr< fst $lid:shared_id$ >>; shared_ovector re_name, <:expr< snd $lid:shared_id$ >> ] else [] in (re_args, re_source, named_groups, postbindings) let raises_exn = function <:expr< raise $exn$ >> -> true | _ -> false let string_match _loc re_name get_re target substrings pos success failure = let match_it = <:expr< Pcre.exec ~rex:$get_re$ ~pos:$int:string_of_int pos$ $target$ >> in if raises_exn failure then (* shortcut *) <:expr< let $lid:substrings$ = try $match_it$ with [ Not_found -> $failure$ ] in $success$ >> else <:expr< try let $lid:substrings$ = try $match_it$ with [ Not_found -> $raise_exit _loc$ ] in $success$ with [ $patt_exit _loc$ -> $failure$ ] >> let matched_group _loc substrings n = <:expr< Pcre.get_substring $lid:substrings$ $int:string_of_int n$ >> let matched_position _loc substrings n = <:expr< Pcre.get_substring_ofs $lid:substrings$ $int:string_of_int n$ >> let compile_regexp_general ~anchored ~mt _loc re_args re_source = let default_flags = <:expr< [`DOLLAR_ENDONLY] >> in let anchored_flags = <:expr< [`ANCHORED; `DOLLAR_ENDONLY] >> in let flags = if anchored then anchored_flags else default_flags in let compile_string e = <:expr< Pcre.regexp ~flags:$flags$ $e$ >> in match re_args with [] -> let re_string = Match.compute_re_string _loc re_source in compile_string re_string | _ -> let key = match re_args with [name, _] -> <:expr< $id: <:ident< $lid:name$ >> $ >> | _ -> let expr_list = List.map ( fun (name, _) -> <:expr< $id: <:ident< $lid:name$ >> $ >> ) re_args in let tup = comma_expr_of_list _loc expr_list in <:expr< ( $tup: tup$ ) >> in let compile = let re_string = Match.compute_re_string _loc re_source in compile_string re_string in let find = Match.protect mt <:expr< $uid: !mod_runtime$.Mem.find tbl key >> in let add = Match.protect mt <:expr< $uid: !mod_runtime$.Mem.unsafe_add tbl key data >> in let check_cache = <:expr< let key = $key$ in try $find$ with [ Not_found -> let data = $compile$ in do { $add$; data } ] >> in let get_regexp = List.fold_right (fun (argname, _) e -> <:expr< fun $lid:argname$ -> $e$ >>) re_args check_cache in let result = <:expr< let tbl = $uid: !mod_runtime$.Mem.create 100 in $get_regexp$ >> in if mt then <:expr< let mutex = Mutex.create () in $result$ >> else result let compile_regexp_match = compile_regexp_general ~anchored:true let compile_regexp_search = compile_regexp_general ~anchored:false let convert _loc conv e = match conv with None -> e | Some f -> match f with `Int -> <:expr< Pervasives.int_of_string $e$ >> | `Float -> <:expr< Pervasives.float_of_string $e$ >> | `Option -> <:expr< let s = $e$ in if s = "" then None else Some s >> | `Custom f -> <:expr< $f$ $e$ >> | `Value e' -> <:expr< do { ignore $e$; $e'$ } >> let insert_bindings_poly ?(skip_empty_captures = false) (* for compatibility with old versions of Pcre (before 2004-04-29) *) ?(get_fst = false) make_expr _loc substrings set e = Named_groups.fold (fun name l e -> match l with [] -> assert false | (_loc, _, _) :: _ -> let find_it = List.fold_right (fun (_loc, n, conv) accu -> let expr = make_expr _loc substrings n in match accu with None -> Some (convert _loc conv expr) | Some e -> let result = if skip_empty_captures then <:expr< try match $expr$ with [ "" -> raise Not_found | s -> $convert _loc conv <:expr< s >>$ ] with [ Not_found -> $e$ ] >> else <:expr< try $convert _loc conv expr$ with [ Not_found -> $e$ ] >> in Some result) l None in let result = match find_it with None -> assert false | Some e -> e in let patt = if get_fst then <:patt< ( $lid:name$, _ ) >> else <:patt< $lid:name$ >> in <:expr< let $patt$ = $result$ in $e$ >>) set e let insert_group_bindings = insert_bindings_poly matched_group let insert_position_bindings = insert_bindings_poly ~get_fst:true matched_position let insert_bindings _loc substrings (group_bindings, position_bindings) e = insert_group_bindings _loc substrings group_bindings (insert_position_bindings _loc substrings position_bindings e) let substrings_of_target target = match target with <:expr< $lid:s$ >> -> s ^ "_result" | _ -> assert false let match_and_bind _loc re_name get_re target named_groups success failure = let substrings = substrings_of_target target in string_match _loc re_name get_re target substrings 0 (insert_bindings _loc substrings named_groups success) failure let macro_replace_generic f _loc (re_name : string) get_re target_name named_groups expr = let target = <:expr< $lid:target_name$ >> in let substrings = substrings_of_target target in <:expr< fun ?pos $lid:target_name$ -> Pcre.$lid: f$ ~rex:$get_re$ ?pos ~subst:(fun $lid:substrings$ -> $insert_bindings _loc substrings named_groups expr$) $target$ >> let macro_replace = macro_replace_generic "substitute_substrings" let macro_replace_first = macro_replace_generic "substitute_substrings_first" let macro_match ?(ignore_bindings = false) _loc re_name get_re target_name named_groups expr = let target = <:expr< $lid:target_name$ >> in let substrings = substrings_of_target target in let sv = shared_ovector re_name in let sg2 = subgroups2 re_name in let result = if ignore_bindings then expr else insert_bindings _loc substrings named_groups expr in <:expr< fun ?(share = False) ?pos $lid:target_name$ -> let $lid:substrings$ = if not share then Pcre.exec ~rex:$get_re$ ?pos $target$ else (Obj.magic ($target$, do { Pcre.unsafe_pcre_exec (Obj.magic 0 : Pcre.irflag) $get_re$ (match pos with [ None -> 0 | Some n -> n]) $target$ $lid:sg2$ $lid:sv$ None; $lid:sv$ }) : Pcre.substrings) in $result$ >> let macro_search_first = macro_match let macro_filter _loc re_name get_re target_name named_groups _ = let expr = <:expr< True >> in let e = macro_match ~ignore_bindings:true _loc re_name get_re target_name named_groups expr in <:expr< fun ?share ?pos x -> try $e$ ?share ?pos x with [ Not_found -> False ] >> let make_object _loc (group_bindings, position_bindings) = let all_ids = Named_groups.list_keys group_bindings @ Named_groups.list_keys position_bindings in let methods = List.fold_right (fun id accu -> <:class_str_item< method $id$ = $lid:id$ ; $accu$ >>) all_ids <:class_str_item< >>in <:expr< object $methods$ end >> let macro_capture _loc re_name get_re target_name named_groups _ = let expr = make_object _loc named_groups in let e = macro_match _loc re_name get_re target_name named_groups expr in <:expr< fun ?share ?pos x -> try Some ($e$ ?share ?pos x) with [ Not_found -> None ] >> let macro_function fun_name _loc (re_name : string) get_re target_name named_groups expr = let target = <:expr< $lid:target_name$ >> in let substrings = substrings_of_target target in <:expr< $uid: !mod_runtime$.$lid:fun_name$ $get_re$ (fun $lid:substrings$ -> $insert_bindings _loc substrings named_groups expr$) >> let macro_search = macro_function "search" let macro_map = macro_function "map" let macro_collect = macro_function "collect" let macro_collectobj _loc re_name get_re target_name named_groups _ = let e = make_object _loc named_groups in macro_function "collect" _loc re_name get_re target_name named_groups e let macro_split _loc re_name get_re target_name named_groups _ = <:expr< $uid: !mod_runtime$.split $get_re$ >> open Select_lib let lib = { predefined_regexps = special_regexps; unfold_range = false; process_regexp = process_regexp; compile_regexp_match = compile_regexp_match ~mt:false; compile_regexp_search = compile_regexp_search ~mt:false; match_and_bind = match_and_bind; wrap_match = (fun e -> e); wrap_user_case = (fun e -> e); really_wrap_match = false; really_wrap_user_case = false } let lib_mt = { lib with compile_regexp_match = compile_regexp_match ~mt:true; compile_regexp_search = compile_regexp_search ~mt:true } mikmatch-1.0.7/pcre/run_mikmatch_pcre.ml000066400000000000000000000110011227060510600202610ustar00rootroot00000000000000(* $Id$ *) exception Mikmatch_exit open Pcre let irflags = rflags [] external make_substrings : string * int array -> substrings = "%identity" let search rex f ?(pos = 0) subj = let subgroup_offsets, offset_vector = make_ovector rex in let substrings = make_substrings (subj, offset_vector) in let subj_len = String.length subj in let rec loop cur_pos = if try unsafe_pcre_exec irflags rex cur_pos subj subgroup_offsets offset_vector None; true with Not_found -> false then (f substrings; let first = offset_vector.(0) in let last = offset_vector.(1) in if first < subj_len then loop (max (first + 1) last)) in loop pos let scan ~full rex pos ~ftext ~fmatch subj = let subgroup_offsets, offset_vector = make_ovector rex in let substrings = make_substrings (subj, offset_vector) in let subj_len = String.length subj in let rec loop previous_last cur_pos = if try unsafe_pcre_exec irflags rex cur_pos subj subgroup_offsets offset_vector None; true with Not_found -> let last = String.length subj in if full || last > previous_last then ftext (String.sub subj previous_last (last - previous_last)); false then (let first = offset_vector.(0) in let last = offset_vector.(1) in if full || first > pos then ftext (String.sub subj previous_last (first - previous_last)); fmatch substrings; if first < subj_len then loop last (max (first + 1) last) else if full then ftext "") in loop pos pos let map rex f ?(pos = 0) ?(full = true) subj = let l = ref [] in let ftext s = l := `Text s :: !l and fmatch substrings = l := f substrings :: !l in scan ~full rex pos ~ftext ~fmatch subj; List.rev !l let collect rex f ?(pos = 0) subj = let l = ref [] in let f substrings = l := f substrings :: !l in search rex f ~pos subj; List.rev !l let split rex ?(full = false) ?(pos = 0) subj = let l = ref [] in let ftext s = l := s :: !l and fmatch substrings = () in scan ~full rex pos ~ftext ~fmatch subj; List.rev !l let bquote_char buf c = match c with '\\' | '^' | '$' | '.' | '[' | ']' | '|' | '(' | ')' | '?' | '*' | '+' | '{' | '}' -> Printf.bprintf buf "\\%c" c | '\000' -> Buffer.add_string buf "\\000" | c -> Buffer.add_char buf c (* Pcre.quote does not escape null characters (which terminate C strings) *) let quote_string s = let len = String.length s in let buf = Buffer.create (2 * len) in for i = 0 to len - 1 do bquote_char buf (String.unsafe_get s i) done; Buffer.contents buf let nocase s = let len = String.length s in let buf = Buffer.create (2 * len) in for i = 0 to len - 1 do let c = s.[i] in let cl = Char.lowercase c and cu = Char.uppercase c in if cl <> cu then (* in this case, cl and cu are letters *) Printf.bprintf buf "[%c%c]" cl cu else bquote_char buf c done; Buffer.contents buf module Mem = struct (* memoization table with periodic removal of old items *) type ('a, 'b) t = { mutable date : float; mutable last_cleanup : float; opt_size : int; max_size : int; tbl : ('a, ('b * float ref)) Hashtbl.t } let create n = if n < 1 then invalid_arg "Memo.create" else { date = 0.; last_cleanup = 0.; opt_size = n; max_size = n + n; tbl = Hashtbl.create (n + n) } (* removal of anything which is too old *) let cleanup t = let t0 = t.last_cleanup in t.last_cleanup <- t.date; let tbl = t.tbl in let trash = Hashtbl.fold (fun key (data, last_access) trash -> if !last_access < t0 then key :: trash else trash) tbl [] in List.iter (Hashtbl.remove tbl) trash (* unsafe addition of data (key should not be in the table) *) let unsafe_add t key data = let date = t.date +. 1. in t.date <- date; Hashtbl.add t.tbl key (data, ref date); let size = Hashtbl.length t.tbl in if size > t.max_size then cleanup t else if size = t.opt_size + 1 then t.last_cleanup <- t.date let add t key data = if Hashtbl.mem t.tbl key then invalid_arg "Memo.add" else unsafe_add t key data let find t key = let (data, last_access) = Hashtbl.find t.tbl key in let date = t.date in last_access := date; t.date <- date +. 1.; data let get t key lazy_data = try find t key with Not_found -> let data = Lazy.force lazy_data in unsafe_add t key data; data let clear t = Hashtbl.clear t.tbl; t.date <- 0.; t.last_cleanup <- 0. end mikmatch-1.0.7/pcre/syntax_pcre.ml000066400000000000000000000137231227060510600171430ustar00rootroot00000000000000(*pp camlp4orf *) open Printf open Camlp4.PreCast open Syntax open Regexp_ast open Syntax_common open Select_lib open Match let expand_macro ?(sharing = false) ?(anchored = false) _loc re e f = warnings re; let (num, re_name) = Constants.new_regexp () in let var_name = var_of_regexp re_name in let (re_args, re_source, named_groups, postbindings) = (!lib).process_regexp ~sharing _loc re re_name in let get_re = Match.make_get_re _loc re_name re_args in add_compiled_regexp ~anchored postbindings _loc re_name num re_args re_source named_groups; !(lib).wrap_match (f _loc re_name get_re var_name named_groups ((!lib).wrap_user_case e)) let check_assertion ~lookahead positive re = let rec check ~branched = function Bind (_loc, e, name, conv) -> if not positive then Messages.not_visible _loc [name] "negative assertion"; check ~branched e | Bind_pos (_loc, name) -> if not positive then Messages.not_visible _loc [name] "negative assertion"; 0 | Epsilon _ -> 0 | Characters _ -> 1 | Special (_loc, s, (name, Some len)) -> len | Special (_loc, s, (name, None)) -> if not lookahead then Messages.invalid_lookbehind _loc (sprintf "These patterns (%s)" name) "" else 0 | Backref (_loc, _) -> if not lookahead then Messages.invalid_lookbehind _loc "Backreferences" "" else 0 | Variable (_loc, _) | Nocase_variable (_loc, _) -> if not lookahead && branched then Messages.invalid_lookbehind _loc "Variables in optional branches" "" else 0 | Sequence (_loc, e1, e2) -> check ~branched e1 + check ~branched e2 | Alternative (_loc, e1, e2, _, _) -> let len1 = check ~branched:true e1 in let len2 = check ~branched:true e2 in if not lookahead && len1 <> len2 then Messages.invalid_lookbehind _loc "Alternatives of different length" "" else max len1 len2 | Repetition (_loc, (kind, greediness), e) -> (match kind with Range (x, None) -> x * check ~branched e | _ -> if not lookahead then Messages.invalid_lookbehind _loc "Repetitions of variable length" "" else check ~branched e) | Lookahead (_loc, _, e) | Lookbehind (_loc, _, e) -> check ~branched e | Possessive (_, e) | Closed e -> check ~branched e in ignore (check ~branched:false re) let lookahead _loc bopt re = let positive = bopt = None in check_assertion ~lookahead:true positive re; Lookahead (_loc, positive, if positive then re else Closed re) let lookbehind _loc bopt re = let positive = bopt = None in check_assertion ~lookahead:false positive re; Lookbehind (_loc, positive, if positive then re else Closed re) let seq _loc e = <:expr< do { $e$ } >> let extend_common () = let expr_level = "top" in EXTEND Gram expr: LEVEL $expr_level$ [ [ "RE_PCRE"; re = regexp -> warnings re; let (re_args, re_source, named_groups, postbindings) = Pcre_lib.lib.process_regexp _loc ~sharing:false re "" in let re_fragments = Match.get_re_fragments _loc re_source in <:expr< ( $re_fragments$, $pp_named_groups _loc named_groups$ ) >> | "REPLACE"; re = regexp; "->"; e = sequence -> expand_macro _loc re (seq _loc e) Pcre_lib.macro_replace | "SEARCH"; re = regexp; "->"; e = sequence -> expand_macro _loc re (seq _loc e) Pcre_lib.macro_search | "MAP"; re = regexp; "->"; e = sequence -> expand_macro _loc re (seq _loc e) Pcre_lib.macro_map | "COLLECT"; re = regexp; "->"; e = sequence -> expand_macro _loc re (seq _loc e) Pcre_lib.macro_collect | "COLLECTOBJ"; re = regexp -> expand_macro _loc re <:expr< assert false >> Pcre_lib.macro_collectobj | "SPLIT"; re = regexp -> expand_macro _loc re <:expr< assert false >> Pcre_lib.macro_split | "REPLACE_FIRST"; re = regexp; "->"; e = sequence -> expand_macro _loc re (seq _loc e) Pcre_lib.macro_replace_first | "SEARCH_FIRST"; re = regexp; "->"; e = sequence -> expand_macro ~sharing:true _loc re (seq _loc e) Pcre_lib.macro_search_first | "MATCH"; re = regexp; "->"; e = sequence -> expand_macro ~sharing:true ~anchored:true _loc re (seq _loc e) Pcre_lib.macro_match | "FILTER"; re = regexp -> expand_macro ~sharing:true ~anchored:true _loc re <:expr< assert false >> Pcre_lib.macro_filter | "CAPTURE"; re = regexp -> expand_macro ~sharing:true ~anchored:true _loc re <:expr< assert false >> Pcre_lib.macro_capture ] ]; regexp: LEVEL "postop" [ [ re = regexp; "*"; UIDENT "Lazy" -> Repetition (_loc, (Star, false), Closed re) | re = regexp; "+"; UIDENT "Lazy" -> Repetition (_loc, (Plus, false), Closed re) | re = regexp; "?"; UIDENT "Lazy" -> Repetition (_loc, (Option, false), Closed re) | r = regexp; "{"; (rng, rng_loc) = range; "}"; UIDENT "Lazy" -> Repetition (_loc, (Range rng, false), Closed r) | re = regexp; UIDENT "Possessive" -> Possessive (_loc, re) ] ]; regexp: LEVEL "simple" [ [ "_" -> Characters (_loc, Charset.full) | "<"; x = OPT [ b1 = OPT [ x = UIDENT "Not" -> x ]; re1 = regexp -> (b1, re1) ]; y = OPT [ "."; r2 = OPT [ b2 = OPT [ x = UIDENT "Not" -> x ]; re2 = regexp -> (b2, re2) ] -> r2 ]; ">" -> match x, y with None, None | None, Some None -> Epsilon _loc | None, Some (Some (b2, re2)) -> lookahead _loc b2 re2 | Some (b1, re1), None -> lookahead _loc b1 re1 | Some (b1, re1), Some None -> lookbehind _loc b1 re1 | Some (b1, re1), Some (Some (b2, re2)) -> Sequence (_loc, lookbehind _loc b1 re1, lookahead _loc b2 re2) ] ]; END;; let extend_regular () = extend_common () (* let extend_revised () = extend_common () *) let _ = select_lib Pcre_lib.lib; Camlp4.Options.add "-thread" (Arg.Unit ( fun () -> select_lib Pcre_lib.lib_mt; eprintf "Warning: -thread is deprecated.\n/%!" ) ) " Deprecated option that protects access to shared data with a mutex. \ Currently only patterns containing @ are concerned."; (* How to test if the current syntax is the regular or revised one? *) extend_regular () mikmatch-1.0.7/pcre/test.ml000066400000000000000000000336571227060510600155730ustar00rootroot00000000000000open Printf (* Definition of regular expressions for further use *) RE space = [' ' '\t' '\n' '\r'] RE not_space = _ # space RE digit = ['0'-'9'] RE letter = ['A'-'Z' '_' 'a'-'z'] RE word = letter+ (* (* Inclusion of file in the same syntax, e.g. a library of user-defined regular expressions. (known problem with error location) *) USE "my_regexps.ml" (* defines `word' and `digit' *) *) (* Extended pattern-matching in the following constructs: match ... with ... try ... with ... function ... *) (* Doesn't work. Don't know how to make it work. *) (* Testing the Camlp4 support for stream parsers *) let _ = match Stream.of_list [] with parser [< >] -> () let test expected a b c = printf "[case %i] " expected; flush stdout; (match a, b, c with None, (None | Some (RE space* )), None -> printf "case 1\n" | Some ({ contents = [| RE (word as x); _; y |]}), (Some ("test" as z | RE word space (word as z))), None -> printf "case 2: %S %S %S\n" x y z | _, _, Some (RE space* (['0'-'9']+ as n)) -> printf "case 3: %s\n" n | _ -> printf "case 4\n"); flush stdout let _ = printf "Tests (match ... with):\n"; flush stdout; test 1 None (Some " ") None; test 2 (Some (ref [| "alpha"; "beta"; "2 gamma" |])) (Some "Hello World!") None; test 3 None None (Some " 123 "); test 4 (Some (ref [| |])) (Some "") (Some "abc") let _ = match "" with (RE (("a" as a) | ("b" as a))) | a -> () let hello_who s = match s with RE _* ['h''H']"ello" ","? space* ((word | space)* word as someone) -> String.capitalize someone | _ -> "nobody" let _ = printf "Extraction of the recipient's name\n"; flush stdout; List.iter (fun s -> printf "Hello who: %S\n" s; printf " -> %S\n" (hello_who s); flush stdout) [ "Hello World!"; "First of all, hello everybody."; "*** hello world ***"; "Hello, Caml riders!" ] let _ = printf "Test (local and global bindings):\n"; flush stdout; match "" with (RE (word as x | space+ (word as x))* ) | _ -> printf "Passed.\n" let _ = printf "Test (repetition range + end of line):\n"; flush stdout; let f s = match s with RE '-'? digit{1-4} eol -> printf "%S has less than 5 digits.\n" s | RE '-'? digit{5+} eol -> printf "%S has at least 5 digits.\n" s | _ -> printf "%S is not a number.\n" s in List.iter f ["123"; "1234"; "12345"; "12345678"; "-1234"; "*1"; "1*"; "9\n*" ] let test f (expected, s) = let (success, result) = f s in let passed = expected = success in if passed then (printf "[OK] %s%s\n" s (match result with None -> "" | Some x -> sprintf " -> %s"x); flush stdout) else (print_endline (s ^ "Failed"); flush stdout; failwith s) let () = printf "Test (no case: the ~ operator):\n"; flush stdout; List.iter (test (function RE "hello"~ " World!" -> true, None | _ -> false, None)) [ true, "Hello World!"; true, "hElLO World!"; false, "hello WORLD!" ] let () = printf "Test (try ... with):\n"; flush stdout; try failwith "Hello World!" with Failure RE "Hello" space* (word as w) -> printf "OK: %s\n" w | Failure s -> printf "Failure: %s\n" s let () = printf "Test (function ... -> ...):\n"; flush stdout; let f = function RE "Hello" space* (word as w) -> printf "OK: %s\n" w | _ -> printf "Error\n" in f "Hello Everybody"; f "Hello Caml!" let () = printf "Test (backreferences):\n"; flush stdout; let f s = match s with RE (digit+ as x | (word as x)) (* x = global id *) (" " as sp !sp)* (* sp = local id *) !x -> true, Some x | _ -> false, None in List.iter (test f) [ true, "123123"; false, "123 123"; true, "123 123"; true, "aaaa"; false, "abc"; false, "ab1ab1" ] let print_named_groups l = List.iter (fun (name, positions) -> printf "%s:" name; List.iter (fun i -> printf " %i" i) positions; printf "\n") l (* Lower level feature: RE_PCRE returns the source of the regexp, to be used with specific compilation or search options. *) let _ = let (src, named_groups) = RE_PCRE (("a"|"b"+)? digit{2}) as start (space* word)+ ( digit{1} (word as last_word) | digit{1} Lazy (word as last_word) | digit{3} (word as last_word)) in printf "Regexp source: %S\n" src; printf "Named groups and their locations:\n"; print_named_groups named_groups; flush stdout let charset_union = RE_PCRE digit | space | "a" | ['A'-'Z'] (* Laziness *) let _ = printf "Laziness and backreferences:\n"; flush stdout; let f = function RE _* Lazy "<" (_* Lazy as tag) ">" (_* Lazy as contents) "" -> sprintf " -> (%S, %S)" tag contents | s -> "" in List.iter (fun s -> printf "%S%s\n" s (f s); flush stdout) [ "hello"; ""; "text" ] let _ = printf "Possessiveness + backreferences + laziness Take the first word, find its next occurence and return the text in the middle:\n"; flush stdout; let f = function RE letter* Possessive as x (_* Lazy as y) !x -> sprintf " -> %S" y | _ -> "" in List.iter (fun s -> printf "%S%s\n" s (f s); flush stdout) [ "abc,ab,abc,abc" ] (* Macros *) let swap = REPLACE "(" space* (word as x) space* "," space* (word as y) space* ")" -> "(" ^ y ^ "," ^ x ^ ")" let swap = REPLACE_FIRST "(" space* (word as x) space* "," space* (word as y) space* ")" -> "(" ^ y ^ "," ^ x ^ ")" let _ = let test s = let s1 = swap s in let s2 = swap s1 in printf "swap 0: %s\nswap 1: %s\nswap 2: %s\n" s s1 s2 in test "tuples: (x,y)/(1, 2)/(martin, jambon)" RE host = (['.' '-'] | letter | digit)+ let hide_email = flush stdout; REPLACE "@" (host as host) -> "@" ^ (String.make (String.length host) '.') let _ = let test s = printf "\ before: %s after : %s " s (hide_email s) in test "this is a list of email addresses: joe@sixpack.com, martin@home" let _ = let i = ref 0 in let f = SEARCH "a" -> incr i in f "aaa"; printf "This should be 3: %i\n" !i; flush stdout let _ = let f = MAP ['a'-'z']+ as w -> `Word (String.capitalize w) in f "hello world!" let _ = List.iter print_endline ((SPLIT ",") "a,b,c") let _ = let l = List.filter (fun s -> (FILTER _* ".ml" eos) s) (Array.to_list (Sys.readdir ".")) in printf "*.ml: %s\n%!" (String.concat " " l) (* Sharing the subgroups array *) let _ = let f ?share s = try (MATCH "+" (print* as x) -> print_endline (" Found " ^ x)) ?share s with Not_found -> print_endline " Not found" in let g ?share () = print_endline "2 found:"; flush stdout; List.iter (f ?share) [ "+a"; "b"; "+blop" ] in g ~share:true (); g ~share:false () (* Positions *) let _ = match "a1234yz", 333 with RE "a" as s ("bc" %pos | digit+ %pos), _ | (s, pos) -> printf "a = %s, 5 = %i\n%!" s pos; assert (s = "a" && pos = 5) let _ = let find = SEARCH_FIRST "(" %pos1 (_* Lazy as x) %pos2 ")" -> printf "%s %i-%i\n%!" x pos1 pos2; assert (pos1 = 11 && pos2 = 17 && x = "result") in find "0123456789(result)..." (* No real tuple (maybe not fixed yet) *) let _ = match "abc", "def" with (RE _, RE _) | "abc", _ -> ignore `Case1 | _ -> ignore `Case2 (* Assertions *) let _ = let search = SEARCH alpha{2}.> digit+ Possessive as m < %pos _ as x . Not (alpha | "_"+ !m)+ > as n -> printf "num = %s; pos = %i; x = %s\n%!" n pos x in search "abc1 23 456xyz 7. x8 33_33Y 34_35Y 36__36Y" let print_triplets_of_letters = SEARCH -> print_endline x let _ = print_triplets_of_letters "Hello World!" let _ = List.iter print_endline ((SPLIT "") "abc");; RE test_warnings = ("a" as plus_warning)+ let _ = match "" with RE ("a" as star_warning)* test_warnings -> () | _ -> () (* Converters *) let _ = let f s = printf "got it!\n"; int_of_string s in let n = match "(123)", 456 with (RE "(" (digit+ as n := f) ")"), _ | (RE _{1-3} as n = -1), _ | (RE digit+ as n : int), _ | _, n -> n in assert (n = 123); printf "123=%i\n%!" n (* debugging *) let _ = match "a" with (RE "b") | (RE "c") | (RE "a") -> () | _ -> failwith "test failed" let () = () in ();; (* General syntax for local exception handling (let try ... in ... with ...) *) let _ = try (let try x = () and z = () in raise Exit with Exit -> assert false) with Exit -> print_endline "OK for local exception handling (let-try-in-with)" ;; let RE (_* as x) = "hello" in assert ("" <> "hello");; (* Shortcut syntax *) let _ = let RE (alpha+ as x) = "abc def" in assert (x = "abc"); assert (x = "abc"); print_endline "shortcut is OK" (* Shortcut syntax with local exception handling *) let _ = try (let try /"xy" as x/ = "xyz" in ignore x; raise Exit with Exit -> assert false) with Exit -> print_endline "OK for local exception handling (RE)" let /alpha+ space+ (alpha+ as x)/ = "xyz abc " in assert (x = "abc");; (* Similar tests for str_item's let in *) let try /alpha+ space+ (alpha+ as x)/ = "" in assert false with Match_failure _ -> print_endline "OK for str_item let-try-in-with";; (* Global value bindings *) let /[digit "."]* as version/ = Sys.ocaml_version RE int = digit+ let /(int as major : int) "." (int as minor : int) ("." (int as patchlevel : int) | ("" as patchlevel = 0)) ("+" (_* as additional_info) | ("" as additional_info))/ = Sys.ocaml_version let _ = printf "OCaml version: major=%i minor=%i patchlevel=%i additional-info=%S\n%!" major minor patchlevel additional_info (* Parametrized regexps *) let _ = let find s = match "abcabcdefggghijkl" with RE _* Lazy ( @s+ as x) -> x | _ -> assert false in assert (find "abc" = "abcabc"); assert (find "g" = "ggg") let _ = let find_not_after x y = COLLECT < Not ( @x ) . > ":" @y "=" (alnum* as result) -> result in let text = "a:b=, xy:z=1, x:z=25, _:z=99" in assert (find_not_after "x" "z" text = ["1"; "99"]); assert (find_not_after "" "z" text = []); assert (find_not_after "a" "b" text = []); assert (find_not_after "a" "" text = []); assert (find_not_after "?" "b" text = [""]) let _ = let find_not_between ~before ~after ~label = COLLECT < Not < ( @before ) . > @label "=" alnum* @after > @label "=" (alnum* as result) -> result in let text = "(field=12) (field=OK, field=) (field=yes" in assert (find_not_between ~before:"(" ~after:")" ~label:"field" text = ["OK"; ""; "yes"]) let _ = let field key = printf "Case-insensitive search for field %S:\n%!" key; SEARCH @key~ "=" (alnum* as data) -> printf " %s=%s\n%!" key data in let text = "hello name=Martin, AGE=27, Name=Jambon" in printf "Text: %S\n" text; field "name" text; field "age" text (* Null character *) let _ = match "" with / "\000abc" / -> assert false | _ -> () let _ = let zero_abc = "\000abc" in match "" with / @zero_abc / -> assert false | _ -> () let ( % ) = (+) let view T = fun x -> true let view Lazy = fun x -> try Some (Lazy.force x) with _ -> None let _ = match "a", lazy (1+1), lazy (3, lazy 4) with %T, %Lazy (2 as x), %Lazy (y, %Lazy z) -> assert (x = 2); assert (y = 3); assert (z = 4); printf "Passed view test 1\n%!" | _ -> assert false type 'a lazy_list = Empty | Cons of ('a * 'a lazy_list lazy_t) let view Empty = fun l -> try Lazy.force l = Empty with _ -> false let view Cons = fun l -> try match Lazy.force l with Cons x -> Some x | Empty -> None with _ -> None let _ = let l = lazy (Cons (1, lazy (Cons (2, lazy Empty)))) in match l with %Empty | %Cons (_, %Empty) -> assert false | %Cons (x1, %Cons (x2, %Empty)) -> assert (x1 = 1); assert (x2 = 2); printf "Passed view test 2\n%!" | _ -> assert false let _ = let view XY = fun o -> Some (o#x, o#y) in let view S = fun o -> Some o#s in let o = (object method x = 0 method y = 1 method s = "abc" end) in match o with %XY (1, _) -> assert false | %S / "A" / -> assert false | %S ( / upper as c / | / lower as c / as s) -> assert (c = "a"); assert (s = "abc"); printf "Passed view test 3\n%!" module Test = struct let view Even = fun x -> x land 1 = 0 end let _ = match 0 with %Test.Even -> printf "Passed view test 4\n%!" | _ -> assert false let _ = let f = COLLECTOBJ (letter+ as x) (digit+ as y : int) in match List.map (fun x -> (x#x, x#y)) (f "ab12ER5") with [ ("ab", 12); ("ER", 5) ] -> printf "Passed COLLECTOBJ test\n%!" | _ -> assert false let _ = match (CAPTURE (letter+ as x) (digit+ as y : int)) "ab12ER5" with Some o -> assert (o#x = "ab" && o#y = 12); printf "Passed CAPTURE test\n%!" | None -> assert false let _ = match (SPLIT "x" ) "axbxc" with [ "a"; "b"; "c" ] -> printf "Passed basic SPLIT test\n%!" | _ -> assert false let _ = match (SPLIT < "x" > ) "axbxc" with [ "a"; "xb"; "xc" ] -> printf "Passed zero-length SPLIT test (bug in versions <= 1.0.1)\n%!" | _ -> assert false let () = try match "a" with / "a" ("" as x) | ("b" as x) / -> ignore x; printf "Passed zero-length capture in alternative \ (bug in versions <= 1.0.3)\n%!" | _ -> assert false with Not_found -> assert false let () = match "1" with / ("0" as x = 0) | ("1" as x := int_of_string) / -> if x = 1 then printf "Passed alt/= test (bug in versions <= 1.0.4)\n%!" else assert false | _ -> assert false type test = { x : int; y : int } let f x = match x with | {x = 1; _} -> () | _ -> () let g x = match x with | {x} -> () | _ -> () mikmatch-1.0.7/pcre/test2.ml000066400000000000000000000001641227060510600156400ustar00rootroot00000000000000open Printf ignore (FILTER "a") open Set module M = struct open Map ignore (FILTER "b") open Hashtbl end mikmatch-1.0.7/str/000077500000000000000000000000001227060510600141235ustar00rootroot00000000000000mikmatch-1.0.7/str/META.template000066400000000000000000000007121227060510600164060ustar00rootroot00000000000000name = "mikmatch_str" version = "VERSION" description = "Pattern matching extended with regexps in Ocamllex syntax" requires = "camlp4 pcre unix" requires(toploop) += "tophide" archive(syntax,toploop) = "pa_mikmatch_str.cma run_mikmatch_str.cma" archive(syntax,create_toploop) = "pa_mikmatch_str.cma run_mikmatch_str.cma" archive(syntax,preprocessor) = "pa_mikmatch_str.cma" archive(byte) = "run_mikmatch_str.cma" archive(native) = "run_mikmatch_str.cmxa" mikmatch-1.0.7/str/Makefile000066400000000000000000000047711227060510600155740ustar00rootroot00000000000000ifndef BINDIR BINDIR = $(shell dirname `which ocaml`) endif SOURCES = \ mm_util.ml \ global_def.mli global_def.ml \ match.ml \ str_lib.ml \ syntax_common.ml \ syntax_str.ml RESULT = mikmatch_str OCAMLFLAGS = -dtypes OCAMLLDFLAGS = \ messages.cmo charset.cmo \ constants.cmo \ regexp_ast.cmo \ select_lib.cmo INCDIRS = ../common USE_CAMLP4 = yes LIBINSTALL_FILES := \ pa_mikmatch_str.cma pa_mikmatch_str.cmo pa_mikmatch_str.cmi \ run_mikmatch_str.cma run_mikmatch_str.cmo run_mikmatch_str.cmi \ mikmatch.cmi mikmatch.cmo mikmatch.cmx mikmatch.mli \ run_mikmatch_str.cmxa run_mikmatch_str.cmx \ run_mikmatch_str.a run_mikmatch_str.o .PHONY: default force all links default: version links pa_lib misc force: touch $(SOURCES) $(MAKE) all: links: mikmatch.mli mikmatch.ml match.ml syntax_common.ml mm_util.ml \ global_def.mli global_def.ml mikmatch.mli: ../common/mikmatch.mli ln -s $< $@ mikmatch.ml: ../common/mikmatch.ml ln -s $< $@ syntax_common.ml: ../common/syntax_common.ml ln -s $< $@ match.ml: ../common/match.ml ln -s $< $@ mm_util.ml: ../common/mm_util.ml ln -s $< $@ global_def.mli: ../common/global_def.mli ln -s $< $@ global_def.ml: ../common/global_def.ml ln -s $< $@ .PHONY: pa_lib install uninstall version topinstall misc pa_lib: $(MAKE) RESULT=pa_mikmatch_str pabc bcl #install: libinstall topinstall install: libinstall uninstall: libuninstall rm -f $(BINDIR)/mikmatch_str.top $(BINDIR)/mikmatch_str version: sed -e "s/VERSION/$(VERSION)/" < META.template > META topinstall: install -m 0755 mikmatch_str.top mikmatch_str $(BINDIR) misc: ocamlc -c mikmatch.mli ocamlc -a -o run_mikmatch_str.cma \ mikmatch.ml run_mikmatch_str.ml ocamlopt -a -o run_mikmatch_str.cmxa \ mikmatch.ml run_mikmatch_str.ml .PHONY: test1 test-install test: test1 # (preinstall test) test1: camlp4o ./pa_mikmatch_str.cma -printer o test1.ml > test1.ppo ocamlopt -pp 'camlp4o ./pa_mikmatch_str.cma' \ str.cmxa unix.cmxa \ run_mikmatch_str.cmxa \ test1.ml -o test1 ./test1 # ocamlmktop -o mikmatch_str.test -I +camlp4 -I . camlp4o.cma \ # pa_mikmatch_str.cma str.cma unix.cma run_mikmatch_str.cma # ./mikmatch_str.test test1.ml # Compilation with ocamlfind (postinstall test) test-install: ocamlfind ocamlopt \ -syntax camlp4o \ -package mikmatch_str\ -linkpkg \ test1.ml -o test1_inst ./test1_inst TRASH = \ *~ *.ppo *.cm[ioxa] *.cmxa *.o *.a *.top \ *.test test1 test1.more mikmatch_str mikmatch_str.ml OCAMLMAKEFILE = ../OCamlMakefile include $(OCAMLMAKEFILE) mikmatch-1.0.7/str/run_mikmatch_str.ml000066400000000000000000000040211227060510600200230ustar00rootroot00000000000000exception Mikmatch_exit let nocase s = let len = String.length s in let buf = Buffer.create (4 * len) in for i = 0 to len - 1 do let c = s.[i] in let cl = Char.lowercase c and cu = Char.uppercase c in if cl <> cu then (* in this case, cl and cu are letters *) Printf.bprintf buf "[%c%c]" cl cu else Buffer.add_string buf (Str.quote (String.make 1 c)) done; Buffer.contents buf module Mem = struct (* memoization table with periodic removal of old items *) type ('a, 'b) t = { mutable date : float; mutable last_cleanup : float; opt_size : int; max_size : int; tbl : ('a, ('b * float ref)) Hashtbl.t } let create n = if n < 1 then invalid_arg "Memo.create" else { date = 0.; last_cleanup = 0.; opt_size = n; max_size = n + n; tbl = Hashtbl.create (n + n) } (* removal of anything which is too old *) let cleanup t = let t0 = t.last_cleanup in t.last_cleanup <- t.date; let tbl = t.tbl in let trash = Hashtbl.fold (fun key (data, last_access) trash -> if !last_access < t0 then key :: trash else trash) tbl [] in List.iter (Hashtbl.remove tbl) trash (* unsafe addition of data (key should not be in the table) *) let unsafe_add t key data = let date = t.date +. 1. in t.date <- date; Hashtbl.add t.tbl key (data, ref date); let size = Hashtbl.length t.tbl in if size > t.max_size then cleanup t else if size = t.opt_size + 1 then t.last_cleanup <- t.date let add t key data = if Hashtbl.mem t.tbl key then invalid_arg "Memo.add" else unsafe_add t key data let find t key = let (data, last_access) = Hashtbl.find t.tbl key in let date = t.date in last_access := date; t.date <- date +. 1.; data let get t key lazy_data = try find t key with Not_found -> let data = Lazy.force lazy_data in unsafe_add t key data; data let clear t = Hashtbl.clear t.tbl; t.date <- 0.; t.last_cleanup <- 0. end mikmatch-1.0.7/str/str_lib.ml000066400000000000000000000300651227060510600161170ustar00rootroot00000000000000(*pp camlp4orf *) (* $Id$ *) open Camlp4.PreCast open Mm_util open Constants let _ = mod_runtime := "Run_mikmatch_str"; mod_runtime_mt := "Run_mikmatch_str_mt" let str_mutex = "str_mutex" (* Emacs/Str syntax for regular expressions *) open Printf open Regexp_ast let special_regexps = let _loc = Constants.dummy_loc in [ "bol", Special (_loc, "^", ("bol", Some 0)); (* beginning of line *) "eol", Special (_loc, "$", ("eol", Some 0)); (* end of line *) "bnd", Special (_loc, "\\b", ("bnd", Some 0)); (* word boundary *) "any", Special (_loc, ".", ("any", Some 1)); (* any character except newline *) ] (* Note that the usual regexp special characters are not special inside a character set. A completely different set of special characters exists inside character sets: `]', `-' and `^'. To include a `]' in a character set, you must make it the first character. For example, `[]a]' matches `]' or `a'. To include a `-', write `-' as the first or last character of the set, or put it after a range. Thus, `[]-]' matches both `]' and `-'. To include `^', make it other than the first character in the set. *) let string c = String.make 1 c let quote_char = function '[' | ']' | '*' | '.' | '\\' | '?' | '+' | '^' | '$' as c -> let s = String.create 2 in s.[0] <- '\\'; s.[1] <- c; s | c -> string c let reorder_charset l = if l = [] then invalid_arg "reorder_charset"; List.sort (fun c1 c2 -> if c1 = c2 then invalid_arg "reorder_charset: repeated char" else if c1 = ']' then -1 else if c2 = ']' then 1 else if c1 = '-' then 1 else if c2 = '-' then -1 else if c1 = '^' then 1 else if c2 = '^' then -1 else Char.compare c1 c2) l let compact l = let finish first last = match Char.code last - Char.code first with 0 -> string first | 1 -> string first ^ string last | _ -> string first ^ "-" ^ string last in let rec extend first last = function [] -> [finish first last] | c :: rest -> if Char.code c = Char.code last + 1 then extend first c rest else finish first last :: extend c c rest in match l with [] -> [] | hd :: tl -> extend hd hd tl let compact_charset l = let rbracket = ref false and dash = ref false and caret = ref false in let normal = List.filter (function ']' -> rbracket := true; false | '-' -> dash := true; false | '^' -> caret := true; false | _ -> true) l in let sorted = List.sort Char.compare normal in let special_tail = let tail = if !dash then ["-"] else [] in if !caret then "^" :: tail else tail in let tail = compact sorted @ special_tail in if !rbracket then "]" :: tail else tail let add_const accu s = accu := `String s :: !accu let add_var accu e = accu := `Var (e, false) :: !accu let add_var_nocase accu e = accu := `Var (e, true) :: !accu let rec rm_closed = function Closed ast -> rm_closed ast | ast -> ast let rec to_string ?(top = false) ((last_group, named_groups) as groups) accu = function Epsilon _loc -> groups | Special (_loc, s, _) -> add_const accu s; groups | Characters (_loc, set) -> let l = Charset.list set in (match l with [] -> groups | [c] -> add_const accu (quote_char c); groups | _ -> add_const accu "["; List.iter (add_const accu) (compact_charset l); add_const accu "]"; groups) | Sequence (_loc, re1, re2) -> let groups = to_string groups accu re1 in to_string groups accu re2 | Alternative (_loc, re, Epsilon _, _, _) -> let must_group = not top && match rm_closed re with Characters _ | Special _ | Bind _ | Alternative _ -> false | _ -> true in let last_group = if must_group then succ last_group else last_group in if must_group then add_const accu "\\("; let (last_group, named_groups) as groups = to_string (last_group, named_groups) accu re in if must_group then add_const accu "\\)"; add_const accu "?"; groups | Alternative (_loc, re1, re2, _, _) -> let must_group = not top in let last_group = if must_group then succ last_group else last_group in if must_group then add_const accu "\\("; let (last_group, named_groups1) = to_string (last_group, named_groups) accu re1 in add_const accu "\\|"; let (last_group, named_groups2) = to_string (last_group, named_groups) accu re2 in if must_group then add_const accu "\\)"; let check_balance set1 set2 = if not (Named_groups.equal set1 set2) then (let missing = S.diff (Named_groups.union set1 set2) (Named_groups.inter set1 set2) in Messages.unbalanced_bindings _loc (list_named_groups missing)) in let (groups1, positions1) = named_groups1 and (groups2, positions2) = named_groups2 in check_balance groups1 groups2; check_balance positions1 positions2; (last_group, (merge groups1 groups2, merge positions1 positions2)) | Repetition (_loc, (Star, true), (Repetition (_, (Star, true), _) as re)) -> to_string ~top groups accu re | Repetition (_loc, kind, re) -> let must_group = not top && match rm_closed re with Characters _ | Special _ | Bind _ | Alternative _ -> false | _ -> true in let last_group = if must_group then (add_const accu "\\("; succ last_group) else last_group in let groups = to_string (last_group, named_groups) accu re in if must_group then add_const accu "\\)"; let op = match kind with (Star, true) -> "*" | (Plus, true) -> "+" | (Option, true) -> "?" | _ -> assert false in add_const accu op; groups | Bind (_loc, re, name, conv) -> let last_group = succ last_group in let named_groups = add_new_group _loc name conv last_group named_groups in add_const accu "\\("; let groups = to_string (last_group, named_groups) accu re in add_const accu "\\)"; groups | Bind_pos (_loc, name) -> let last_group = succ last_group in let named_groups = add_new_pos _loc name last_group named_groups in add_const accu "\\(\\)"; (last_group, named_groups) | Backref (_loc, name) -> (try match Named_groups.find name (fst named_groups) with [] -> Messages.invalid_backref _loc name | [(_, n, conv)] -> add_const accu (sprintf "\\%i" n); groups | l -> add_const accu (sprintf "\\(%s\\)" (String.concat "\\|" (List.map (fun (_, n, conv) -> sprintf "\\%i" n) l))); (succ last_group, named_groups) with Not_found -> Messages.invalid_backref _loc name) | Variable (_loc, e) -> add_var accu e; groups | Nocase_variable (_loc, e) -> add_var_nocase accu e; groups | Closed ast -> let saved_named_groups = named_groups in let (last_group, named_groups) = to_string groups accu ast in (last_group, saved_named_groups) | Possessive _ -> assert false | Lookahead _ -> assert false | Lookbehind _ -> assert false let nocasify e = let _loc = Ast.loc_of_expr e in <:expr< $uid: !mod_runtime$.nocase $e$ >> let process_regexp _loc ~sharing re re_name = let accu = ref [] in let (last_group, named_groups) = to_string ~top:true (0, (Named_groups.empty, Named_groups.empty)) accu re in let re_args, re_source = Match.get_re_source ~quote_expr: <:expr< Str.quote >> ~nocasify accu in (re_args, re_source, named_groups, []) (* Syntax expanders *) open Constants let expr_mutex _loc = <:expr< $uid: !mod_runtime_mt$.$lid:str_mutex$ >> let unlock _loc = <:expr< Mutex.unlock $expr_mutex _loc$ >> let lock _loc = <:expr< Mutex.lock $expr_mutex _loc$ >> let lock_unlock e = let _loc = Ast.loc_of_expr e in <:expr< do { $lock _loc$; try let x = $e$ in do { $unlock _loc$; x } with [ exn -> do { $unlock _loc$; raise exn } ] } >> let unlock_lock e = let _loc = Ast.loc_of_expr e in <:expr< do { $unlock _loc$; try let x = $e$ in do { $lock _loc$; x } with [ exn -> do { $lock _loc$; raise exn } ] } >> let string_match _loc re_name get_re target pos = <:expr< Str.string_match $get_re$ $target$ $int:string_of_int pos$ >> let matched_group _loc n target = <:expr< Str.matched_group $int:string_of_int n$ $target$ >> let matched_position _loc n target = <:expr< Str.group_beginning $int:string_of_int n$ >> let compile_regexp ~mt _loc re_args re_source = let compile_string e = <:expr< Str.regexp $e$ >> in match re_args with [] -> let re_string = Match.compute_re_string _loc re_source in compile_string re_string | _ -> let key = match re_args with [name, _] -> <:expr< $lid:name$ >> | _ -> let expr_list = List.map (fun (name, _) -> <:expr< $lid:name$ >>) re_args in let tup = comma_expr_of_list _loc expr_list in <:expr< ( $tup: tup$ ) >> in let compile = let re_string = Match.compute_re_string _loc re_source in compile_string re_string in let find = Match.protect mt <:expr< $uid: !mod_runtime$.Mem.find tbl key >> in let add = Match.protect mt <:expr< $uid: !mod_runtime$.Mem.unsafe_add tbl key data >> in let check_cache = <:expr< let key = $key$ in try $find$ with [ Not_found -> let data = $compile$ in do { $add$; data } ] >> in let get_regexp = List.fold_right (fun (argname, _) e -> <:expr< fun $lid:argname$ -> $e$ >>) re_args check_cache in let result = <:expr< let tbl = $uid: !mod_runtime$.Mem.create 100 in $get_regexp$ >> in if mt then <:expr< let mutex = Mutex.create () in $result$ >> else result let convert _loc conv e = match conv with None -> e | Some f -> match f with `Int -> <:expr< Pervasives.int_of_string $e$ >> | `Float -> <:expr< Pervasives.float_of_string $e$ >> | `Option -> <:expr< let s = $e$ in if s = "" then None else Some s >> | `Custom f -> <:expr< $f$ $e$ >> | `Value e' -> <:expr< do { ignore $e$; $e'$ } >> let insert_bindings_poly make_expr _loc target set e = Named_groups.fold (fun name l e -> match l with [] -> assert false | (_loc, _, _) :: _ -> let find_it = List.fold_right (fun (_loc, n, conv) accu -> let expr = convert _loc conv (make_expr _loc n target) in match accu with None -> Some expr | Some e -> Some <:expr< try $expr$ with [ Not_found -> $e$ ] >>) l None in let result = match find_it with None -> assert false | Some e -> e in <:expr< let $lid:name$ = $result$ in $e$ >>) set e let insert_group_bindings = insert_bindings_poly matched_group let insert_position_bindings = insert_bindings_poly matched_position let insert_bindings _loc target (group_bindings, position_bindings) e = insert_group_bindings _loc target group_bindings (insert_position_bindings _loc target position_bindings e) let match_and_bind _loc re_name get_re target named_groups success failure = <:expr< if $string_match _loc re_name get_re target 0$ then $insert_bindings _loc target named_groups success$ else $failure$ >> let macro_replace _loc re_name target_name named_groups expr = let target = <:expr< $lid:target_name$ >> in <:expr< fun $lid:target_name$ -> Str.global_substitute $lid:re_name$ (fun _ -> $insert_bindings _loc target named_groups expr$) $target$ >> open Select_lib let lib = { predefined_regexps = special_regexps; unfold_range = true; process_regexp = process_regexp; compile_regexp_match = compile_regexp ~mt:false; compile_regexp_search = compile_regexp ~mt:false; match_and_bind = match_and_bind; wrap_match = (fun e -> e); wrap_user_case = (fun e -> e); really_wrap_match = false; really_wrap_user_case = false } let lib_mt = { predefined_regexps = special_regexps; unfold_range = true; process_regexp = process_regexp; compile_regexp_match = compile_regexp ~mt:true; compile_regexp_search = compile_regexp ~mt:true; match_and_bind = match_and_bind; wrap_match = lock_unlock; wrap_user_case = unlock_lock; really_wrap_match = true; really_wrap_user_case = true } mikmatch-1.0.7/str/syntax_str.ml000066400000000000000000000021761227060510600167010ustar00rootroot00000000000000(*pp camlp4orf *) (* $Id$ *) open Printf open Camlp4.PreCast open Syntax open Syntax_common open Select_lib open Match let extend_common () = EXTEND Gram expr: [ [ "RE_STR"; re = regexp -> Regexp_ast.warnings re; let (re_args, re_source, named_groups, postbindings) = Str_lib.lib.process_regexp _loc ~sharing:true re "" in let re_fragments = Match.get_re_fragments _loc re_source in <:expr< ( $re_fragments$, $pp_named_groups _loc named_groups$ ) >> ] ]; Syntax_common.regexp: LEVEL "simple" [ [ "_" -> Regexp_ast.Characters (_loc, Charset.full) ] ]; END;; let extend_regular () = extend_common () (* let extend_revised () = extend_common () *) let _ = select_lib Str_lib.lib; (* Keeping it for backwards compatibility *) Camlp4.Options.add "-thread" (Arg.Unit ( fun () -> select_lib Str_lib.lib_mt; eprintf "Warning: -thread is deprecated.\n/%!" )) " Deprecated option that protects access to shared data with a mutex. \ Currently only patterns containing @ are concerned."; (* How to test if the current syntax is the regular or revised one? *) extend_regular () mikmatch-1.0.7/str/test1.ml000066400000000000000000000131041227060510600155140ustar00rootroot00000000000000open Printf let _ = function RE "" -> () let x = 1 in ();; let / alpha / = "a" in ();; (* Definition of regular expressions for further use *) RE space = [' ' '\t' '\n' '\r'] RE not_space = _ # space RE digit = ['0'-'9'] RE letter = ['A'-'Z' '_' 'a'-'z'] RE word = letter+ (* (* Inclusion of file in the same syntax, e.g. a library of user-defined regular expressions. (known problem with error location) *) USE "my_regexps.ml" (* defines `word' and `digit' *) *) (* Extended pattern-matching in the following constructs: match ... with ... try ... with ... function ... *) let test expected a b c = printf "[case %i] " expected; flush stdout; (match a, b, c with None, (None | Some (RE space* )), None -> printf "case 1\n" | Some ({ contents = [| RE (word as x); _; y |]}), (Some ("test" as z | RE word space (word as z))), None -> printf "case 2: %S %S %S\n" x y z | _, _, Some (RE space* (['0'-'9']+ as n)) -> printf "case 3: %s\n" n | _ -> printf "case 4\n"); flush stdout let _ = printf "Tests (match ... with):\n"; flush stdout; test 1 None (Some " ") None; test 2 (Some (ref [| "alpha"; "beta"; "2 gamma" |])) (Some "Hello World!") None; test 3 None None (Some " 123 "); test 4 (Some (ref [| |])) (Some "") (Some "abc") let _ = match "" with (RE (("a" as a) | ("b" as a))) | a -> a let hello_who s = match s with RE _* ['h''H']"ello" ","? space* ((word | space)* word as someone) -> String.capitalize someone | _ -> "nobody" let _ = printf "Extraction of the recipient's name\n"; List.iter (fun s -> printf "Hello who: %S\n" s; printf " -> %S\n" (hello_who s); flush stdout) [ "Hello World!"; "First of all, hello everybody."; "*** hello world ***"; "Hello, Caml riders!" ] let _ = printf "Test (local and global bindings):\n"; flush stdout; match "" with (RE (word as x | space+ (word as x))* ) | _ -> printf "Passed.\n" let _ = printf "Test (repetition range + end of line):\n"; flush stdout; let f s = match s with RE '-'? digit{1-4} eol -> printf "%S has less than 5 digits.\n" s | RE '-'? digit{5-} eol -> printf "%S has at least 5 digits.\n" s | _ -> printf "%S is not a number.\n" s in List.iter f ["123"; "1234"; "12345"; "12345678"; "-1234"; "*1"; "1*"; "9\n*" ] let test f (expected, s) = let (success, result) = f s in let passed = expected = success in if passed then (printf "[OK] %s%s\n" s (match result with None -> "" | Some x -> sprintf " -> %s"x); flush stdout) else (print_endline (s ^ "Failed"); flush stdout; failwith s) let () = printf "Test (no case: the ~ operator):\n"; flush stdout; List.iter (test (function RE "hello"~ " World!" -> true, None | _ -> false, None)) [ true, "Hello World!"; true, "hElLO World!"; false, "hello WORLD!" ] let () = printf "Test (try ... with):\n"; flush stdout; try failwith "Hello World!" with Failure RE "Hello" space* (word as w) -> printf "OK: %s\n" w | Failure s -> printf "Failure: %s\n" s let () = printf "Test (function ... -> ...):\n"; flush stdout; let f = function RE "Hello" space* (word as w) -> printf "OK: %s\n" w | _ -> printf "Error\n" in f "Hello Everybody"; f "Hello Caml!" let () = printf "Test (backreferences):\n"; flush stdout; let f s = match s with RE (digit+ as x | (word as x)) (* x = global id *) (" " as sp !sp)* (* sp = local id *) !x -> true, Some x | _ -> false, None in List.iter (test f) [ true, "123123"; false, "123 123"; true, "123 123"; true, "aaaa"; false, "abc"; false, "ab1ab1" ] let print_named_groups l = List.iter (fun (name, positions) -> printf "%s:" name; List.iter (fun i -> printf " %i" i) positions; printf "\n") l (* Lower level feature: RE_STR returns the source of the regexp, to be used with specific compilation or search options. *) let _ = let (src, named_groups) = RE_STR (("a"|"b"+)? digit{2}) as start (space* word)+ ( digit{1} (word as last_word) | digit{3} (word as last_word)) in printf "Str regexp source: %S\n" src; printf "Named groups and their locations:\n"; print_named_groups named_groups; flush stdout let charset_union = RE_STR digit | space | "a" | ['A'-'Z'] let _ = printf "Debugging test 1:\n"; flush stdout; match ["Coucou Martin"] with [ (RE (word as x) space (word as y)) | ("zobi" as x as y) ] | ("abc" as x :: y :: _) -> printf "Trop cool x=%S y=%S\n" x y | _ -> printf "Bof bof...\n" let _ = printf "Debugging test 2:\n"; flush stdout; match ["Hello"; "!"], 123 with "***" :: _, _ -> printf "Hop!\n" | RE word as w :: RE _ as c :: _, (122|123) when w <> "Bye" -> printf "Cool: %S %S\n" w c | [RE ""], _ -> printf "Glouglou\n" | _ -> printf "Sorry\n" let _ = printf "Debugging test 3:\n"; flush stdout; (match "hello" with (RE ' '{10}) | RE _* ' '{10} eol -> () | _ -> ()); printf "Passed.\n" let _ = match Some "x" with Some ((RE "a") | ("b"|"c")) -> true | _ -> false let _ = match "axxxxyz", 333 with RE "a" as s ("bc" %pos | "x"+ %pos) (_* as s'), _ | (s as s', pos) -> printf "%s, %i, %s\n%!" s pos s' let _ = match "123" with RE digit+ as n := fun _ -> 1 -> n | _ -> 2 (* Parametrized regexps *) let _ = let find s = match "abbcdefgghijkl" with RE _* @s @s -> assert true | _ -> assert false in find "b"; find "g"