This chapter describes the syntax and semantics of the extensible
printers of Camlp5.
Similar to the extensible grammars, the
extensible printers make it possible to define and extend printers of data or
programs. A specific statement, "EXTEND_PRINTER", allows
defining these extensions.
Getting started
A printer is a value of type "Eprinter.t a" where
"a" is the type of the item to be printed. When applied, a
printer returns a string, representing the printed item.
To create a printer, one must use the function
"Eprinter.make" with, as parameter, the name of the
printer (used in error messages). A printer is created empty,
i.e. it fails if it is applied.
As with grammar entries, printers may have several levels. When the
function "Eprinter.apply" is applied to a printer, the
first level is called. The function "Eprinter.apply_level"
allows calling a printer at some specific level possibly different
from the first one. When a level does not match any value of the
printed item, the next level is tested. If there are no more levels,
the printer fails.
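For example, assuming a printer "pr_e" having a level labelled
"simple" (both names are illustrative, not part of the Eprinter API),
a call at that specific level could look like this:
  value print_simple e = Eprinter.apply_level pr_e "simple" Pprintf.empty_pc e;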
In semantic actions of printers, functions are provided to
recursively call the current level and the next level. Moreover,
a printing context variable is also given, carrying the
current indentation, what should be printed before on the same line
and what should be printed after on the same line (it is not
mandatory to use them).
The extension of printers can be done with the
"EXTEND_PRINTER" statement added by the parsing
kit "pa_extprint.cmo".
All printers are extended according to their corresponding
definitions, which start with an optional "position" and continue with
the "levels" definition.
Optional position
After the colon, it is possible to specify where to insert the
defined levels:
The identifier "FIRST" (resp. "LAST")
indicates that the level must be inserted before (resp. after) all
possibly existing levels of the entry. They become their first
(resp. last) levels.
The identifier "BEFORE" (resp. "AFTER")
followed by a level label (a string) indicates that the levels
must be inserted before (resp. after) that level, if it exists. If
it does not exist, the extend statement fails at run time.
The identifier "LEVEL" followed by a label indicates
that the first level defined in the extend statement must be
inserted at the given level, extending and modifying it. The other
levels defined in the statement are inserted after this level, and
before the possible levels following this level. If there is no
level with this label, the extend statement fails at run
time.
By default, if the entry has no level, the levels defined in the
statement are inserted in the entry. Otherwise the first defined
level is inserted at the first level of the entry, extending or
modifying it. The other levels are inserted afterwards (before the
possible second level which may previously exist in the entry).
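For example, to add a new first level to a printer "pr_e"
(anticipating the type "expr" and the printer defined in the example
further below; this is only an illustrative sketch), one could write:

EXTEND_PRINTER
  pr_e: FIRST
    [ [ Int 0 -> "zero" ] ]
  ;
END;

The new level is tried first; all values other than "Int 0" fall
through to the previously existing levels.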
Levels
After the optional "position", the level list follow. The
levels are separated by vertical bars, the whole list being between
brackets.
A level starts with an optional label, which corresponds to its
name. This label is useful to specify this level in case of future
extensions, using the position (see previous section) or
for possible direct calls to this specific level.
Rules
A level is a list of rules separated by vertical bars, the
whole list being between brackets.
A rule is a usual pattern association (as in a function or in the
"match" statement), i.e. a pattern, an arrow and an expression. The
expression is the semantic action, which must be of type
"string".
Rules insertion
The rules are sorted by their patterns, according to the rules of
the extensible functions.
Semantic action
The semantic action, i.e. the expression following the right arrow
in rules, contains in its environment the variables bound by the
pattern and three more variables:
The variable "curr" which is a function which can be
called to recursively invoke the printer at the current
level,
The variable "next" which is a function which can be
called to invoke the printer at the next level,
The variable "pc" which contains the printing context
of type "Pprintf.pr_context" (see
chapter Pprintf).
The variables "curr" and "next" are of type:
pr_context -> t -> string
where "t" is the type of the printer (i.e. the type of its
patterns).
The variable "curr", "next" and "pc"
have predefined names and can hide the possible identifiers having
the same names in the pattern or in the environment of the
"EXTEND_PRINTER" statement.
This example illustrates the symmetry between parsers and
printers. A simple type of expressions is defined. A parser converts
a string to a value of this type, and a printer converts a value of
this type to a string.
In the printer, there is no use of the "pc" parameter and
no use of the "Pretty" module. The strings are printed on
a single line.
Here is the source (file "foo.ml"):
#load "pa_extend.cmo";
#load "pa_extprint.cmo";
open Printf;
type expr =
[ Op of string and expr and expr
| Int of int
| Var of string ]
;
value g = Grammar.gcreate (Plexer.gmake ());
value pa_e = Grammar.Entry.create g "expr";
value pr_e = Eprinter.make "expr";
EXTEND
pa_e:
[ [ x = SELF; "+"; y = SELF -> Op "+" x y
| x = SELF; "-"; y = SELF -> Op "-" x y ]
| [ x = SELF; "*"; y = SELF -> Op "*" x y
| x = SELF; "/"; y = SELF -> Op "/" x y ]
| [ x = INT -> Int (int_of_string x)
| x = LIDENT -> Var x
| "("; x = SELF; ")" -> x ] ]
;
END;
EXTEND_PRINTER
pr_e:
[ [ Op "+" x y -> sprintf "%s + %s" (curr pc x) (next pc y)
| Op "-" x y -> sprintf "%s - %s" (curr pc x) (next pc y) ]
| [ Op "*" x y -> sprintf "%s * %s" (curr pc x) (next pc y)
| Op "/" x y -> sprintf "%s / %s" (curr pc x) (next pc y) ]
| [ Int x -> string_of_int x
| Var x -> x
| x -> sprintf "(%s)" (Eprinter.apply pr_e pc x) ] ]
;
END;
value parse s = Grammar.Entry.parse pa_e (Stream.of_string s);
value print e = Eprinter.apply pr_e Pprintf.empty_pc e;
if Sys.interactive.val then ()
else print_endline (print (parse Sys.argv.(1)));
Remark on the use of "curr" and "next" while printing operators:
due to left associativity, the first operand uses "curr" and the
second operand uses "next". For right associativity operators, they
should be inverted. For no associativity, both should use
"next".
The last line of the file allows using it either in the OCaml toplevel
or as a standalone program, taking the string to be printed as
parameter. It can be compiled this way:
ocamlc -pp camlp5r -I +camlp5 gramlib.cma foo.ml
Examples of use (notice the redundant parentheses automatically
removed by the printing algorithm):
$ ./a.out "(3 * x) + (2 / y)"
3 * x + 2 / y
$ ./a.out "(x+y)*(x-y)"
(x + y) * (x - y)
$ ./a.out "x + y - z"
x + y - z
$ ./a.out "(x + y) - z"
x + y - z
$ ./a.out "x + (y - z)"
x + (y - z)
Printing OCaml programs
Complete examples of usage of the extensible printers are the printers
of the syntaxes and extended syntaxes provided by Camlp5 in the pretty
printing kits:
pr_r.cmo: pretty print in revised syntax
pr_o.cmo: pretty print in normal syntax
pr_rp.cmo: also pretty print the parsers in revised
syntax
pr_op.cmo: also pretty print the parsers in normal
syntax
This chapter presents the Camlp5 syntax tree when Camlp5 is installed
in transitional mode.
Introduction
This syntax tree is defined in the module "MLast" provided
by Camlp5. Each node corresponds to a syntactic entity of the
corresponding type.
For example, the syntax tree of the statement "if" can
be written:
MLast.ExIfe loc e1 e2 e3
where "loc" is the location in the source, and
"e1", "e2" and "e3" are
respectively the expression after the "if", the one after
the "then" and the one after the "else".
If a program needs to manipulate syntax trees, it can use the nodes
defined in the module "MLast". The programmer must know how
the concrete syntax is transformed into this abstract syntax.
A simpler solution is to use the quotation kit
"q_MLast.cmo". It
proposes quotations which represent the
abstract syntax (the nodes of the module "MLast") in
concrete syntax, with antiquotations to bind variables inside. The
example above can be written:
<:expr< if $e1$ then $e2$ else $e3$ >>
This representation is very interesting when one wants to
manipulate complicated syntax trees. Here is an example taken from
the Camlp5 sources themselves:
<:expr<
match try Some $f$ with [ Stream.Failure -> None ] with
[ Some $p$ -> $e$
| _ -> raise (Stream.Error $e2$) ]
>>
This example was in the position of a pattern. In abstract syntax, it
would have to be written as a large combination of "MLast" constructors,
which is less readable and much more complicated to build and
update.
Instead of thinking of "a syntax tree", the programmer can think of
"a piece of program".
Location
In all syntax tree nodes, the first parameter is the source
location of the node.
In expressions
When a quotation is in the context of an expression, the location
parameter is "loc" in the node and in all its possible
sub-nodes. Example: if we consider the quotation:
<:sig_item< value foo : int -> bool >>
This quotation, in the context of an expression, is equivalent
to:
MLast.SgVal loc "foo"
(MLast.TyArr loc (MLast.TyLid loc "int") (MLast.TyLid loc "bool"));
The name "loc" is predefined. However, it is possible to
change it, using the argument "-loc" of the Camlp5 shell
commands.
Consequently, if there is no variable "loc" defined in the
context of the quotation, or if it is not of the correct type, a
semantic error occur in the OCaml compiler ("Unbound value
loc").
Note that in the extensible grammars,
the variable "loc" is bound, in all semantic actions, to
the location of the rule.
If the created node has no location, the programmer can define a
variable named "loc" equal to "Ploc.dummy".
In patterns
When a quotation is in the context of a pattern, the location
parameter of all nodes and possible sub-nodes is the wildcard
("_"). The same example above, in a pattern context, is equivalent to:
However, it is possible to generate a binding of the variable
"loc" on the top node by writing a "colon" before the
"less" in the quotation. The same example:
Antiquotations
The expressions or patterns between dollar ($) characters are
called antiquotations. In opposition to quotations, which
have their own syntax rules, an antiquotation is an area in the syntax
of the enclosing context (expression or pattern). See the chapter
about quotations.
If a quotation is in the context of an expression, the
antiquotation must be an expression. It can be any expression,
including function calls. Examples:
value f e el = <:expr< [$e$ :: $loop False el$] >>;
value patt_list p pl = <:patt< ( $list:[p::pl]$) >>;
If a quotation is in the context of a pattern, the antiquotation
is a pattern. Any pattern is possible, including the wildcard
character ("_"). Examples:
fun [ <:expr< $lid:op$ $_$ $_$ >> -> op ]
match p with [ <:patt< $_$ | $_$ >> -> Some p ]
Nodes and Quotations
This section describes all nodes defined in the module "MLast" of
Camlp5 and how to write them with quotations. Notice that, inside
quotations, one is not restricted to these elementary cases, but
any complex value can be used, resulting on possibly complex combined
nodes.
Node used in the quotation expanders to tell, at the time of the
conversion to the OCaml compiler syntax tree, that all locations of the
sub-tree are correctly located in the quotation. By default, in
quotations, the locations of all generated nodes are the location
of the whole quotation. This node allows making an exception to
this rule, since we know that the antiquotation belongs to the
universe of the enclosing program. See the chapter
about quotations and, in particular, its
section about antiquotations.
The variable "lpee" found in "function", "match" and "try"
statements correspond to a list of "(patt * option expr *
expr)" where the "option expr" is the "when"
optionally following the pattern:
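In quotations, such nodes are typically written like this (the
variable names are just examples):

<:expr< fun [ $list:lpee$ ] >>
<:expr< match $e$ with [ $list:lpee$ ] >>
<:expr< try $e$ with [ $list:lpee$ ] >>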
Node internally used to specify a different file name applying to
the whole subtree. This is generated by the directive "use" and
used when converting to the OCaml syntax tree which needs the file
name in its location type.
sig_item
Signature items, i.e. phrases in a ".mli" file or elements
inside "sig ... end".
Camlp5 provides extension kits to pretty print programs in revised
syntax and normal syntax. Some other extension kits also make it possible to
rebuild the parsers, or the EXTEND statements, in their initial
syntax. The pretty print system is itself extensible, by adding new
rules. We present here how it works in the Camlp5 sources.
The pretty print system of Camlp5 uses the library modules
Pretty (an original system to format
output) and Extfun (another original
system, of extensible functions).
This chapter is designed for programmers who want to understand
how the pretty printing of OCaml programs works in Camlp5, want to
adapt, modify or debug it, or want to add their own pretty printing
extensions.
Introduction
The files doing the pretty printing are located in the Camlp5 sources
in the directory "etc". Peruse them if you are interested in creating
new ones. The main ones are:
"etc/pr_r.ml": pretty print in revised syntax.
"etc/pr_o.ml": pretty print in normal syntax.
"etc/pr_rp.ml": rebuilding parsers in their original revised
syntax.
"etc/pr_op.ml": rebuilding parsers in their original normal
syntax.
"etc/pr_extend.ml": rebuilding EXTEND in its original
syntax.
We present here how this system works inside these files. First,
the general principles. Second, more details of the
implementation.
Principles
Using module Pretty
All functions in OCaml pretty printing take a parameter named "the
printing context" (variable pc). It is a record holding :
The current indendation : pc.ind
What should be printed before, in the same line
: pc.bef
What should be printed after, in the same line
: pc.aft
The dangling token, useful in normal syntax to know whether
parentheses are necessary : pc.dang
A typical pretty printing function calls the
function horiz_vertic of the library
module Pretty. This function takes two
functions as parameters:
The way to print the data on a single line (horizontal
printing)
The way to print the data on two or more lines (vertical
printing)
Both functions concatenate the strings using the
function sprintf of the library
module Pretty, which controls whether the printed data
fits on the line or not. They generally call, recursively, other
pretty printing functions with the same behaviour.
Let us see an example (fictitious) of printing an OCaml
application. Let us suppose we have an application expression
"e1 e2" to pretty print where e1
and e2 are sub-expressions. If both expressions and
their application fit on a single line, we want to see:
e1 e2
On the other hand, if they do not fit on a single line, we want to
see e2 on another line with, say, an indentation of 2
spaces:
e1
e2
Here is a possible implementation. The function has been
named expr_app and can call the function expr to
print the sub-expressions e1 and e2:
value expr_app pc e1 e2 =
horiz_vertic
(fun () ->
let s1 = expr {(pc) with aft = ""} e1 in
let s2 = expr {(pc) with bef = ""} e2 in
sprintf "%s %s" s1 s2)
(fun () ->
let s1 = expr {(pc) with aft = ""} e1 in
let s2 =
expr
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2)}
e2
in
sprintf "%s\n%s" s1 s2)
;
The first function is the horizontal printing. It ends with a
sprintf separating the printing of e1
and e2 by a space. The possible "before part"
(pc.bef) and "after part" (pc.aft) are
transmitted in the calls of the sub-functions.
The second function is the vertical printing. It ends with a
sprintf separating the printing of e1
and e2 by a newline. The second line starts with an
indentation, using the "before part" (pc.bef) of the
second call to expr.
The pretty printing library
function Pretty.horiz_vertic calls the first
(horizontal) function, and if it fails (either
because s1 or s2 are too long or hold
newlines, or because the final string produced
by sprintf is too long), calls the second
(vertical) function.
Notice that the parameter pc contains a
field pc.bef (what should be printed before in the same
line), which in both cases is transmitted to the printing
of e1 (since the syntax {(pc) with aft =
""} is a record with pc.bef kept). Same for the
field pc.aft transmitted to the printing
of e2.
Using EXTEND_PRINTER statement
This system is combined with the extensible
printers to make the pretty printing extensible.
The code above actually looks like:
EXTEND_PRINTER
pr_expr:
[ [ <:expr< $e1$ $e2$ >> ->
horiz_vertic
(fun () ->
let s1 = curr {(pc) with aft = ""} e1 in
let s2 = next {(pc) with bef = ""} e2 in
sprintf "%s %s" s1 s2)
(fun () ->
let s1 = curr {(pc) with aft = ""} e1 in
let s2 =
next
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2)}
e2
in
sprintf "%s\n%s" s1 s2) ] ]
;
END;
The variable "pc" is implicit in the semantic actions of
the syntax "EXTEND_PRINTER", as well as two other
variables: "curr" and "next".
These parameters, "curr" and "next", correspond
to the pretty printing of, respectively, the current level and the
next level. Since the application in OCaml is left associative, the
first sub-expression is printed at the same (current) level and the
second one is printed at the next level. We also see a call
to next in the last (second) case, to treat
the other cases at the next level.
Dangling else, bar, semicolon
In normal syntax, there are cases where it is necessary to enclose
expressions between parentheses (or
between begin
and end, which is equivalent in
that syntax). Three tokens may cause problems: the
"else", the vertical bar "|" and the
semicolon ";". Here are examples where the presence of
these tokens constrains the previous expression to be
parenthesized. In these three examples, removing
the begin..end
enclosers would change the meaning of the expression because the
dangling token would be included in that expression:
Dangling else:
if a then begin if b then c end
else d
Dangling bar:
function
A ->
begin match a with
B -> c
| D -> e
end
| F -> g
Dangling semicolon:
if a then b
else begin
let c = d in
e
end;
f
The information is transmitted by the
value pc.dang. In the first example above, while
displaying the "then" part of the outer
"if", the sub-expression is called with the
value pc.dang set to
"else", to inform the last sub-sub-expression that it is
going to be followed by that token. When an "if"
expression has to be displayed without an "else" part and
its "pc.dang" is "else", it must be enclosed, with
parentheses or with "begin..end".
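As an illustration, a simplified sketch (not the actual code of
"pr_o.ml"; it assumes a function "expr" printing expressions), an
"if" without "else" part could be printed like this:

value if_without_else pc e1 e2 =
  let s =
    sprintf "if %s then %s"
      (expr {(pc) with bef = ""; aft = ""} e1)
      (expr {(pc) with bef = ""; aft = ""} e2)
  in
  (* if a dangling "else" follows, the whole expression must be protected *)
  if pc.dang = "else" then sprintf "%s(%s)%s" pc.bef s pc.aft
  else sprintf "%s%s%s" pc.bef s pc.aft
;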
This problem does not exist in revised syntax. While pretty
printing in revised syntax, the parameter pc.dang is
not necessary and remains the empty string.
By level
As explained in the chapter about
the extensible printers (with the
EXTEND_PRINTER statement), printers contain levels. The global
printer variable of expressions is named "pr_expr" and
contain all definitions for pretty printing expressions, organized
by levels, just like the parsing of expressions. The definition of
"pr_expr" actually looks like this:
The module Prtools
The Prtools module is defined inside Camlp5 for pretty printing
kits. It provides variables and functions to process comments, and
meta-functions to process lists (horizontally, vertically, or in
paragraph mode).
Comments
value comm_bef : int -> MLast.loc -> string;
"comm_bef ind loc" get the comment from the source just
before the given location "loc". This comment may be
reindented using "ind". Returns the empty string if no
comment found.
value source : ref string;
The initial source string, which must be set by the pretty
printing kit. Used by [comm_bef] above.
value set_comm_min_pos : int -> unit;
Set the minimum position of the source where comments can be
found (to prevent possible duplication of comments).
Meta functions for lists
type pr_fun 'a = pr_context -> 'a -> string;
Type of printer functions.
value hlist : pr_fun 'a -> pr_fun (list 'a);
[hlist elem pc el] returns the horizontally pretty printed
string of a list of elements; elements are separated with
spaces. The list is displayed on a single line. If this
function is called in the context of the [horiz] function of the
function [horiz_vertic] of the module Pretty, and if the line
overflows or contains newlines, the function internally fails (the
exception is caught by [horiz_vertic] for a vertical pretty
print).
value hlist2 : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
horizontal list with a different function from 2nd element on.
value hlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
horizontal list with a different function for the last element.
value vlist : pr_fun 'a -> pr_fun (list 'a);
[vlist elem pc el] returns the vertically pretty printed string
of a list of elements; elements are separated with newlines and
indentations.
value vlist2 : pr_fun 'a -> pr_fun 'a -> pr_fun (list
'a);
vertical list with different function from 2nd element on.
vertical list with different function from 2nd element on, the
boolean value being True for the last element of the list.
value vlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list
'a);
vertical list with different function for the last element.
value plist : pr_fun 'a -> int -> pr_fun (list ('a *
string));
[plist elem sh pc el] returns the pretty printed string of a
list of elements with separators. The elements are printed
horizontally as far as possible. When an element does not fit on
the line, a newline is added and the element is displayed in the
next line with an indentation of [sh]. [elem] is the function to
print elements, [el] a list of pairs (element * separator) (the
last separator being ignored).
value plistb : pr_fun 'a -> int -> pr_fun (list ('a *
string));
[plistb elem sh pc el] returns the pretty printed string of the
list of elements, like [plist], but the value of [pc.bef]
corresponds to an element already printed, as if it were in the
list. Therefore, if the first element of [el] does not fit on the
line, a newline and a tabulation are added after [pc.bef].
value plistl : pr_fun 'a -> pr_fun 'a -> int -> pr_fun (list
('a * string));
paragraph list with a different function for the last element.
value hvlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list
'a);
applies "hlistl" if the context is horizontal; else
applies "vlistl".
Miscellaneous
value tab : int -> string;
[tab ind] is equivalent to [String.make ind ' ']
value flatten_sequence : MLast.expr -> option (list
MLast.expr);
[flatten_sequence e]. If [e] is an expression representing a
sequence, return the list of expressions of the sequence. If some
of these expressions are already sequences, they are flattened in
the list. If that list contains expressions of the form let..in
sequence, this sub-sequence is also flattened with the let..in
applying only to the first expression of the sequence. If [e] is a
let..in sequence, it works the same way. If [e] is not a sequence
nor a let..in sequence, return None.
Example: repeat..until
This pretty prints the example
repeat..until
statement programmed in the chapter Syntax
extensions (first version generating a "while"
statement).
The code
The pattern generated by the "repeat" statement is recognized
(sequence ending with a "while" whose contents is the same
than the beginning of the sequence) by the function "is_repeat" and
the repeat statement is pretty printed in its initial form using the
function "horiz_vertic" of the Pretty module. File
"pr_repeat.ml":
#load "pa_extprint.cmo";
#load "q_MLast.cmo";
open Pcaml;
open Pretty;
open Prtools;
value eq_expr_list el1 el2 =
if List.length el1 <> List.length el2 then False
else List.for_all2 eq_expr el1 el2
;
value is_repeat el =
match List.rev el with
[ [<:expr< while not $e$ do { $list:el2$ } >> :: rel1] ->
eq_expr_list (List.rev rel1) el2
| _ -> False ]
;
value semi_after pr pc = pr {(pc) with aft = sprintf "%s;" pc.aft};
EXTEND_PRINTER
pr_expr:
[ [ <:expr< do { $list:el$ } >> when is_repeat el ->
match List.rev el with
[ [<:expr< while not $e$ do { $list:el$ } >> :: _] ->
horiz_vertic
(fun () ->
sprintf "%srepeat %s until %s%s" pc.bef
(hlistl (semi_after curr) curr
{(pc) with bef = ""; aft = ""} el)
(curr {(pc) with bef = ""; aft = ""} e)
pc.aft)
(fun () ->
let s1 = sprintf "%srepeat" (tab pc.ind) in
let s2 =
vlistl (semi_after curr) curr
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2);
aft = ""}
el
in
let s3 =
sprintf "%suntil %s" (tab pc.ind)
(curr {(pc) with bef = ""} e)
in
sprintf "%s\n%s\n%s" s1 s2 s3)
| _ -> assert False ] ] ]
;
END;
Compilation
ocamlc -pp camlp5r -I +camlp5 -c pr_repeat.ml
Testing
Getting the same files "foo.ml" and "bar.ml" of the repeat syntax
example:
$ cat bar.ml
#load "./foo.cmo";
value x = ref 42;
repeat
print_int x.val;
print_endline "";
x.val := x.val + 3
until x.val > 70;
Without the pretty printing kit:
$ camlp5r pr_r.cmo bar.ml
#load "./foo.cmo";
value x = ref 42;
do {
print_int x.val;
print_endline "";
x.val := x.val + 3;
while not (x.val > 70) do {
print_int x.val;
print_endline "";
x.val := x.val + 3
}
};
With the pretty printing kit:
$ camlp5r pr_r.cmo ./pr_repeat.cmo bar.ml -l 75
#load "./foo.cmo";
value x = ref 42;
repeat
print_int x.val;
print_endline "";
x.val := x.val + 3
until x.val > 70;
The revised syntax is an alternative syntax of OCaml. It is close
to the normal syntax. We present here only the differences between
the two syntaxes.
Notice that there is a simple way to know how the normal syntax is
written in revised syntax: write the code in a file "foo.ml" in
normal syntax and type, in a shell:
camlp5o pr_r.cmo pr_rp.cmo foo.ml
And, conversely, how a file "bar.ml" written in revised syntax is
displayed in normal syntax:
camlp5r pr_o.cmo pr_op.cmo bar.ml
Even simpler, without creating a file:
camlp5o pr_r.cmo pr_op.cmo -impl -
... type in normal syntax ...
... type control-D ...
camlp5r pr_o.cmo pr_rp.cmo -impl -
... type in revised syntax ...
... type control-D ...
Lexing
The character quote (') can be written without backslash:
OCaml
Revised
'\''
'''
Modules, Structure and Signature items
Structure and signature items always end with a single semicolon
which is required.
In structures, the declaration of a value is introduced by the
keyword "value", instead of "let":
OCaml
Revised
let x = 42;;
value x = 42;
let x = 42 in x + 7;;
let x = 42 in x + 7;
In signatures, the declaration of a value is also introduced by
the keyword "value", instead of "val":
OCaml
Revised
val x : int;;
value x : int;
In signatures, abstract module types are represented by a quote
and an (any) identifier:
OCaml
Revised
module type MT;;
module type MT = 'a;
Functor application uses currying. Parentheses are not
required for the parameters:
OCaml
Revised
type t = Set.Make(M).t;;
type t = (Set.Make M).t;
module M = Mod.Make (M1) (M2);;
module M = Mod.Make M1 M2;
It is possible to group several declarations together either in
an interface or in an implementation by enclosing them between
"declare" and "end" (this is useful when using syntax extensions
to generate several declarations from one). Example in an
interface:
declare
type foo = [ Foo of int | Bar ];
value f : foo -> int;
end;
Expressions and Patterns
Imperative constructions
The sequence is introduced by the keyword "do" followed by "{"
and terminated by "}"; it is possible to put a semicolon after the
last expression:
OCaml
Revised
e1; e2; e3; e4
do { e1; e2; e3; e4 }
The "do" after the "while" loop and the "for" loop are followed
by a "{" and the loop end with "}"; it is possible to put a
semicolon after the last expression:
OCaml
while e1 do
  e1; e2; e3
done

for i = e1 to e2 do
  e1; e2; e3
done
Revised
while e1 do {
  e1; e2; e3
}

for i = e1 to e2 do {
  e1; e2; e3
}
Tuples and Lists
Parentheses are required in tuples:
OCaml
Revised
1, "hello", World
(1, "hello", World)
Lists are always enclosed with brackets. A list is a left
bracket, followed by a list of elements separated with semicolons,
optionally followed by colon-colon and an element, and ended by a
right bracket. Warning: the colon-colon is not an infix but is
just part of the syntactic construction.
OCaml
Revised
x :: y
[x :: y]
[x; y; z]
[x; y; z]
x :: y :: z :: t
[x; y; z :: t]
Records
In record update, parentheses are required around the initial
expression:
OCaml
Revised
{e with field = a}
{(e) with field = a}
It is allowable to use function binding syntax in record field
definitions:
OCaml
Revised
{field = fun a -> e}
{field a = e}
Irrefutable patterns
An irrefutable pattern is a pattern which is
syntactically visible and never fails. They are used in some
syntactic constructions. It is either:
A variable,
The wildcard "_",
The constructor "()",
A tuple with irrefutable patterns,
A record with irrefutable patterns,
A type constraint with an irrefutable pattern.
Notice that this definition is only syntactic: a constructor
belonging to a type having only one constructor is not considered as
an irrefutable pattern (except "()").
Constructions with matching
The keyword "function" no longer exists. Only "fun" is used.
The pattern matching, in constructions with "fun", "match" and
"try" is closed with brackets: an open bracket "[" before the
first case, and a close bracket "]" after the last case:
OCaml
match e with
  p1 -> e1
| p2 -> e2
Revised
match e with
[ p1 -> e1
| p2 -> e2 ]
If there is only one case and if the pattern is irrefutable,
the brackets are not required. These examples work identically
in OCaml and in revised syntax:
OCaml
Revised
fun x -> x
fun x -> x
fun {foo=(y, _)} -> y
fun {foo=(y, _)} -> y
It is possible to write the empty function, which always raises the
exception "Match_failure" when applied to a parameter. It is also possible to write an
empty "match", raising "Match_failure" after having evaluated its
expression, and the empty "try", equivalent to its expression
without try:
fun []
match e with []
try e with []
The patterns after "let" and "value" must be irrefutable. The
following OCaml expression:
let f (x::y) = ...
must be written:
let f = fun [ [x::y] -> ...
It is possible to use a construction "where", equivalent to "let",
but usable only when where is only one binding. The expression:
e1 where p = e
is equivalent to:
let p = e in e1
Mutables and Assignment
The statement "<-" is written ":=":
OCaml
Revised
x.f <- y
x.f := y
The "ref" type is declared as a record type with one field named
"val", instead of "contents". The operator "!" does not exist any
more, and references are assigned like the other mutables:
OCaml
Revised
x := !x + y
x.val := x.val + y
Miscellaneous
The "else" is required in the "if"
statement:
OCaml
Revised
if a then b
if a then b else ()
The boolean operations "or" and "and" can only
be written with "||" and "&&":
OCaml
Revised
a or b & c
a || b && c
a || b && c
a || b && c
No more "begin end" construction. One must use
parentheses.
The operators as values are written with a backslash:
OCaml
Revised
(+)
\+
(mod)
\mod
Nested "as" patterns require parenthesis:
OCaml
function Some a as b, c ->
  ...
Revised
fun [ ((Some a as b), c) ->
  ...
But they are not required before the right arrow:
OCaml
function Some a as b ->
  ...
Revised
fun [ Some a as b ->
  ...
The operators with special characters are not automatically
infix. To define infixes, use syntax extensions.
Types and Constructors
The type constructors are written before their type parameters, which
are curried:
OCaml
Revised
int list
list int
('a, bool) Hashtbl.t
Hashtbl.t 'a bool
type 'a foo = 'a list list
type foo 'a = list (list 'a)
The abstract types are represented by an unbound type variable:
OCaml
Revised
type 'a foo;;
type foo 'a = 'b;
type bar;;
type bar = 'a;
Parentheses are required in tuples of types:
OCaml
Revised
int * bool
(int * bool)
In declarations of a concrete type, brackets must enclose the
constructor declarations:
OCaml
Revised
type t = A of i | B;;
type t = [ A of i | B ];
It is possible to make the empty type, without constructor:
type foo = [];
There is a syntax difference between data constructors with
several parameters and data constructors with one parameter of type
tuple:
The declaration of a data constructor with several parameters
is done by separating the types with "and". In expressions and
patterns, these constructor parameters must be curried:
OCaml
Revised
type t = C of t1 * t2;;
type t = [ C of t1 and t2 ];
C (x, y);;
C x y;
The declaration of a data constructor with one parameter of type
tuple is done by using a tuple type. In expressions and patterns,
the parameter must not be curried, since it is alone. In that
case the syntax of constructor parameters is the same in the
two syntaxes:
OCaml
Revised
type t = D of (t1 * t2);;
type t = [ D of (t1 * t2) ];
D (x, y);;
D (x, y);
The bool constructors start with an uppercase letter. The identifiers
"true" and "false" are not keywords:
OCaml
Revised
true && false
True && False
In record types, the keyword "mutable" must appear after the
colon:
OCaml
Revised
type t = {mutable x : t1};;
type t = {x : mutable t1};
Manifest types are with "==":
OCaml
Revised
type 'a t = 'a option =
type t 'a = option 'a ==
None
[ None
| Some of 'a
| Some of 'a ]
Polymorphic types start with "!":
OCaml
type t = { f : 'a . 'a list }
Revised
type t = { f : ! 'a . list 'a }
Streams and Parsers
The streams and the stream patterns are bracketed with "[:"
and ":]" instead of "[<" and ">]".
The stream component "terminal" is written with a back-quote
instead of a quote:
OCaml
Revised
[< '1; '2; s; '3 >]
[: `1; `2; s; `3 :]
The cases of parsers are bracketed with "[" and
"]", as with "fun", "match" and "try". If there is one
case, the brackets are not required:
OCaml
parser
  [< 'Foo >] -> e
| [< p = f >] -> f;;

parser [< 'x >] -> x;;
Revised
parser
[ [: `Foo :] -> e
| [: p = f :] -> f ];

parser [: `x :] -> x;
It is possible to write the empty parser, raising the exception
"Stream.Failure" whatever parameter it is applied to, and the empty
stream matching, always raising "Stream.Failure":
parser []
match e with parser []
In normal syntax, the error indicator starts with a double question
mark, in revised syntax with a single question mark:
OCaml
parser
  [< '1; '2 ?? "error" >] ->
    ...
Revised
parser
  [: `1; `2 ? "error" :] ->
    ...
In normal syntax, the component optimization starts with
"?!", in revised syntax with "!":
OCaml
parser
  [< '1; '2 ?! >] ->
    ...
Revised
parser
  [: `1; `2 ! :] ->
    ...
Classes and Objects
Object items end with a single semicolon which is required.
Class type parameters follow the class identifier:
OCaml
Revised
class ['a, 'b] point = ...
class point ['a, 'b] = ...
class c = [int] color;;
class c = color [int];
In the type of a class with parameters, the types of the parameters
are between brackets. Example in a signature:
OCaml
Revised
class c : int -> point;;
class c : [int] -> point;
The keywords "virtual" and "private" must be in this order:
OCaml
method virtual private m :
  ...
Revised
method virtual private m :
  ...

OCaml
method private virtual m :
  ...
Revised
method virtual private m :
  ...
Object variables are introduced with "value" instead of "val":
OCaml
Revised
object val x = 3 end
object value x = 3; end
Type constraints in objects are introduced with "type" instead of
"constraint":
OCaml
Revised
object constraint 'a = int end
object type 'a = int; end
Labels and Variants
Labels in types must start with "~":
OCaml
Revised
val x : num:int -> bool;;
value x : ~num:int -> bool;
Types whose number of variants are fixed start with "[ =":
OCaml
Revised
type t = [`On | `Off];;
type t = [ = `On | `Off];
The "[" and the "<" in variant types must
not be sticked:
This way, the parsing is done by Camlp5. In case of syntax errors,
the parsing fails with an error message and the compilation is
aborted. Otherwise, the OCaml compiler continues with the syntax tree
provided by Camlp5.
In the toplevel, it is possible to preprocess the input phrases by
loading one of the files "camlp5o.cma" or
"camlp5r.cma". The common usage is:
In general, in this documentation, when a command requires:
-I +camlp5
it can be replaced by:
-I `camlp5 -where`
or by:
-I <directory>
where "directory" is the directory path where the Camlp5 library
files are installed.
Parsing and Printing kits
Parsing and printing extensions are OCaml object files, i.e. files
with the extension ".cmo" or ".cma". They are the
result of the compilation of OCaml source files containing what is
necessary to do the parsing or printing. These object files are
named parsing and printing kits.
These files cannot be linked to produce executables because they
generally call functions and use variables defined only in Camlp5
core, typically belonging to the module "Pcaml". The kits
are designed to be loaded by the Camlp5 commands, either through
their command arguments or through directives in the source
files.
It is therefore important to compile the kits with the
option "-c" of the OCaml compiler (i.e. just compilation,
not producing an executable) and with the option "-I
+camlp5" (or "-I `camlp5 -where`"), so that the
compiler finds the module interfaces of the installed Camlp5 library.
In the OCaml toplevel, it is possible to use a kit by simply
loading it with the directive "#load".
Extending syntax
A syntax extension is a Camlp5 parsing kit. There are two ways to
use a syntax extension:
Either by giving this object file as parameter to the Camlp5
command. For example:
ocamlc -pp "camlp5o ./myext.cmo" foo.ml
Or by adding the directive "#load" in the source file:
#load "./myext.cmo";;
and then compile it simply like this:
ocamlc -pp camlp5o foo.ml
Several syntax extensions can be used for a single file. The way to
create one's own syntax extensions is explained in this document.
Pretty printing
As for syntax extensions, the pretty printing is defined or
extended through Camlp5 printing kits. Some pretty printing kits are
provided by Camlp5, the main ones being:
pr_o.cmo: to pretty print in normal syntax,
pr_r.cmo: to pretty print in revised syntax.
Examples: if we have a file, foo.ml, written in normal
syntax and another one, bar.ml, written in revised
syntax, here are the commands to pretty print them in their own
syntax:
camlp5o pr_o.cmo foo.ml
camlp5r pr_r.cmo bar.ml
And how to convert them into the other syntax:
camlp5o pr_r.cmo foo.ml
camlp5r pr_o.cmo bar.ml
The way to create one's own pretty printing extensions is
explained in this document.
Note: the revised syntax
The revised syntax is a specific syntax whose aim is to
resolve some problems and inconsistencies of the normal OCaml
syntax. A chapter will explain the differences between the normal
and the revised syntax.
All examples of this documentation are written in that revised
syntax. Even if you don't know it, it is not difficult to
understand. The same examples can be written in normal syntax. In
case of problems, refer to the chapter describing it.
Stream lexers
The file "pa_lexer.cmo" is a Camlp5 syntax extension kit
for parsers of streams of the type 'char'. This syntax is shorter
and more readable than its equivalent version written
with classical stream parsers. Like
classical parsers, they use recursive descendant parsing. They are
also pure syntax sugar, and each lexer written with this syntax can
be written using normal parsers syntax.
(An old version, named "pa_lex.cmo" was provided before
with a different syntax. It is no longer distributed, its proposed
syntax being confusing.)
Introduction
Classical parsers in OCaml apply to streams of any type of
values. For the specific type "char", it has been possible to
shorten the encoding in several ways, in particular by using strings
to group several characters together, and by hiding the management
of a "lexing buffer", a data structure recording the matched
characters.
Let us take an example. The following function parses a left
bracket followed by a less, a colon or nothing, and records the
result in a buffer. In classical parsers syntax, this could be
written like this:
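A possible version (an illustrative sketch; the function names are
arbitrary, and it assumes Plexing.Lexbuf.add takes the character
first and the buffer second):

value bracket_rest buf =
  parser
  [ [: `'<' :] -> Plexing.Lexbuf.add '<' buf
  | [: `':' :] -> Plexing.Lexbuf.add ':' buf
  | [: :] -> buf ]
;
value left_bracket buf =
  parser
  [ [: `'['; buf = bracket_rest (Plexing.Lexbuf.add '[' buf) :] -> buf ]
;

With the syntax of "pa_lexer.cmo", the same function can be written
much more compactly, for example:

value left_bracket = lexer [ "[" [ "<" | ":" | ] ];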
The identifiers "STRING", "CHAR" and
"LIDENT" above represent the OCaml tokens corresponding to
string, character and lowercase identifier (identifier starting with
a lowercase character).
Moreover, together with that syntax extension, another extension is
added to the entry expression, typically for the semantic
actions of the "lexer" statement above, but not only. It is
described below in the section about specific expressions.
Remark: the identifiers "add", "buf", "empty" and "pos" are not
keywords (they are not reserved words) but just identifiers. On the
contrary, the identifier "lexer", which introduces the
syntax, is a new keyword and cannot be used as variable identifier
any more.
Semantics
A lexer defined in the syntax above is a shortcut version of a
parser applied to the specific case of streams of characters. It
could be written with a normal parser. The proposed syntax is much
shorter, easier to use and to understand, and silently takes care of
the lexing buffer for the programmer. The lexing buffers are data
structures, which are passed as parameters to called lexers and
returned by them.
Our lexers are of the type:
Plexing.Lexbuf.t -> Stream.t char -> u
where "u" is a type which depends on what the lexer
returns. If there is no semantic action (since it it optional), this
type is automatically "Plexing.Lexbuf.t" also.
A lexer is, actually, a function with two implicit parameters: the
first one is the lexing buffer itself, and the second one the
stream. When called, it tries to match the stream against its first
rule. If it fails, it tries its second rule, and so on, up to its
last rule. If the last rule fails, the lexer fails by raising the
exception "Stream.Failure". All of this is
the usual behaviour of stream
parsers.
In a rule, when a character is matched, it is inserted into the
lexing buffer, except if the "no record" feature is used (see
further).
Rules which have no semantic action return the lexing buffer
itself.
Symbols
The different kinds of symbols in a rule are:
The token "underscore", which represents any character. Fails
only if the stream is empty.
A character which represents a matching of this character.
A character followed by the minus sign and by another character,
which represents all characters in the range between the two
characters in question.
A string, which represents a matching of all its characters, one
after the other.
An expression corresponding to a call to another lexer, which
takes the buffer as first parameter and returns another lexing
buffer with all characters found in the stream added to the
initial lexing buffer.
The sequence "?=" introducing lookahead
characters.
A rule, recursively, between brackets, inlining a lexer.
In the cases matching characters (namely underscore, character,
character range and string), the symbol can be optionally followed
by the "no record" character "slash", specifying that the found
character(s) are not added into the lexing buffer. By default, they
are. This feature is useful, for example, when writing a lexer which
parses strings, where the initial double quote and final double quote
should not be part of the string itself.
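For example, a minimal sketch of a lexer for the body of a string
literal (assuming the opening double quote has already been consumed
by the caller); the closing double quote is matched but, thanks to
the "slash", not recorded:

value rec string = lexer
  [ '\\' _ string
  | '"'/
  | _ string ]
;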
Moreover, a symbol can be followed by an optional error indicator,
which can be:
The character ? (question mark) followed by a
string expression, telling that, if there is a syntax error at
this point (i.e. the symbol is not matched although the beginning
of the rule was), the exception Stream.Error is
raised with that string as parameter. Without this indicator, it
is raised with the empty string. This is the same behaviour as
with classical stream parsers.
The character ! (exclamation mark), which is just an
indicator to let the syntax expander optimize the code. If the
programmer is sure that the symbol never fails (i.e. never
raises Stream.Failure), in particular if this symbol
recognizes the empty rule, he can add this exclamation mark. If it
is used correctly (the compiler cannot check it), the behaviour is
identical to that without the !, except that the code is
shorter and faster, and can sometimes be tail recursive. If the
indication is not correct, the behaviour of the lexer is
undefined.
Specific expressions
When loading this syntax extension, the
entry <expression>, at the level labelled "simple", of the
OCaml language is extended with the following rules:
$add followed by a string, specifying that the
programmer wants to add all characters of the string to the lexing
buffer. It returns the new lexing buffer. It corresponds to an
iteration of calls to Plexing.Lexbuf.add with all
characters of the string, with the current lexing buffer as initial
parameter.
$buf which returns the lexing buffer converted into a
string.
$empty which returns an empty lexing buffer.
$pos which returns the current position of the
stream in number of characters (starting at zero).
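For example, a small lexer recognizing a lowercase identifier and
returning it as a token could be written like this (an illustrative
sketch, not the actual code of "plexer.ml"):

value rec ident = lexer
  [ [ 'a'-'z' | 'A'-'Z' | '0'-'9' | '_' ] ident!
  | ]
;
value lident = lexer
  [ 'a'-'z' ident! -> ("LIDENT", $buf) ]
;

(* since "ident" recognizes the empty rule, it never fails, which
   justifies the "!" after its calls; a possible use of "$empty":
   lident $empty (Stream.of_string "foo_bar baz")
   should return ("LIDENT", "foo_bar") *)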
Lookahead
Lookahead is useful in some cases, when factorization of rules is
impossible. To understand how it is useful, a first remark must be
made about the usual behaviour of Camlp5 stream parsers.
Stream parsers (including these lexers) use a limited parsing
algorithm, in a way that when the first symbol of a rule is matched,
all the following symbols of the same rule must apply, otherwise it
is a syntax error. There is no backtracking. In most cases, left
factorization of rules resolves conflicting problems. For example, in
parsers of tokens (which is not our case here, since we parse only
characters), when one writes a parser to recognize both typical
grammar rules "if..then..else" and the shorter "if..then..", the
system transforms them into a single rule starting with "if..then.."
followed by a call to a parser recognizing
"else.." or nothing.
Sometimes, however, this left factorization is not possible. A
lookahead of the stream to check the presence of some elements
(these elements being characters, if we are using this "lexer"
syntax) might be necessary to decide whether it is a good idea to start the
rule. This lookahead feature may unfreeze several characters from
the input stream, but without removing them.
Syntactically, a lookahead starts with ?= and is
followed by one or several lookahead sequences separated by the
vertical bar |, the whole list being enclosed by
braces.
If there are several lookaheads, they must all be of the same size
(contain the same number of characters).
If the lookahead sequence is just a string, it corresponds to all
characters of this string in the order (which is different for
strings outside lookahead sequences, representing a choice of all
characters).
Examples of lookaheads:
?= [ _ ''' | '\\' _ ]
?= [ "<<" | "<:" ]
The first line above matches a stream whose second character is a
quote or a stream whose first character is a backslash (real example
in the lexer of OCaml, in the library of Camlp5, named
"plexer.ml"). The second line matches a stream starting with the two
characters < and < or starting with
the two characters < and : (this is
another example in the same file).
Semantic actions of rules
By default, the result of a "lexer" is the current lexing buffer,
which is of type "Plexing.Lexbuf.t". But it is possible to
return other values, by adding "->" at end of rules
followed by the expression you want to return, as in usual pattern
matching in OCaml.
An interesting result, for example, could be the string
corresponding to the characters of the lexing buffer. This can be
obtained by returning the value "$buf".
A complete example
A complete example can be seen in the sources of Camlp5, file
"lib/plexer.ml". This is the lexer of OCaml, either "normal" or
"revised" syntax.
Compiling
To compile a file containing lexers, just
load pa_lexer.cmo using one of the following methods:
Either by adding pa_lexer.cmo among the Camlp5
options. See the Camlp5 manual page or documentation.
Or by adding #load "pa_lexer.cmo"; anywhere in the
file, before the usages of this "lexer" syntax.
How to display the generated code
You can see the generated code, for a file "bar.ml" containing
lexers, by typing in a command line:
camlp5r pa_lexer.cmo pr_r.cmo bar.ml
To see the equivalent code with stream parsers, use:
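camlp5r pa_lexer.cmo pr_r.cmo pr_rp.cmo bar.ml

(here the "pr_rp.cmo" kit presented earlier is presumably loaded in
addition, so that the generated code is displayed in "parser" form).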
Information for developers of the Camlp5 program.
Kernel
The sources are composed of:
the OCaml stuff, copied from the OCaml compiler
the kernel composed of the directories:
odyl : the dynamic loading system
lib : the library
main : the main program camlp5
meta : the parsers for revised syntax, ast quotations, the
EXTEND statement, etc.
the rest: the directories etc, compile, ocpp
Some other directories contain configuration files, tools,
documentation and manual pages.
The kernel is sufficient to make the core system work: it is possible
to compile and bootstrap only it. All sources being in revised syntax,
the first compilation of Camlp5 is done by a version of this kernel
in pure OCaml syntax, located in the directory ocaml_src.
These sources in pure OCaml syntax are not modified by hand. When
changes are made to the kernel, and a check is done that it
correctly compiles and bootstraps, the kernel in pure OCaml syntax
is rebuilt using Camlp5 pretty print. This is done by the command
"make bootstrap_sources".
Compatibility
This distribution of Camlp5 is compatible with several versions of
OCaml. The definition of OCaml syntax trees may change from OCaml
version to version, which can be a problem. Since OCaml does not
install the sources nor the compiled versions of its syntax tree, a
copy of the necessary source files, borrowed from the source of the
OCaml compiler is in the directory 'ocaml_stuff', in subdirectories
with the OCaml version number.
If the present distribution of Camlp5 is not compatible with the
version of OCaml you have (the command 'configure' tells you), it
is possible to add it. For that, you need the sources of your specific OCaml
distribution. If you have them and 'configure' tells you
that camlp5 is not compatible, do:
make steal OCAML_SRC=<path-to-OCaml-sources>
This creates a new directory in 'ocaml_stuff' with sources of the
syntax tree of your OCaml compiler.
If you want to check that the sources of the syntax tree of OCaml
are up-to-date (e.g. if this is the current OCaml development version),
do:
make compare_stolen OCAML_SRC=<path-to-OCaml-sources>
The compatibility is also handled by the file 'lib/versdep.ml',
which is a module containing miscellaneous features depending on
the version of OCaml.
In the directory 'ocaml_src', which contains the pure OCaml sources
of the Camlp5 core (see the section "Tree structure" below), there are as
many versions of this file as there are supported versions of OCaml. They are named
'version.ml' in the directory 'lib/versdep'. If you are adding
a new version of OCaml, you need this file. As a first step, make a
copy from a close version:
cd ocaml_src/lib/versdep
cp <close_version>.ml <version>.ml
Then, you can rerun "configure" and do "make core". If the file
'ocaml_src/lib/versdep.ml' has compilation problems, fix them 'make
core' again. When it compiles, copy it into the subdirectory
'versdep' as '<version>.ml', overwriting the version you copied
from the close version.
Later, the same file 'lib/versdep.ml' in Camlp5 syntax may have
similar compilation problems. There is only a single version of this file,
thanks to IFDEF constructs used here or there.
While compiling with some specific version of OCaml, this file is
compiled with 'OCAML_vers' defined, where 'vers' is the version number
from the beginning to the first space or character '+', with all dots
converted into underscores. For example, if your OCaml version is
7.04.2+dev35, you can see in the compilation process of versdep.ml
that OCAML_7_04_2 is defined, and you can add statements defined
by the syntax extension 'pa_macro.cmo', for example IFDEF OCAML_7_04_2.
Add statements like that in 'lib/versdep.ml' to make it compile
successfully.
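For example (an illustrative sketch; the function name is arbitrary
and 'pa_macro.cmo' must be loaded), such a conditional definition
could look like this:

value string_uppercase s =
  IFDEF OCAML_7_04_2 THEN String.uppercase_ascii s
  ELSE String.uppercase s END
;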
Tree structure
The directory 'ocaml_src' contains images in pure OCaml syntax of the
directories odyl lib main and meta. This allows the creation of a core version
of Camlp5 with only the OCaml compiler installed.
You can decompose the building of the Camlp5 core into:
1. make library_cold
just makes the directory 'ocaml_src/lib' and copies the cmo and cmi
files into the directory 'boot'
2. make compile_cold
makes the other directories of ocaml_src
3. make promote_cold
copies the executables "camlp5", "camlp5r" and the syntax
extensions (cmo files) into the directory 'boot'
From this point, the core Camlp5 is in directory 'boot'. The real
sources in the top directories odyl, lib, main and meta, which are
written in revised syntax with some syntax extensions (grammars,
quotations) can be compiled. To achieve their compilation, you can
do:
make core
Or to compile everything do:
make all
or just:
make
Notice that doing "make core" or "make all" from scratch (after a
make clean), automatically starts by making the core files from
their pure OCaml versions.
Fast compilation from scratch
./configure
make clean core compare
make coreboot
make all opt opt.opt
Testing changes
1. do your changes
2. do:
make core compare
if it says that the bootstrap is ok, you can do:
make all
make opt
make opt.opt
otherwise, to make sure everything is ok, first do:
make coreboot
sometimes two bootstraps ('make coreboot' twice) are necessary,
in particular if you change things in the directory 'lib'. It is
even possible that three bootstraps are necessary.
If things go wrong, it is possible to return to the previous
version by typing:
make restore clean_hot
then you can change what is necessary and continue by typing:
make core
and test the bootstrap again:
make coreboot
After several bootstraps (by 'make coreboot' or 'make bootstrap'),
many versions are pushed in the directory 'boot' (you can type 'find
boot -type d -print' to see that). If your system correctly
bootstraps, you can clean that by typing:
make cleanboot
which keeps only two versions. (The command 'make clean' also
removes these stack of versions.)
Before committing your changes
Make sure that the cold start with pure OCaml sources work. For
that, do:
make compare_sources | less
This shows you the changes that would be done in the OCaml pure sources
of the directory ocaml_src.
To make the new versions, do:
make new_sources
make promote_sources
Notice that these pure OCaml sources are not supposed to be modified
by hand, but only created by the above commands. Although their source
is pretty printed they are usually not easy to read, particularly for
expanded grammars (of the statement 'EXTEND').
If these sources do not compile, due to changes in the OCaml
compiler, it is possible however to edit them. In this case, similar changes
may need to be performed in the normal sources in revised syntax.
After doing 'make new_sources' above, and before doing 'make
promote_sources' below, it is possible to do 'make untouch_sources'
which changes the dates of the newly created files with the dates of the
old files if they are not modified. This way, the "svn commit" will not
need to compare these files, which may be important if your
network is not fast.
The 'make new_sources' builds a directory named 'ocaml_src.new'.
If this directory still exists, due to a previous 'make new_sources',
the command fails. In this case, you can safely delete it (rm -rf ocaml_src.new):
this directory is not part of the distribution, it is
just temporary.
The 'make clean_sources' command deletes old versions of ocaml_src, keeping
only the last one and the one before it.
The command:
make bootstrap_sources
is a shortcut for:
make new_sources
make untouch_sources
make promote_sources
make clean_sources
If there are changes in the specific file 'lib/versdep.ml', do
also:
make compare_all_versdep
and possibly:
make bootstrap_all_versdep
because this file, in 'ocaml_src/lib/versdep' directory has different
versions according to the OCaml version.
After having rebuilt the pure OCaml sources, check that they work by
rebuilding everything from scratch, starting with "configure".
If you change the main parser
If you change the main parser 'meta/pa_r.ml', you should check that the
quotation expanders of the syntax tree, 'meta/q_MLast.ml', match the new
version. For that, do:
cd meta
make compare_q_MLast
If no differences are displayed, it means that 'q_MLast.ml' is ok,
relatively to 'pa_r.ml'.
Otherwise, if the displayed differences seem reasonable, update the
version by typing:
make bootstrap_q_MLast
Then, returning to the top directory, do 'make core compare' and
possibly 'make coreboot' (once or several times) to check the
correctness of the file.
And don't forget, if you want to commit, to re-create the pure OCaml
sources as indicated above.
Adding new nodes to the syntax tree
If new nodes are necessary in the syntax tree, for example because
the OCaml language itself added new nodes, the steps are the
following (illustrated with the addition of the "lazy" pattern node).
Add the node in the file 'main/mLast.mli'. Please respect the design
of the nodes by looking at the other nodes. Example:
PaLaz of loc and patt
Try to compile (do 'make' in the main directory). You are going to
have some errors in files telling you that nodes are missing in some
pattern matchings. Add them, according to the new nodes of OCaml or
looking at other nodes.
Once the compilation is done, try a 'make bootstrap' to be sure
everything is OK.
When it is, add the possible concrete syntax in the revised syntax,
i.e. in 'meta/pa_r.ml'. Since the quotation is not yet implemented,
put it in syntax without quotation. Example:
"lazy"; p = SELF -> MLast.PaLaz loc p
Do 'make bootstrap' again.
Go to the directory 'meta' and type:
make compare_q_MLast
This command tries to compare what the AST quotation should be if it
perfectly matched the syntax. If this comparison seems reasonable,
change the file 'q_MLast.ml' by typing:
make bootstrap_q_MLast
Do a 'make bootstrap' again to check everything is OK. If it is,
change the line of 'meta/pa_r.ml'. In the example, from:
"lazy"; p = SELF -> MLast.PaLaz loc p
to:
"lazy"; p = SELF -> <:patt< lazy $p$ >>
The new syntax should work now in revised syntax. You can complete
the compilation, do a 'make install' and check with the toplevel that
it works. Complete with the rest as described above.
You can then complete the other syntaxes (the 'normal' syntax, for
example in 'etc/pa_o.ml') and the pretty printers.
Switching between transitional and strict mode
If Camlp5 is compiled in some mode, it is possible to change its
mode in two bootstrapping steps. Type:
make MODE=T coreboot
to switch to transitional mode, or:
make MODE=S coreboot
to switch to strict mode.
After two (necessary) bootstraps, the kernel is compiled in the new
mode. Complete the compilation by:
make MODE=T all opt opt.opt
or:
make MODE=S all opt opt.opt
according to the new mode you want to use.
Another solution is, of course, recompile everything from scratch:
make clean
./configure -transitional
make world.opt
or:
make clean
./configure -strict
make world.opt
Bootstrapping
Camlp5 is bootstrapped in numerous ways.
Camlp5 executable bootstrapping
The file 'main/camlp5r' is rebuilt each time a bootstrapping command
is used (like 'make coreboot' or 'make bootstrap'). This bootstrapping
command starts by copying it into the directory 'boot'. The file
'boot/camlp5r' is used to recompile the sources, creating another
file 'main/camlp5r'. When both files are the same (byte by byte),
the Camlp5 executable is bootstrapped.
Sometimes, in particular when changes are done in the library
(directory 'lib'), it is necessary to bootstrap twice before having
the message 'Fixpoint reached, bootstrap succeeded'.
The command 'make compare' tells you whether the Camlp5 executable
is currently bootstrapped or not.
Source bootstrapping
The compilation of Camlp5 starts with the compilation of files of
the directory ocaml_src written in pure OCaml. This creates the
files 'camlp5' and 'camlp5r' in the directory boot. This is called
the 'cold start'.
Once done, the sources of Camlp5 can be compiled using revised
syntax and several syntax extensions, like the statement 'EXTEND',
for example, and the quotations of syntax trees.
The core files of Camlp5 are in the directories lib, main, meta,
odyl. The same directories exist in the directory ocaml_src,
where all files are equivalent.
When changes are done in the core files, and when the printer kit
in normal syntax 'etc/pr_o.cmo' has been created, the files of the
directory ocaml_src can be rebuilt using the command 'make
bootstrap_sources'. This updates the files in ocaml_src to exactly
reflect the ones in the core, but in pure OCaml syntax.
Bootstrap: the files in ocaml_src create the first Camlp5
executable. The Camlp5 executable can rebuild the files in
ocaml_src.
Source file q_MLast.ml bootstrapping
The source file meta/q_MLast.ml (quotation of syntax trees) can be
recreated using the file meta/pa_r.ml (revised syntax). When changes
are done in the file meta/pa_r.ml, a good usage is to go to the
directory 'meta' and type:
make compare_q_MLast
This shows the possible changes that will be applied to
meta/q_MLast.ml. If they seem to be reasonable, do:
make bootstrap_q_MLast
This changes the source file meta/q_MLast.ml. After this command,
a new 'make compare_q_MLast' indicates no differences.
After that, a new 'make bootstrap' in the top directory ensures that
everything works.
Bootstrap: the file meta/pa_r.ml uses the quotation expander
meta/q_MLast.cmo. The source file meta/q_MLast.ml is recreated by
meta/pa_r.ml.
Source file q_ast.ml bootstrapping
The source file meta/q_ast.ml contains another version of the
quotation expander of syntax trees, which follows the syntax currently
used (e.g. the normal syntax, if that is the one being used).
This works only in strict mode.
This file depends on the definition of the syntax tree
main/mLast.mli. When changes are done in this file, it is
possible to see what changes are needed in meta/q_ast.ml. For
this, go to the directory 'meta' and type:
make compare_q_ast
This shows the possible changes that will be applied to
meta/q_ast.ml. If they seem to be reasonable, do:
make bootstrap_q_ast
This changes the source file meta/q_ast.ml. After this command, a
new 'make compare_q_ast' indicates no differences.
After that, a new 'make bootstrap' in the top directory ensures that
everything works.
Lisp and Scheme syntax bootstrapping
The Lisp syntax is written in Lisp syntax in the directory etc. It
is the file 'etc/pa_lisp.ml'. To compile this file, there is another
file, named 'etc/pa_lispr.ml' written in revised syntax.
When changes are done in etc/pa_lisp.ml, the file etc/pa_lispr.ml
must be rebuilt. First, go to the directory 'etc' and type:
make compare_lisp
If changes seem to be reasonable, do:
make bootstrap_lisp
This rebuilds 'etc/pa_lispr.ml'. A new 'make' in the directory 'etc'
will recompile it and recompile the Lisp version 'etc/pa_lisp.ml'.
Bootstrap: etc/pa_lispr.ml allows compiling etc/pa_lisp.ml, and
changes in etc/pa_lisp.ml are reported in the source file etc/pa_lispr.ml
through 'make bootstrap_lisp'.
The same holds for the Scheme syntax: the files are 'etc/pa_scheme.ml'
and 'etc/pa_schemer.ml'.
EXTEND statement bootstrapping
The EXTEND statement of Camlp5 is a syntax extension. The file
'meta/pa_extend.ml' contains the statement adding this
syntax extension, therefore something like:
EXTEND
expr:
[ [ "EXTEND" .....
To be compiled, the file 'meta/pa_extend.ml' needs 'pa_extend.cmo'.
This is actually its previous version in the directory 'boot'. When
checking for a correct bootstrapping of Camlp5 (with the command
'make compare', for example), a test is done to verify that the
binary files 'meta/pa_extend.cmo' and 'boot/pa_extend.cmo' are the
same.
Notice that there is also a file 'ocaml_src/meta/pa_extend.ml' in
pure OCaml syntax, but, although this file is pretty printed, it is
hardly editable, because the expansion of the 'EXTEND' statement is
a very long expression rather difficult to understand. But this file
need not be changed by hand, since the command 'make
bootstrap_sources' (see above) rebuilds it.
This chapter describes "pprintf", a statement to pretty print data.
It looks like the "sprintf" function of the OCaml library, and
borrows some ideas from the OCaml Format library. Another statement,
"lprintf", is a slightly different version of "pprintf" dealing
with locations.
Syntax of the pprintf statement
The "pprintf" statement is added by the parsing kit
"pa_pprintf.cmo".
Notice that, unlike "printf", "fprintf", "sprintf", and
all their variants, which are functions, "pprintf" is a
statement, not a function: "pprintf" is a keyword and the
expander analyzes its string format parameter to generate specific
statements. In particular, it cannot be used alone and has no type
by itself.
The "pprintf" statement converts the format string into a string
like the "sprintf" of the OCaml library "Printf" does (see the OCaml
manual for details). The string format can accept new conversion
specifications, "%p" and "%q", and some pretty printing annotations,
starting with "@" like in the OCaml library "Format".
The "pprintf" statement takes as first parameter, a value of type
"pr_context" defined below. Its second parameter is the extended
format string. It can take other parameters, depending on the
format, like "sprintf".
The result of "pprintf" is always a string. There is no versions
applying to files or buffers.
The strings built by "pprintf" are concatened by the function
"Pretty.sprintf" (see the chapter entitled
"Pretty Print") which controls the line
length and prevents overflowing.
Printing context
The "pprintf" statement takes, as first parameter, a printing
context. It is a value of the following type:
type pr_context =
{ ind : int;
bef : string;
aft : string;
dang : string };
The fields are:
"ind" : the current indendation
"bef" : what should be printed before, in the same line
"aft" : what should be printed after, in the same line
"dang" : the dangling token to know whether parentheses
are necessary
Basically, the "pprintf" statement concats the "bef" string, the
formatted string and the "aft" string. The example:
pprintf pc "hello world"
is equivalent to (and indeed generates):
Pretty.sprintf "%shello world%s" pc.bef pc.aft
But if the format string contains conversion specifications "%p" or
"%q", the "bef" and "aft" strings are actually transmitted to the
corresponding functions:
pprintf pc "hello %p world" f x
is equivalent to:
f {(pc) with
bef = Pretty.sprintf "%shello " pc.bef;
aft = Pretty.sprintf " world%s" pc.aft}
x
Thus, the decision of including the "bef" and "aft" strings is
delayed to the called function, allowing this function to possibly
concatenate "bef" and "aft" with its own strings.
A typical case is, while printing programs, when an expression
needs to be printed between parentheses. The code which does that
looks like:
pprintf pc "(%p)" expr e
The right parenthesis of this string format is included in the
"aft" string transmitted to the function "expr". In a situation when
several right parentheses are concatenated this way, the fact that all
these parentheses are grouped together allows the function which
eventually prints them to decide to break the line or not, these
parentheses being taken into account in the line length.
For example, if the code contains a print of a program containing
an application whose source is:
myfunction myarg
and if the "aft" contains "))))))", the decision of printing in one
line as:
myfunction myarg))))))
or in two lines as:
myfunction
myarg))))))
is accurate, the right parentheses being added to "myarg" to determine
whether the line overflows or not.
Extended format
The extended format used by "pprintf" may contain any strings and
conversion specifications allowed by the "sprintf" function (see
module "Printf" of the OCaml library), plus:
the conversion specifications "%p" and
"%q",
the pretty printing annotations, introduced by "@"
and followed by:
the character ";" (semicolon), optionally followed
by "<", two numbers and ">",
the character " " (space),
the character "[", optionally followed by the character
"<" and either:
the character "a"
the character "b"
a number
and the character ">", then followed by a format
string, and ended with "@]"
The format string is applied like in the "sprintf"
function. Specific actions are done for the extended features. The
result is a string like for the "sprintf" function. The "string
before" and "string after" defined by the fields "bef" and "aft" of
the printing context are taken into account and it is not necessary
to add them in the format.
Example:
pprintf pc "hello, world"
generates:
Pretty.sprintf "%shello, world%s" pc.bef pc.aft;
An empty format:
pprintf pc "";
just prints the "before" and "after" strings:
Pretty.sprintf "%s%s" pc.bef pc.aft;
Line length
The function "pprintf" uses the Camlp5 "Pretty" module. The line
length can be set by changing the value of the reference
"Pretty.line_length".
The conversion specifications "p" and "q"
The "%p" conversion specification works like the "%a" of the printf
statement. It takes two arguments and applies the first one to the
printing context and to the second argument. The first argument must
therefore have type "pr_context -> t -> unit" (for some
type "t") and the second one "t".
Notice that this function can be called twice: one to test whether
the resulting string holds in the line, and another one to possibly
recall this function to print it in several lines. In the two cases,
the printing context given as parameter is different.
It uses the functions defined in the
"Pretty" module.
Example: the following statement:
pprintf pc "hello, %p, world" f x
is equivalent to:
f {(pc) with
bef = Pretty.sprintf "%shello, " pc.bef;
aft = Pretty.sprintf ", world%s" pc.aft}
x
The "%q" conversion specification is like "%p" except that it takes
a third argument which is the value of the "dang" field, useful when
the syntax has "dangling" problems requiring parentheses. See
chapter Extensions of printing for more
explanations about dangling problems.
The same example with "%q":
pprintf pc "hello, %q, world" f x "abc"
is equivalent to:
f {(pc) with
bef = Pretty.sprintf "%shello, " pc.bef;
aft = Pretty.sprintf ", world%s" pc.aft;
dang = "abc"}
x
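As an illustration, here is a minimal sketch of printers used as "%p"
arguments; the functions "print_item" and "print_pair" are hypothetical:
value print_item pc s = pprintf pc "%s" s;
value print_pair pc (a, b) = pprintf pc "(%p,@;%p)" print_item a print_item b;
Applied as "print_pair Pprintf.empty_pc ("hello", "world")", it returns
"(hello, world)" if it holds in the line, the break "@;" being turned
into a newline and an indentation otherwise.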
The pretty printing annotations
Breaks
The pretty printing annotations indicate places where
lines can be broken. They all start with the "at" sign "@". The main
ones are called breaks and are:
"@;" specifying: write a space or 'a newline and an
indentation incremented by 2 spaces'
"@ " specifying: write a space or 'a newline
and the indentation'
Example - where "pc" is a variable of type "pr_context" (for
example "Pprintf.empty_pc"):
pprintf pc "hello,@;world"
builds the string, if it holds in the line:
hello, world
and if it does not:
hello,
world
The second form:
pprintf pc "hello,@ world"
is printed the same way, if it holds in the line, and if it does
not, as:
hello,
world
The general form is:
"@;<s o>", which is a break with "s" spaces
if the string holds in the line, or an indentation offset
(incrementation of the indentation) of "o" spaces if the
string does not hold in the line.
The break "@;" is therefore equivalent to "@;<1
2>" and "@ " is equivalent to "@;<1
0>".
Parentheses
A second form of the pretty printing annotations is the
parenthesization of format strings possibly containing other pretty
printing annotations. They start with "@[" and end with
"@]".
It allows changing the associativity of the breaks. For example:
pprintf pc "@[the quick brown fox@;jumps@]@;over the lazy dog"
If the whole string holds on the line, it is printed:
the quick brown fox jumps over the lazy dog
If the whole string does not hold on the line, but "the quick brown
fox jumps" does, it is printed:
the quick brown fox jumps
over the lazy dog
If the string "the quick brown fox jumps" does not hold on the line,
the whole string is printed:
the quick brown fox
jumps
over the lazy dog
Conversely, if the code is right associated:
pprintf pc "the quick brown fox@;@[jumps@;over the lazy dog@]"
It can be printed:
the quick brown fox jumps over the lazy dog
or:
the quick brown fox
jumps over the lazy dog
or:
the quick brown fox
jumps
over the lazy dog
The default is left associativity: without parentheses, it is
printed like in the first example.
Incrementation of indentation
The open parenthesis of the parenthesized form, "@[" can
be followed by "<n>" where "n" is a number. It
increments the current indentation (for possible newlines in the
parenthesized text) with this number.
Example:
pprintf pc "@[<4>Incrementation@;actually of six characters@]"
makes the string (if not holding in the line):
Incrementation
actually of six characters
Break all or nothing
The open parenthesis of the parenthesized form, "@[" can
be followed by "<a>". It specifies that if the string
does not hold in the line, all breaks between the parentheses (at
one only level) are printed in two lines, even if sub-strings could
hold on the line. For example:
pprintf pc "@[<a>the quick brown fox@;jumps@;over the lazy dog@]"
can be printed only as:
the quick brown fox jumps over the lazy dog
or as:
the quick brown fox
jumps
over the lazy dog
Break all
The open parenthesis of the parenthesized form, "@[" can
be followed by "<b>". It specifies that all breaks are
always printed in two lines. For example:
pprintf pc "@[<b>the quick brown fox@;jumps@;over the lazy dog@]"
is printed in all circumstances:
the quick brown fox
jumps
over the lazy dog
Break all if
The open parenthesis of the parenthesized form, "@[" can
be followed by "<i>". Depending on the value of the
boolean variable of the argument list, the breaks are all printed
in two lines like with the "break all" option above, or not.
For example:
pprintf pc "%s@;@[<i>%s,@;%s@]" "good" True "morning" "everybody";
pprintf pc "%s@;@[<i>%s,@;%s@]" "good" False "morning" "everybody";
are printed:
good
morning,
everybody
good morning, everybody
Parentheses not neighbours of breaks
In the examples above, we can notice that the left parentheses are
always at the beginning of the string or preceded by a break, and that
the right parentheses are always at the end of the string or followed
by a break.
When the parentheses "@[" and "@]" are neither at the
beginning or end of the string nor next to a break, they are
considered as the "bef" or "aft" part
of the neighbouring string. For example, the following forms:
pprintf pc "the quick brown fox@[ jumps over@]"
and:
pprintf pc "@[the quick brown fox @]jumps over"
are respectively equivalent to:
let pc = {(pc) with aft = sprintf " jumps over%s" pc.aft} in
Pretty.sprintf "%sthe quick brown fox%s" pc.bef pc.aft
and:
let pc = {(pc) with bef = sprintf "%sthe quick brown fox" pc.bef} in
Pretty.sprintf "%sjumps over%s" pc.bef pc.aft
In these examples, the results are identical, but it can be
important if the non-parenthesized part contains one or several
"%p". In this case, the corresponding function receives the "bef" or
"aft" part in its pr_context variable and can take it into account
when printing its data.
Lprintf
"Lprintf" is like "pprintf" with the same parameters. It is equivalent
to an call to the function "expand_lprintf":
lprintf pc "..."
is equivalent to:
expand_lprintf pc loc (fun pc -> pprintf pc "...")
The function "expand_lprintf" and the variable "loc" must be
defined by the user in the environment where "lprintf" is used.
"Lprintf" is used in predefined printers "pr_r.ml" and "pr_o.ml" to
allow optional insertions of location comments in the output.
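For instance, a minimal sketch of a user-defined "expand_lprintf"
inserting the character positions of the location as a comment could
be (the comment format is arbitrary):
value expand_lprintf pc loc f =
  let comm =
    Printf.sprintf "(*loc: %d-%d*) " (Ploc.first_pos loc) (Ploc.last_pos loc)
  in
  f {(pc) with bef = pc.bef ^ comm}
;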
Comparison with the OCaml modules Printf and Format
Pprintf and Printf
The statement "pprintf" acts like the function
"Printf.sprintf". But since it requires this extra parameter of type
"pr_context" and needs the "%p" and "%q" conversions specifications
(which do not exist in "Printf"), it was not possible to use the
"Printf" machinery directly and a new statement had to be added.
The principle of "pprintf" and "sprintf" are the same. However,
"pprintf" is a syntax extension and has no type by itself. It cannot
be used alone or without all its required parameters.
Pprintf and Format
The pretty printing annotations look like the ones of the OCaml
module Format. Actually, they have different semantics. They do not
use boxes like "Format" does. In "pprintf" statement, the
machinery acts only on indentations.
Notice that, with "pprintf", it is always possible to know the
current indentation (it is the field "ind" of the "pr_context"
variable) and it is therefore possible to take decisions before
printing.
For example, it is possible, in a printer of OCaml statements, to
decide to print all match cases symmetrically, i.e. all with one
line for each case or all with newlines after the patterns.
It is what is done in the option "-flag E" added by the
pretty printing kits "pr_r.cmo" (pretty print in revised syntax) and
"pr_o.cmo" (pretty print in normal syntax). See
chapter Commands and Files or type
"camlp5 pr_r.cmo -help" or "camlp5 pr_o.cmo
-help".
Another difference is that the internal behaviour of this printing
system is accessible, and it is always possible to use the basic
functions of the "Pretty" module ("horiz_vertic" and "sprintf") if
the behaviour of "pprintf" is not what is desired by the
programmer.
Relation with the Camlp5 extensible printers
The extensible printers of Camlp5 (see its
corresponding chapter) use the type
"pr_context" of "pprintf". It is therefore possible to use
"pprintf" in the semantic actions of the extensible printers. But
it is not mandatory. An extensible printer can just use the "Pretty"
module or even neither "pprintf" nor "Pretty".
The printing kits "pr_r.ml" and "pr_o.ml"
(respectively pretty print in revised and in normal syntax) and some
other related to them, are examples of usage of the "pprintf"
statement.
Purely functional parsers are an alternative
to stream parsers, where the stream
type used is lazy and non-destructive. These streams are lazy values,
as in classical stream parsers, but the values are not removed while
the parsing advances.
To make them work, the parsers of purely functional streams return,
not the values themselves, but a value of type "option":
"None" meaning "no match" (the equivalent of the
exception "Stream.Failure" of normal streams) and
"Some (r, s)" meaning "the result is r and the
remaining stream is s".
Syntax
The syntax of purely functional parsers, when loading
"pa_fstream.cmo", is the following:
Notice that, unlike classical parsers, there is no difference, in a
stream pattern, between the first stream pattern component and the
other ones. In particular, there is no "question mark" syntax with an
expression optionally ending those components. Moreover, the
"lookahead" case is not necessary; we shall see why further on. The
syntaxes "pattern when" and "let..in" inside stream patterns, available
in classical parsers, are not implemented.
Streams
The functional parsers are functions taking as parameters
functional streams, which are values of type "Fstream.t
a" for some type "a". It is possible to build
functional streams using the functions defined in the module
"Fstream":
Fstream.from
"Fstream.from f" returns a stream built from the
function "f". To create a new stream element, the
function "f" is called with the current stream count,
starting with zero. The user function "f" must return
either "Some <value>" for a value or
"None" to specify the end of the stream.
Fstream.of_list
Return a stream built from the list in the same order.
Fstream.of_string
Return a stream of the characters of the string parameter.
Fstream.of_channel
Return a stream of the characters read from the input channel
parameter.
Semantics of parsers
Fparser
The purely functional parsers act like classical parsers, with a
recursive descent algorithm, except that:
If the first stream pattern component matches the beginning of
the stream, there is no error if the following stream pattern
components do not match: the control simply passes to the next
parser case with the initial stream.
If the semantic actions are of type "t", the result
of the parser is of type "option (t * Fstream.t)",
not just "t" like in classical parsers. If a stream
pattern matches, the semantic action is evaluated, giving some
result "e" and the result of the parser is
"Some (e, strm)" where "strm" is the
remaining stream.
If no parser case matches, the result of the parser is
"None".
Error position
A difficulty, with purely functional parsers, is how to find the
position of the syntax error, when the input is wrong. Since the
system tries all parser cases before returning "None",
and the initial stream is not affected, it is not possible to
directly find where the error happened. This is a problem for
parsing using backtracking (here, it is limited backtracking, but
the problem is the same).
The solution is to use the function
"Fstream.count_unfrozen" applied to the initial
stream. As its name says, it returns the number of unfrozen
elements of the stream, which is exactly the longest match found. If
the input is a stream of characters, the value returned by this
function is exactly the position of the error, in number of characters,
from the beginning of the stream.
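A minimal sketch of this technique, where "p" stands for any purely
functional parser over characters:
value parse_or_fail p s =
  let strm = Fstream.of_string s in
  match p strm with
  [ Some (r, _) -> r
  | None ->
      failwith
        (Printf.sprintf "syntax error at character %d"
           (Fstream.count_unfrozen strm)) ]
;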
However, it is not possible to know directly which rule failed and
therefore it is not possible, as in classical parsers, to specify
and get clear error messages. Future versions of purely functional
parsers may propose solutions to resolve this problem.
Notice that, when using the "count_unfrozen" method, it is
not possible to reuse that same stream to call another parser, and
hope to get the right position of the error, if another error
happens, since it may test fewer terminals than the first parser. Use
a fresh stream in this case, if possible.
The location is a concept often used in Camlp5, bound to where
errors occur in the source. The basic type is "Ploc.t"
which is an abstract type.
Definitions
Internally a location is a pair of source positions: the
beginning and the end of an element in the source (file or
interactive). A located element can be a character (the end is just
the beginning plus one), a token, or a longer sequence generally
corresponding to a grammar rule.
A position is a count of characters since the beginning of
the file, starting at zero. When a couple of positions defines a
location, the first position is the position of the first character
of the element, and the last position is the position of the first
character not part of the element. The location length is
the difference between those two numbers. Notice that the position
corresponds exactly to the character count in the streams of
characters.
In the extensible grammars, a variable
with the specific name "loc" is predefined in all semantic
actions: it is the location of the associated rule. Since
the syntax tree quotations generate nodes
with "loc" as location part, this allows generating
grammars without having to consider source locations.
It is possible to change the name "loc" to another name,
through the parameter "-loc" of the Camlp5 commands.
Remark: the reason why the type "location" is abstract is
that, in future versions, it may contain other information, such as
the associated comments, the type (for expression nodes), things
like that, without having to change already written
programs.
Building locations
Tools are provided in the module "Ploc" to manage
locations.
First, "Ploc.dummy" is a dummy location used when the
element does not correspond to any source, or if the programmer does
not want to worry about locations.
The function "Ploc.make" builds a location from three
parameters:
the line number, starting at 1
the position of the first column of the line
a couple of positions of the location: the first one belonging
to the given line, the second one being able to belong to another
line, further.
If the line number is not known, it is possible to use the function
"Ploc.make_unlined", taking only the couple of positions of
the location. In this case, error messages may indicate the first
line and a large count of characters from this line (actually from the
beginning of the file). With a good text editor, it is still possible
to find the right location.
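A minimal sketch of both functions, assuming "Ploc.make" takes the three
parameters described above (the numbers are arbitrary: an element
spanning characters 52 to 57, on line 3, the line itself starting at
character 42):
value loc1 = Ploc.make 3 42 (52, 57);
value loc2 = Ploc.make_unlined (52, 57);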
If the location is built with "Ploc.make_unlined", and if
your program displays a source location itself, it is possible to
use the function "Ploc.from_file" which takes the file
name and the location as parameters and returns, by reading that
file, the line number and the character positions of the
location.
Raising with a location
The function "Ploc.raise" allows one to raise an exception
together with a location. All exceptions raised in
the extensible grammars use
"Ploc.raise". The raised exception is "Ploc.Exc"
with two parameters: the location and the exception itself.
Notice that "Ploc.raise" just reraises the exception
if it is already the exception "Ploc.Exc", ignoring then
the new given location.
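For example, to signal an error located at "loc" (the message here is
arbitrary):
Ploc.raise loc (Failure "this construction is not allowed here");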
A paradigm to print exceptions possibly enclosed by
"Ploc.Exc" is to write the "try..with" statement
like this:
try ... with exn ->
let exn =
match exn with
[ Ploc.Exc loc exn -> do { ... print the location ...; exn }
| _ -> exn ]
in
match exn with
...print the exception which is *not* located...
Other functions
Some other functions are provided:
Ploc.first_pos
returns the first position (an integer) of the location.
Ploc.last_pos
returns the last position (an integer) of the location (the position
of the first character not belonging to the element).
Ploc.line_nb
returns the line number of the location or -1 if the
location does not contain a line number (i.e. built by
"Ploc.make_unlined").
Ploc.bol_pos
returns the position of the beginning of the line of the
location. It is zero if the location does not contain a line
number (i.e. built by "Ploc.make_unlined").
And still other ones used in Camlp5 sources:
Ploc.encl
"Ploc.encl loc1 loc2" returns the location starting at
the smallest begin of "loc1" and "loc2" and
ending at their greatest end.. In simple words, it is the location
enclosing "loc1" and "loc2" and all what is
between them.
Ploc.shift
"Ploc.shift sh loc" returns the location "loc"
shifted with "sh" characters. The line number is not
recomputed.
Ploc.sub
"Ploc.sub loc sh len" is the location "loc"
shifted with "sh" characters and with length
"len". The previous ending position of the location is
lost.
"Ploc.after"
"Ploc.after loc sh len" is the location just after
"loc" (i.e. starting at the end position of
"loc"), shifted with "sh" characters, and of
length "len".
We describe here the syntax and the semantics of the stream parsers
of Camlp5. Streams are kinds of lazy lists. The parsers of
these streams use a recursive descent method without backtracking,
which is the most natural one in functional languages. In
particular, parsers are normal functions.
Notice that these parsers have existed in OCaml for many years (since
the beginning of the 1990s), but some new features were added in
2007 in the Camlp5 distribution (lookahead, "no error" optimization,
the let..in statement and left factorization). This chapter describes
them as well.
Introduction
Parsers apply to values of type "Stream.t", defined in the module
"Stream" of the OCaml standard library. Like the type "list", the
type "Stream.t" has a type parameter, indicating the type of its
elements. Streams differ from lists in that they are lazy (the
elements are evaluated only when the parser needs them for its
actions) and imperative (parsers delete their first elements when
they take their parsing decisions). Notice that purely functional
parsers also exist in Camlp5: their corresponding streams are lazy
and functional, the analyzed elements remain in the initial
stream and the semantic action returns the resulting stream
together with the normal result, which allows a natural limited
backtrack but has the drawback that it is not easy to find the
position of parsing errors when they happen.
Parsers of lazy+imperative streams, which are described here, use a
method named "recursive descent": they look at the first element,
decide what to do depending on its value, and continue the
parsing with the remaining elements. Parsers can call other parsers,
and can be recursive, like normal functions.
Actually, parsers are just syntactic sugar. When a parser is written
in the "parser" syntax, Camlp5 transforms it into
normal function calls, pattern matchings and try..with
statements. The pretty printer of Camlp5, by default, displays this
expanded result, without the parser syntax. A pretty printing kit,
when added, can rebuild the parsers in their initial syntax and
display them.
Syntax
The syntax of the parsers, when loading "pa_rp.cmo" (or already
included in the command "camlp5r"), is the following:
The parsers are functions taking streams as parameters. Streams
are values of type "Stream.t a" for some type
"a". It is possible to build streams using the
functions defined in the module "Stream":
Stream.from
"Stream.from f" returns a stream built from the
function "f". To create a new stream element, the
function "f" is called with the current stream count,
starting with zero. The user function "f" must return
either "Some <value>" for a value or
"None" to specify the end of the stream.
Stream.of_list
Return a stream built from the list in the same order.
Stream.of_string
Return a stream of the characters of the string parameter.
Stream.of_channel
Return a stream of the characters read from the input channel
parameter.
Semantics of parsers
Parser
A parser, defined with the syntax "parser" above, is of type
"Stream.t a -> b" where "a" is the type of the elements
of the streams and "b" the type of the result. The parser cases are
tested in the order they are defined until one of them applies. The
result is the semantic action of the parser case which applies. If
no parser case applies, the exception "Stream.Failure"
is raised.
When testing a parser case, if the first stream pattern component
matches, all remaining stream pattern components of the stream
pattern must match also. If one does not match, the parser raises
the exception "Stream.Error" which has a parameter of
type string: by default, this string is the empty string, but if the
stream pattern component which does not match is followed by a
question mark and an expression, this expression is evaluated and
given as parameter to "Stream.Error".
In short, a parser can return in three ways:
A normal result, of type "b" for a parser of type
"Stream.t a -> b".
Raising the exception "Stream.Failure".
Raising the exception "Stream.Error".
Fundamentally, the exception "Stream.Failure" means
"this parser does not apply and no element has been removed from
the initial stream". This is a normal case when parsing: the parser
locally fails, but the parsing can continue.
Conversely, the exception "Stream.Error" means that
"this parser encountered a syntax error and elements have probably
been removed from the stream". In this case, there is no way to
recover the parsing, and it definitively fails.
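A minimal sketch illustrating the three possible outcomes (the parser
itself is arbitrary):
value p =
  parser
  [ [: `'a'; `'b' :] -> "ab"
  | [: `'c' :] -> "c" ]
;
p (Stream.of_string "ab");   (* returns "ab" *)
p (Stream.of_string "cz");   (* returns "c"; only the 'c' is removed *)
p (Stream.of_string "zz");   (* raises Stream.Failure: no case applies *)
p (Stream.of_string "az");   (* raises Stream.Error "": 'a' matched, 'b' expected *)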
Left factorization
In parsers, consecutive rules starting with the same
components are left factorized. It means that they are transformed
into a single rule starting with the common path and continuing
with a call to a parser separating the two cases. The order is
kept, except that the possible empty rule is inserted at the
end.
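For instance (a sketch with hypothetical functions "expr", "f" and "g"),
the two rules:
parser
[ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3
| [: `If; e4 = expr; `Then; e5 = expr :] -> g e4 e5 ]
are transformed into a parser equivalent to:
parser
  [: `If;
     a =
       parser
       [ [: e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3
       | [: e4 = expr; `Then; e5 = expr :] -> g e4 e5 ] :] -> a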
In this example, only the first component, "If", is factorized, the
second one being different because of different patterns ("e1" and
"e4").
Match with parser
The syntax "match expression with parser" allows to match a stream
against a parser. It is, for "parser", the equivalent of "match
expression with" for "fun". The same way we could say:
match expression with ...
could be considered as an equivalent to:
(fun ...) expression
we could consider that:
match expression with parser ...
is an equivalent to:
(parser ...) expression
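For example, a minimal sketch reading the first lowercase letter of a
string:
value first_letter s =
  match Stream.of_string s with parser [: `('a'..'z' as c) :] -> c
;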
Error messages
A "Stream.Error" exception is raised when a stream
pattern component does not match and that it is not the first one of
the parser case. This exception has a parameter of type string,
useful to specify the error message. By default, this is the empty
string. To specify an error message, add a question mark and an
expression after the stream pattern component. A typical error
message is "that stream pattern component expected". Example with
the parser of "if..then..else.." above:
parser
[: `If; e1 = expr ? "expression expected after 'if'";
`Then ? "'then' expected";
e2 = expr ? "expression expected after 'then'";
a =
parser
[ [: `Else; e3 = expr ? "expression expected" :] -> f e1 e2 e3
| [: :] -> g e1 e2 ] :] -> a
Notice that the expression after the question mark is evaluated
only in case of syntax error. Therefore, it can be a complicated
call to a complicated function without slowing down the normal
parsing.
Stream pattern component
In a stream pattern (starting with "[:" and ending
with ":]"), the stream pattern components are separated
with the semicolon character. There are three cases of stream
pattern components with some sub-cases for some of them, and an
extra syntax can be used with a "let..in" construction. The three
cases are:
A direct test of one or several stream elements
(called terminal symbol), in three ways:
The character "backquote" followed by a pattern, meaning: if
the stream starts with an element which is matched by this
pattern, the stream pattern component matches, and the stream
element is removed from the stream.
The character "backquote" followed by a pattern, the keyword
"when" and an expression of type "bool", meaning:
if the stream starts with an element which is matched by this
pattern and if the evaluation of the expression is
"True", the stream pattern component matches, and
the first element of the stream is removed.
The character "question mark" followed by the character
"equal" and a lookahead expression (see further), meaning: if
the lookahead applies, the stream pattern component
matches. The lookahead may unfreeze one or several elements on
the stream, but does not remove them.
A pattern followed by the "equal" sign and an expression of type
"Stream.t x -> y" for some types "x" and
"y". This expression is called a non terminal
symbol. It means: call the expression (which is a parser) with the
current stream. If this sub-parser:
Returns an element, the pattern is bound to this result and
the next stream pattern component is tested.
Raises the exception "Stream.Failure", there
are two cases:
if the stream pattern component is the first one of the
stream case, the current parser also fails with the
exception "Stream.Failure".
if the stream pattern component is not the first one of
the stream case, the current parser fails with the
exception "Stream.Error".
In this second case:
If the stream pattern component is followed by a
"question mark" and an expression (which must be of type
"string"), the expression is evaluated and
given as parameter of the exception
"Stream.Error".
If the expression is followed by an "exclamation mark",
the test and conversion from "Stream.Failure"
to "Stream.Error" is not done, and the parser
just raises "Stream.Failure" again. This is
an optimization which must be assumed by the programmer,
in general when he knows that the sub-parser called never
raises "Stream.Failure" (for example if the
called parser ends with a parser case containing an empty
stream pattern). See "no error optimization" below.
Otherwise the exception parameter is the empty string.
A pattern, which is bound to the current stream.
Notice that patterns are bound immediately and can be used in the
next stream pattern component.
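A minimal sketch combining several of these component kinds: a plain
terminal, a terminal guarded by "when", a non-terminal call, and a
final pattern bound to the current stream (the lookahead case is
illustrated further in this chapter):
value digit = parser [: `('0'..'9' as c) :] -> Char.code c - Char.code '0';
value item =
  parser
    [: `'#'; `c when c <> ' '; n = digit; s :] -> (c, n, Stream.count s)
;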
Let statement
Between stream pattern components, it is possible to use the
"let..in" construction. This is not considered as a real stream
pattern component, in the fact that is is not tested against the
exception "Stream.Failure" it may raise. It can be
useful for intermediate computation. In particular, it is used
internally by the lexers (see chapter
about lexers as character stream
parsers).
Example of use, when an expression has to be used several times
(in the example, "d a", which is bound to the variable
"c"):
parser
[: a = b;
let c = d a in
e =
parser
[ [: f = g :] -> h c
| [: :] -> c ] :] -> e
Lookahead
The lookahead feature allows looking at several terminals in the
stream without removing them, in order to take a decision when more
than one terminal is necessary.
For example, when parsing the normal syntax of the OCaml language,
there is a problem, in recursive descent parsing, with
differentiating the following two inputs:
(-x+1)
(-)
The first case is treated in a rule, telling: "a left parenthesis,
followed by an expression, and a right parenthesis". The second one
is "a left parenthesis, an operator, a right
parenthesis". Programming it like this (left factorizing the first
parenthesis):
parser
[: `Lparen;
e =
parser
[ [: e = expr; `Rparen :] -> e
| [: `Minus; `Rparen :] -> minus_op ] :] -> e
does not work if the input is "(-)" because the rule
"e = expr" accepts the minus sign as the start of an expression,
removes it from the input stream, and then fails with a parsing error
when encountering the right parenthesis.
Conversely, writing it this way:
parser
[: `Lparen;
e =
parser
[ [: `Minus; `Rparen :] -> minus_op
| [: e = expr; `Rparen :] -> e ] :] -> e
does not help, because if the input is "(-x+1)" the
rule above starting with "`Minus" is accepted and the
exception "Stream.Error" is raised while encountering
the variable "x" since a right parenthesis is
expected.
In general, this kind of situation is best resolved by a left
factorization of the parser cases (see the section "Semantics"
above), but that is not possible in this case. The solution is to
test whether the character after the minus sign is a right
parenthesis:
parser
[: `Lparen;
e =
parser
[ [: ?= [ _ Rparen ]; `Minus; `Rparen :] -> minus_op
| [: e = expr; `Rparen :] -> e ] :] -> e
It is possible to put several lists of patterns separated by a
vertical bar in the lookahead construction, but with a limitation
(due to the implementation): all lists of patterns must have the
same number of elements.
No error optimization
The "no error optimization" is the fact to end a stream pattern
component of kind "non-terminal" ("pattern" "equal" "expression") by
the character "exclamation mark". Like said above, this inhibits the
transformation of the exception "Stream.Failure",
possibly raised by the called parser, into the exception
"Stream.Error".
The code:
parser [: a = b; c = d ! :] -> e
is equivalent to:
parser [: a = b; s :] -> let c = d s in e
One interest of the first syntax is that it shows to readers that
"d" is indeed a syntactic sub-parser. In the second
syntax, it is called in the semantic action, which makes the parser
case not so clear, as far as readability is concerned.
If the stream pattern component is at the end of the stream pattern,
this allows possible tail recursion by the OCaml compiler, in the
following case:
parser [: a = b; c = d ! :] -> c
since it is equivalent (given that "c" is at
the same time the pattern of the last component and the expression of
the parser case semantic action) to:
parser [: a = b; s :] -> d s
The call to "d s" can be a tail recursive
call. Without the use of the "exclamation mark" in the rule, the
equivalent code is:
parser [: a = b; s :] ->
try d s with [ Stream.Failure -> raise (Stream.Error "") ]
which is not tail recursive (because the "try..with" construction
pushes a context), preventing the compiler from optimizing its
code. This can be important when many recursive calls happen, since
it can overflow the OCaml stack.
Position
The optional "pattern" before and after a stream pattern is bound
to the current stream count. Indeed, streams internally contain a
count of their elements. At the beginning the count is zero. When an
element is removed, the count is incremented. The example:
parser [: a = b :] ep -> c
is equivalent to:
parser [: a = b; s :] -> let ep = Stream.count s in c
There is no direct syntax equivalent to the optional pattern at
beginning of the stream pattern:
parser bp [: a = b :] -> c
These optional patterns give access to the stream count at the
beginning and at the end of the parser case, allowing the computation
of the location of the rule in the source. In particular, if the stream is
a stream of characters, these counts are the source location in
number of characters.
Semantic action
In a parser case, after the stream pattern, there is an "arrow" and
an expression, called the "semantic action". If the parser case is
matched, the parser returns the evaluated expression, whose
environment contains all values bound in the stream pattern.
Remarks
Simplicity vs Associativity
This parsing technology has the advantage of simplicity of use and
understanding, but it does not treat the associativity of
operators. For example, if you write a parser like this (to compute
arithmetic expressions):
value rec expr =
parser
[ [: e1 = expr; `'+'; e2 = expr :] -> e1 + e2
| [: `('0'..'9' as c) :] -> Char.code c - Char.code '0' ]
this would loop endlessly, exactly as if you wrote code starting
with:
value rec expr e =
let e1 = expr e in
...
One solution is to treat the associativity "by hand": by reading a
sub-expression, then looping with a parser which parses the operator
and another sub-expression, and so on.
An alternative solution is to write parsing "combinators". Indeed,
parsers being normal functions, it is possible to make a function
which takes a parser as parameter and returns a parser using
it. For example, left and right associativity parsing
combinators:
value rec left_assoc op elem =
let rec op_elem x =
parser
[ [: t = op; y = elem; r = op_elem (t x y) :] -> r
| [: :] -> x ]
in
parser [: x = elem; r = op_elem x :] -> r
;
value rec right_assoc op elem =
let rec op_elem x =
parser
[ [: t = op; y = elem; r = op_elem y :] -> t x r
| [: :] -> x ]
in
parser [: x = elem; r = op_elem x :] -> r
;
which can be used, e.g. like this:
value expr =
List.fold_right (fun op elem -> op elem)
[left_assoc (parser [: `'+' :] -> fun x y -> x +. y);
left_assoc (parser [: `'*' :] -> fun x y -> x *. y);
right_assoc (parser [: `'^' :] -> fun x y -> x ** y)]
(parser [: `('0'..'9' as c) :] -> float (Char.code c - Char.code '0'))
;
and tested, e.g. in the toplevel, like that:
expr (Stream.of_string "2^3^2+1");
In the same way, it is possible to parse non-context-free grammars, by
programming parsers that return other parsers.
A third solution, to resolve the problem of associativity, is to
use the grammars of Camlp5, which have the other advantage that they
are extensible.
Lexing vs Parsing
In general, while analyzing a language, there are two levels:
The level where the input, considered as a stream of characters,
is read to make a stream of tokens (for example "words", if it is
a human language, or punctuation). This level is generally called
"lexing".
The level where the input is a stream of tokens where grammar
rules are parsed. This level is generally called "parsing".
The "parser" construction described here can be used for both,
thanks to the polymorphism of OCaml:
The lexing level is a "parser" of streams of characters
returning tokens.
The parsing level is a "parser" of streams of tokens returning
syntax trees.
By comparison, the programs "lex" and "yacc" use two different
technologies. With "parser"s, it is possible to use the same one for
both.
Lexer syntax vs Parser syntax
For "lexers", i.e. for the specific case of parsers when the input
is a stream of characters, it is possible to use a shorter
syntax. See the chapter on lexers. They
have another syntax, shorter and adapted for the specific type
"char". But they still are internally parsers of
streams with the same semantics.
Purely functional parsers
This system of parsers is imperative: while parsing, the stream
advances and the already parsed terminals disappear from the stream
structure. This is useful because it is not necessary to return the
remaining stream together with the normal result. This is the reason
there is this "Stream.Error" exception: when it is raised,
it means that some terminals have been consumed from the stream
and are definitively lost, and therefore that there are no more
parser cases to try.
An alternative is to use functional
parsers which use a new stream type, lazy but not
destructive. Their advantage is that they use a limited backtrack:
the case of "if..then..else.." and the shorter "if..then.." work
without having to left factorize the parser cases, and there is no
need to lookahead. They have no equivalent to the exception
"Stream.Error": when all cases are tested, and have
failed, the parsers return the value "None". The
drawback is that, when a parsing error happens, it is not easily
possible to know the location of the error in the input, as the
initial stream has not been modified: the system would indicate a
failure at the first character of the first line: this is a general
drawback of backtracking parsers. See the solutions found to this
problem in the chapter about purely
functional parsers.
A second alternative is to use the
backtracking parsers. They use the same
stream type as the functional parsers, but they test more cases than
them. They have the same advantages and drawbacks as the functional
parsers.
This chapter describes all modules defined in "gramlib.cma"; it does
not cover the other Camlp5 modules used by the Camlp5 commands and kits.
Ploc module
Building and combining locations. This module also contains some
pervasive types and functions.
type t = 'abstract;
Location type.
located exceptions
exception Exc of location and exn;
"Ploc.Exc loc e" is an encapsulation of the exception
"e" with the input location "loc". To be used to
specify a location for an error. This exception must not be raised
by the OCaml function "raise", but rather by
"Ploc.raise" (see below), to prevent the risk of several
encapsulations of "Ploc.Exc".
value raise : t -> exn -> 'a;
"Ploc.raise loc e", if "e" is already the
exception "Ploc.Exc", re-raise it (ignoring the new
location "loc"), else raise the exception
"Ploc.Exc loc e".
making locations
value make_loc : string -> int -> int -> (int * int) -> string -> t;
"Ploc.make_loc fname line_nb bol_pos (bp, ep) comm"
creates a location starting at line number "line_nb",
where the position of the beginning of the line is
"bol_pos" and between the positions "bp"
(included) and "ep" excluded. And "comm" is the
comment before the location. The positions are in number of
characters since the begin of the stream.
value make_unlined : (int * int) -> t;
"Ploc.make_unlined" is like "Ploc.make" except
that the line number is not provided (to be used e.g. when the
line number is unknown).
value dummy : t;
"Ploc.dummy" is a dummy location, used in situations
when location has no meaning.
getting location info
value file_name : t -> string;
"Ploc.file_name loc" returns the file name of the
location.
value first_pos : t -> int;
"Ploc.first_pos loc" returns the initial position
of the location in number of characters since the beginning of the
stream.
value last_pos : t -> int;
"Ploc.last_pos loc" returns the final position plus one
of the location in number of characters since the
beginning of the stream.
value line_nb : t -> int;
"Ploc.line_nb loc" returns the line number of the
location or "-1" if the location does not contain a line
number (i.e. built with "Ploc.make_unlined" above).
value bol_pos : t -> int;
"Ploc.bol_pos loc" returns the position of the
beginning of the line of the location in number of characters
since the beginning of the stream, or "0" if the location
does not contain a line number (i.e. built the with
"Ploc.make_unlined" above).
value comment : t -> string;
"Ploc.comment loc" returns the comment before the
location.
combining locations
value encl : t -> t -> t;
"Ploc.encl loc1 loc2" returns the location starting at
the smallest start and ending at the greatest end of the locations
"loc1" and "loc2". In other words, it is the
location enclosing "loc1" and "loc2".
value shift : int -> t -> t;
"Ploc.shift sh loc" returns the location "loc"
shifted with "sh" characters. The line number is not
recomputed.
value sub : t -> int -> int -> t;
"Ploc.sub loc sh len" is the location "loc"
shifted with "sh" characters and with length
"len". The previous ending position of the location is
lost.
value after : t -> int -> int -> t;
"Ploc.after loc sh len" is the location just after loc
(starting at the end position of "loc") shifted with
"sh" characters and of length "len".
value with_comment : t -> string -> t;
Change the comment part of the given location
miscellaneous
value name : ref string;
"Ploc.name.val" is the name of the location variable
used in grammars and in the predefined quotations for OCaml syntax
trees. Default: "loc".
value get : string -> t -> (int * int * int * int * int);
"Ploc.get fname loc" returns in order: 1/ the line
number of the begin of the location, 2/ its column, 3/ the line
number of the first character not in the location, 4/ its
column and 5/ the length of the location. The parameter
"fname" is the file where the location occurs.
value from_file : string -> t -> (string * int * int * int);
"Ploc.from_file fname loc" reads the file
"fname" up to the location "loc" and returns the
real input file, the line number and the characters location in
the line; the real input file can be different from
"fname" because of possibility of line directives
typically generated by /lib/cpp.
pervasives
type vala 'a =
[ VaAnt of string
| VaVal of 'a ]
;
Encloser of many abstract syntax tree node types, in
"strict" mode. This allows the system of antiquotations of
abstract syntax tree quotations to work when using the quotation
kit "q_ast.cmo".
value call_with : ref 'a -> 'a -> ('b -> 'c) -> 'b -> 'c;
"Ploc.call_with r v f a" sets the reference
"r" to the value "v", then calls "f a",
and resets "r" to its initial value. If "f a"
raises an exception, its initial value is also reset and the
exception is reraised. The result is the result of "f
a".
Plexing module
Lexing for Camlp5 grammars.
This module defines the Camlp5 lexer type to be used in extensible
grammars (see module "Grammar"). It also provides some
useful functions to create lexers.
type pattern = (string * string);
Type for values used by the generated code of the EXTEND
statement to represent terminals in entry rules.
The first string is the constructor name (must start with an
uppercase character). When empty, the second string should
be a keyword.
The second string is the constructor parameter. Empty if it
has no parameter (corresponding to the 'wildcard' pattern).
The way token patterns are interpreted to parse tokens is
determined by the lexer, through the function "tok_match" below.
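The lexer type itself is a record whose fields are described below; a
sketch of its declaration (the exact field types are assumptions
reconstructed from these descriptions) is:
type lexer 'te =
  { tok_func : lexer_func 'te;
    tok_using : pattern -> unit;
    tok_removing : pattern -> unit;
    tok_match : pattern -> 'te -> string;
    tok_text : pattern -> string;
    tok_comm : mutable option (list Ploc.t) }
;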
The type for lexers compatible with Camlp5 grammars. The parameter
type "'te" is the type of the tokens.
The field "tok_func" is the main lexer
function. See "lexer_func" type below.
The field "tok_using" is a function called by the
"EXTEND" statement to warn the lexer that a rule uses
this pattern (given as parameter). This allows the lexer 1/ to
check that the pattern constructor is really among its
possible constructors and 2/ to enter the keywords in its
tables.
The field "tok_removing" is a function possibly
called by the "DELETE_RULE" statement to warn the
lexer that this pattern (given as parameter) is no longer used
in the grammar (the grammar system maintains a number of usages
of all patterns and calls this function when this number falls
to zero). If it is a keyword, this allows the lexer to remove
it in its tables.
The field "tok_match" is a function called by the
Camlp5 grammar system to ask the lexer how the input tokens
should be matched against the patterns. Warning: for
efficiency, this function must be written as a function
taking patterns as parameters and, for each pattern value,
returning a function matching a token, not as a
function with two parameters.
The field "tok_text" is a function called by the
grammar system to get the name of the tokens for the error
messages, in case of syntax error, or for the displaying of
the rules of an entry.
The field "tok_comm" is a mutable place where the
lexer can put the locations of the comments, if its initial
value is not "None". If it is "None",
nothing has to be done by the lexer.
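The lexer function type is declared along the following lines (a
sketch; the character-stream parameter and the result pair follow the
description below):
type lexer_func 'te = Stream.t char -> (Stream.t 'te * location_function)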
The type of a lexer function (field "tok_func" of the
type "lexer"). The character stream is the input stream
to be lexed. The result is a pair of a token stream and a location
function (see below) for this tokens stream.
and location_function = int -> Ploc.t;
The type of a function giving the location of a token in the
source from the token number in the stream (starting from
zero).
value lexer_text : pattern -> string;
A simple "tok_text" function.
value default_match : pattern -> (string * string) -> string;
A simple "tok_match" function, applying to the token
type "(string * string)".
lexers from parsers or ocamllex
The functions below create lexer functions either from a "char
stream" parser or from an "ocamllex" function. With
the returned function "f", it is possible to get a simple
lexer (of the type "Plexing.lexer" above):
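The original code example is not reproduced here; a sketch of such a wrapping, assuming "f" is a value of type "lexer_func (string * string)", could be (it uses "Plexing.default_match" and "Plexing.lexer_text" below and does nothing in "tok_using" and "tok_removing"):
value lexer_of_func f =
  {Plexing.tok_func = f;
   Plexing.tok_using = (fun _ -> ());
   Plexing.tok_removing = (fun _ -> ());
   Plexing.tok_match = Plexing.default_match;
   Plexing.tok_text = Plexing.lexer_text;
   Plexing.tok_comm = None}
;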
Note that a better "tok_using" function would check the
used tokens and raise "Plexing.Error" for incorrect
ones. The other functions "tok_removing",
"tok_match" and "tok_text" may have other
implementations as well.
value lexer_func_of_parser :
((Stream.t char * ref int * ref int) -> ('te * Ploc.t)) -> lexer_func 'te;
A lexer function from a lexer written as a char stream parser
returning the next token and its location. The two references
with the char stream contain the current line number and the
position of the beginning of the current line.
value lexer_func_of_ocamllex : (Lexing.lexbuf -> 'te) -> lexer_func 'te;
A lexer function from a lexer created by "ocamllex".
function to build a stream and a location function
The functions "Plexing.eval_char" and "Plexing.eval_string" convert
a char or a string token, in which the backslashes have not
been interpreted, into a real char or string; they raise
"Failure" if a bad backslash sequence is found.
Thus "Plexing.eval_char (Char.escaped c)" returns "c"
and "Plexing.eval_string (String.escaped s)"
returns "s".
value restore_lexing_info : ref (option (int * int));
value line_nb : ref (ref int);
value bol_pos : ref (ref int);
Special variables used to reinitialize line numbers and position
of beginning of line with their correct current values when a parser
is called several times with the same character stream. Necessary
for directives (e.g. #load or #use) which interrupt the parsing.
Without usage of these variables, locations after the directives
can be wrong.
backward compatibilities
Deprecated since version 4.08.
type location = Ploc.t;
value make_loc : (int * int) -> location;
value dummy_loc : location;
Plexer module
This module contains a lexer used for OCaml syntax (revised and
normal).
lexer
value gmake : unit -> Plexing.lexer (string * string);
"gmake ()" returns a lexer compatible with the
extensible grammars. The returned tokens follow the normal syntax
and the revised syntax lexing rules.
The token type is "(string * string)" just like the pattern
type.
The meaning of the tokens are:
("", s) is the keyword s,
("LIDENT", s) is the ident s starting with a
lowercase letter,
("UIDENT", s) is the ident s starting with an
uppercase letter,
("INT", s) is an integer constant whose string source
is s,
("INT_l", s) is an 32 bits integer constant whose
string source is s,
("INT_L", s) is an 64 bits integer constant whose
string source is s,
("INT_n", s) is an native integer constant whose string
source is s,
("FLOAT", s) is a float constant whose string source is
s,
("STRING", s) is the string constant s,
("CHAR", s) is the character constant s,
("TILDEIDENT", s) is the tilde character "~"
followed by the ident s,
("TILDEIDENTCOLON", s) is the tilde character
"~" followed by the ident s and a colon
":",
("QUESTIONIDENT", s) is the question mark "?"
followed by the ident s,
("QUESTIONIDENTCOLON", s) is the question mark
"?" followed by the ident s and a colon
":",
("QUOTATION", "t:s") is a quotation "t"
holding the string s,
("ANTIQUOT", "t:s") is an antiquotation "t"
holding the string s,
("EOI", "") is the end of input.
The associated token patterns in the EXTEND statement hold the same
names as the first string (constructor name) of the tokens
expressions above.
Warning: the string associated with the "STRING"
constructor is the string found in the source without any
interpretation. In particular, the backslashes are not
interpreted. For example, if the input is "\n" the string
is *not* a string with one element containing the "newline"
character, but a string of two elements: the backslash and the
"n" letter.
Same thing for the string associated with the "CHAR"
constructor.
The functions "Plexing.eval_string" and
"Plexing.eval_char" allow to convert them into the real
corresponding string or char value.
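For illustration, lexing the input "let x = 42" with a lexer returned by "Plexer.gmake ()" would presumably produce a token sequence along these lines, assuming the grammar has declared "let" and "=" as keywords through "tok_using" (a sketch, not actual program output):
("", "let"); ("LIDENT", "x"); ("", "="); ("INT", "42"); ("EOI", "")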
flags
value dollar_for_antiquotation : ref bool;
When True (default), the next call to "Plexer.gmake ()"
returns a lexer where the dollar sign is used for antiquotations.
If False, there are no antiquotations and the dollar sign can be
used as a normal token.
value specific_space_dot : ref bool;
When "False" (default), the next call to
"Plexer.gmake ()" returns a lexer where there is no
difference between dots which have spaces before and dots which
don't have spaces before. If "True", dots which have
spaces before return the keyword " ." (space dot) and the
ones which don't have spaces before return the
keyword "." (dot alone).
value no_quotations : ref bool;
When "True", all lexers built by "Plexer.gmake
()" do not lex the quotation syntax. Default is
"False" (quotations are lexed).
value utf8_lexing : ref bool;
When "True", all lexers built by "Plexer.gmake
()" use utf-8 encoding to specify letters and punctuation
marks. Default is "False" (all characters between '\128' and
'\255' are considered as letters).
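These flags must be set before calling "Plexer.gmake ()"; a minimal sketch:
Plexer.dollar_for_antiquotation.val := False;
Plexer.no_quotations.val := True;
value lexer = Plexer.gmake ();
(* this lexer treats "$" as a normal token and does not lex quotations *)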
Gramext module
This module is not intended to be used by the casual programmer.
It shows, in clear, the implementations of grammars and entries
types, the normal access being through the "Grammar" module
where these types are abstract. It can be useful for programmers
interested in scanning the contents of grammars and entries, for
example to make analyses on them.
The visible type of grammars, i.e. the implementation of the abstract
type "Grammar.g". It is also the implementation of an internal
grammar type used in the Grammar functorial interface.
The type parameter "'te" is the type of the tokens, which
is "(string * string)" for grammars built with
"Grammar.gcreate", and any type for grammars built with the
functorial interface. The field "gtokens" records the
count of usages of each token pattern, allowing to call the lexer
function "tok_removing" (see the
Plexing module) when this count
reaches zero. The field "lexer" is the lexer.
The visible type for grammar entries, i.e. the implementation of
the abstract type "Grammar.Entry.e" and the type of
entries in the Grammar functorial interface. Notice that these
entry types have a type parameter which does not appear in the
"g_entry" type (the "'te" parameter is, as for
grammars above, the type of the tokens). This is due to the
specific typing system of the EXTEND statement which sometimes
must hide real types, the OCaml normal type system not being
able to type Camlp5 grammars.
Meaning of the fields:
egram : the associated grammar
ename : the entry name
elocal : True if the entry is local (local entries
are written with a star character "*" by Grammar.Entry.print)
estart and econtinue are parsers of the
entry used in
the grammar
machinery
edesc : the entry description (see below)
and g_desc 'te =
[ Dlevels of list (g_level 'te)
| Dparser of Stream.t 'te -> Obj.t ]
The entry description.
The constructor "Dlevels" is for entries built by
"Grammar.Entry.create" and extendable by the EXTEND
statement.
The constructor "Dparser" is for entries built by
"Grammar.Entry.of_parser".
lsuffix : the tree composed of the rules starting with
"SELF"
lprefix : the tree composed of the rules not
starting with "SELF"
and g_symbol 'te =
[ Smeta of string and list (g_symbol 'te) and Obj.t
| Snterm of g_entry 'te
| Snterml of g_entry 'te and string
| Slist0 of g_symbol 'te
| Slist0sep of g_symbol 'te and g_symbol 'te
| Slist1 of g_symbol 'te
| Slist1sep of g_symbol 'te and g_symbol 'te
| Sopt of g_symbol 'te
| Sflag of g_symbol 'te
| Sself
| Snext
| Stoken of Plexing.pattern
| Stree of g_tree 'te ]
Description of a rule symbol.
The constructor "Smeta" is used by the extensions
FOLD0 and
FOLD1
The constructor "Snterm" is the representation of a
non-terminal (a call to another entry)
The constructor "Snterml" is the representation of a
non-terminal at some given level
The constructor "Slist0" is the representation of
the symbol LIST0
The constructor "Slist0sep" is the representation
of the symbol LIST0 followed by SEP
The constructor "Slist1" is the representation of
the symbol LIST1
The constructor "Slist1sep" is the representation
of the symbol LIST1 followed by SEP
The constructor "Sopt" is the representation
of the symbol OPT
The constructor "Sflag" is the representation
of the symbol FLAG
The constructor "Sself" is the representation
of the symbol SELF
The constructor "Snext" is the representation
of the symbol NEXT
The constructor "Stoken" is the representation
of a token pattern
The constructor "Stree" is the representation
of an anonymous rule list (between brackets).
and g_action = Obj.t
The semantic action, represented by a type "Obj.t" due
to the specific typing of the EXTEND statement (the semantic action
being able to be any function type, depending on the rule).
and g_tree 'te =
[ Node of g_node 'te
| LocAct of g_action and list g_action
| DeadEnd ]
and g_node 'te =
{ node : g_symbol 'te; son : g_tree 'te; brother : g_tree 'te }
;
The types of tree and tree nodes, representing a list of
factorized rules in an entry level.
The constructor "Node" is a representation of a
symbol (field "node"), the rest of the rule tree
(field "son"), and the following node, if this node
fails (field "brother")
The constructor "LocAct" is the representation of an
action, which is a function having all pattern variables of
the rule as parameters and returning the rule semantic action.
The list of actions in the constructor corresponds to possible
previous actions, in the case where rules are masked by
other rules.
The constructor "DeadEnd" is a representation of a
node where the tree fails or where a syntax error occurs.
type position =
[ First
| Last
| Before of string
| After of string
| Level of string ]
;
The type of position where an entry extension takes place.
First : corresponds to FIRST
Last : corresponds to LAST
Before s : corresponds to BEFORE "s"
After s : corresponds to AFTER "s"
Level s : corresponds to LEVEL "s"
The module contains other definitions but for internal use.
Grammar module
Extensible grammars.
This module implements the Camlp5 extensible grammars system.
Grammars entries can be extended using the EXTEND
statement, added by loading the Camlp5 "pa_extend.cmo"
file.
main types and values
type g = 'abstract;
The type of grammars, holding entries.
value gcreate : Plexing.lexer (string * string) -> g;
Create a new grammar, without keywords, using the given lexer.
value tokens : g -> string -> list (string * int);
Given a grammar and a token pattern constructor, returns the list of
the corresponding values currently used in all entries of this grammar.
The integer is the number of times this pattern value is used.
Examples:
The call: Grammar.tokens g "" returns the keywords
list.
The call: Grammar.tokens g "IDENT" returns the
list of all usages of the pattern "IDENT" in
the EXTEND statements.
value glexer : g -> Plexing.lexer token;
Return the lexer used by the grammar
type parsable = 'abstract;
value parsable : g -> Stream.t char -> parsable;
Type and value allowing to keep the same token stream between
several calls of entries of the same grammar, to prevent loss of
tokens. To be used with "Entry.parse_parsable" below.
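A minimal sketch of using a parsable (the entry "exp" is hypothetical and is supposed to have been extended with rules elsewhere):
value g = Grammar.gcreate (Plexer.gmake ());
value exp = Grammar.Entry.create g "expression";
value p = Grammar.parsable g (Stream.of_string "1 + 2; 3");
(* several entries of "g" can now be applied to "p" without losing tokens *)
value e1 = Grammar.Entry.parse_parsable exp p;
value e2 = Grammar.Entry.parse_parsable exp p;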
module Entry =
sig
type e 'a = 'x;
value create : g -> string -> e 'a;
value parse : e 'a -> Stream.t char -> 'a;
value parse_all : e 'a -> Stream.t char -> list 'a;
value parse_token : e 'a -> Stream.t token -> 'a;
value parse_parsable : e 'a -> parsable -> 'a;
value name : e 'a -> string;
value of_parser : g -> string -> (Stream.t token -> 'a) -> e 'a;
value print : e 'a -> unit;
value find : e 'a -> string -> e Obj.t;
external obj : e 'a -> Gramext.g_entry token = "%identity";
end;
Module to handle entries.
Grammar.Entry.e : type for entries returning values
of type "'a".
Grammar.Entry.create g n : creates a new entry
named "n" in the grammar "g".
Grammar.Entry.parse e : returns the stream parser
of the entry "e".
Grammar.Entry.parse_all e : returns the stream
parser returning all possible values while parsing with the
entry "e": may return more than one value when the
parsing algorithm is "Grammar.Backtracking".
Grammar.Entry.parse_token e : returns the token
parser of the entry "e".
Grammar.Entry.parse_parsable e : returns the
parsable parser of the entry "e".
Grammar.Entry.name e : returns the name of the
entry "e".
Grammar.Entry.of_parser g n p : makes an entry from
a token stream parser.
Grammar.Entry.print e : displays the entry
"e" using "Format".
Grammar.Entry.find e s : finds the entry named
s in the rules of "e".
Grammar.Entry.obj e : converts an entry into a
"Gramext.g_entry" allowing to see what it holds.
value of_entry : Entry.e 'a -> g;
Return the grammar associated with an entry.
printing grammar entries
The function "Grammar.Entry.print" displays the current
contents of an entry. Interesting for debugging, to look at the
result of a syntax extension, to see the names of the levels.
The display does not include the patterns nor the semantic actions,
whose sources are not recorded in the grammar entries data.
Moreover, the local entries (not specified in the GLOBAL indicator
of the EXTEND statement) are indicated with a star ("*") to
inform that they are not directly accessible.
clearing grammars and entries
module Unsafe :
sig
value gram_reinit : g -> Plexing.lexer token -> unit;
value clear_entry : Entry.e 'a -> unit;
end;
Module for clearing grammars and entries. To be manipulated
with care, because: 1) reinitializing a grammar destroys all tokens
and there may be problems with the associated lexer if there are
keywords; 2) clearing an entry does not destroy the tokens used
only by itself.
Grammar.Unsafe.gram_reinit g lex removes the tokens
of the grammar and sets "lex" as a new lexer for
"g". Warning: the lexer itself is not
reinitialized.
Grammar.Unsafe.clear_entry e removes all rules of
the entry "e".
scan entries
value print_entry : Format.formatter -> Gramext.g_entry 'te -> unit;
General printer for all kinds of entries (obj entries).
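The description below refers to "Grammar.iter_entry", whose signature is presumably (by analogy with "fold_entry" below):
value iter_entry : (Gramext.g_entry 'te -> unit) -> Gramext.g_entry 'te -> unit;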
"Grammar.iter_entry f e" applies "f" to the
entry "e" and transitively all entries called by
"e". The order in which the entries are passed to
"f" is the order they appear in each entry. Each entry is
passed only once.
value fold_entry : (Gramext.g_entry 'te -> 'a -> 'a) -> Gramext.g_entry 'te -> 'a -> 'a;
"Grammar.fold_entry f e init" computes "(f eN .. (f
e2 (f e1 init)))", where "e1 .. eN" are "e"
and transitively all entries called by "e". The order in
which the entries are passed to "f" is the order they
appear in each entry. Each entry is passed only once.
Backtracking : use functional streams with full backtracking.
DefaultAlgorithm : use the algorithm given by the variable
"backtrack_parse" below.
The default, when a grammar is created, is DefaultAlgorithm.
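The descriptions above refer to constructors of the parsing-algorithm type used by "set_algorithm" below; its declaration is along these lines (a sketch; the exact constructor set may vary between Camlp5 versions):
type parse_algorithm =
  [ Predictive | Functional | Backtracking | DefaultAlgorithm ]
;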
value set_algorithm : g -> parse_algorithm -> unit;
Set the parsing algorithm for all entries of a given grammar.
value backtrack_parse : ref bool;
If True, the default parsing uses full backtracking. If
False, it uses parsing with normal streams. If the
environment variable CAMLP5PARAM contains "b", the default is
True; otherwise, the default is False.
value backtrack_stalling_limit : ref int;
Limitation of backtracking to prevent stalling in case of syntax
error. In the backtracking algorithm, when there is a syntax error,
the parsing continues, trying to find another solution. In some
grammars, it can take a very long time to check all possibilities.
This number limits the number of token tests after a backtrack.
(The number of token tests is reset to zero when the token stream
overtakes the last reached token.) The default is 10000. If set
to 0, there is no limit. Can be set by the environment variable
CAMLP5PARAM with "l=value".
functorial interface
Alternative interface for grammars. Grammars are not OCaml values:
there is no type for them. The generated modules preserve the rule "an
entry cannot call an entry of another grammar" through normal OCaml
typing.
module type GLexerType =
sig
type te = 'x;
value lexer : Plexing.lexer te;
end;
The input signature for the functor "Grammar.GMake":
"te" is the type of the tokens.
module type S =
sig
type te = 'x;
type parsable = 'x;
value parsable : Stream.t char -> parsable;
value tokens : string -> list (string * int);
value glexer : Plexing.lexer te;
value set_algorithm : parse_algorithm -> unit;
module Entry :
sig
type e 'a = 'y;
value create : string -> e 'a;
value parse : e 'a -> parsable -> 'a;
value parse_token : e 'a -> Stream.t te -> 'a;
value name : e 'a -> string;
value of_parser : string -> (Stream.t te -> 'a) -> e 'a;
value print : e 'a -> unit;
external obj : e 'a -> Gramext.g_entry te = "%identity";
end;
module Unsafe :
sig
value gram_reinit : Plexing.lexer te -> unit;
value clear_entry : Entry.e 'a -> unit;
end;
end;
Signature type of the functor "Grammar.GMake". The types
and functions are almost the same as in the generic interface, but:
Grammars are not values. Functions that take a grammar as
parameter in the generic interface do not take this parameter here.
The type "parsable" is used in function
"parse" instead of the char stream, avoiding the
possible loss of tokens.
The type of tokens (expressions and patterns) can be any
type (instead of (string * string)); the module parameter
must specify a way to show them as (string * string).
module GMake (L : GLexerType) : S with type te = L.te;
grammar flags
value error_verbose : ref bool;
Flag for displaying more information in case of parsing error;
default = "False".
value warning_verbose : ref bool;
Flag for displaying warnings while extending grammars; default =
"True".
value strict_parsing : ref bool;
Flag to apply strict parsing, without trying to recover errors;
default = "False".
Diff module
Differences between two arrays. Used in Camlp5 sources, but can
be used for other applications, independently from the rest of Camlp5.
value f : array 'a -> array 'a -> (array bool * array bool);
Diff.f a1 a2 returns a pair of boolean arrays (d1, d2).
d1 has the same size as a1.
d2 has the same size as a2.
d1.(i) is True if a1.(i) has no
corresponding value in a2.
d2.(i) is True if a2.(i) has no
corresponding value in a1.
d1 and d2 have the same number of values equal to
False.
Can be used, e.g., to write the diff program
(comparison of two files), the input arrays being the array of
lines of each file.
Can be used also to compare two strings (they must have been
exploded into arrays of chars), or two DNA strings, and so on.
Extfold module
Module internally used to make the
symbols FOLD0 and FOLD1 work in the EXTEND statement, together
with the extension "pa_extfold.cmo".
Extfun module
Extensible functions.
This module implements extensible pattern-matching functions, which
work with the parsing kit "pa_extfun.cmo", the syntax of
an extensible function being:
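The grammar of the construction (not reproduced above) is presumably of the following form:
        expression ::= extensible-function
extensible-function ::= "extfun" expression "with" "[" match-cases "]"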
type t 'a 'b = 'abstract;
The type of the extensible functions from type "'a" to type "'b".
value empty : t 'a 'b;
Empty extensible function.
value apply : t 'a 'b -> 'a -> 'b;
Apply an extensible function.
exception Failure;
Match failure while applying an extensible function.
value print : t 'a 'b -> unit;
Print patterns in the order they are recorded in the data
structure.
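A minimal usage sketch (assuming the parsing kit "pa_extfun.cmo" is loaded, so that the "extfun" syntax is available):
value f = ref Extfun.empty;
f.val := extfun f.val with [ "hello" -> "world" ];
Extfun.apply f.val "hello";   (* returns "world" *)
Extfun.apply f.val "bye";     (* raises Extfun.Failure *)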
Eprinter module
This module allows creating printers, applying them and clearing them. It
is also internally used by the "EXTEND_PRINTER" statement.
type t 'a = 'abstract;
Printer type, to print values of type "'a".
type pr_context = Pprintf.pr_context;
Printing context.
value make : string -> t 'a;
Builds a printer. The string parameter is used in error
messages. The printer is created empty and can be extended with
the "EXTEND_PRINTER" statement.
value apply : t 'a -> pr_context -> 'a -> string;
Applies a printer, returning the printed string of the
parameter.
value apply_level : t 'a -> string -> pr_context -> 'a ->
string;
Applies a printer at some specific level. Raises "Failure"
if the given level does not exist.
value clear : t 'a -> unit;
Clears a printer, removing all its levels and rules.
value print : t 'a -> unit;
Print printer patterns, in the order they are recorded, for
debugging purposes.
Some other types and functions exist, for internal use.
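A minimal sketch of creating, extending and applying a printer (it assumes the parsing kit "pa_extprint.cmo" is loaded for the "EXTEND_PRINTER" statement, and uses "Pprintf.empty_pc" as initial printing context):
value pr_int = Eprinter.make "int";
EXTEND_PRINTER
  pr_int:
    [ [ n -> string_of_int n ] ]
  ;
END;
value s = Eprinter.apply pr_int Pprintf.empty_pc 42;
(* s is "42" *)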
Fstream module
This module implements functional streams and parsers together with
backtracking parsers.
To be used with the syntax extension "pa_fstream.cmo". A backtracking
parser defined this way has the type:
Fstream.t 'a -> option ('b * Fstream.t 'a * Fstream.kont 'a 'b)
Functional parsers use limited backtrack, i.e. if a rule fails, the
next rule is tested with the initial stream; limited because, in the
case of a rule with two consecutive symbols "a" and
"b", if "b" fails, the whole rule fails: there is no
attempt to retry with the next rule of "a".
Backtracking parsers have full backtrack. If a rule fails, the
next case of the previous rule is tested.
Functional streams
type t 'a = 'x;
The type of 'a functional streams.
value from : (int -> option 'a) -> t 'a;
"Fstream.from f" returns a stream built from the
function "f". To create a new stream element, the
function "f" is called with the current stream
count. The user function "f" must return either
"Some <value>" for a value or "None" to
specify the end of the stream.
value of_list : list 'a -> t 'a;
Return the stream holding the elements of the list in the same
order.
value of_string : string -> t char;
Return the stream of the characters of the string parameter.
value of_channel : in_channel -> t char;
Return the stream of the characters read from the input channel.
value iter : ('a -> unit) -> t 'a -> unit;
"Fstream.iter f s" scans the whole stream s, applying
function "f" in turn to each stream element
encountered.
value next : t 'a -> option ('a * t 'a);
Return "Some (a, s)" where "a" is the first
element of the stream and s the remaining stream, or
"None" if the stream is empty.
value empty : t 'a -> option (unit * t 'a);
Return "Some ((), s)" if the stream is empty, where
"s" is the stream itself, else return "None".
value count : t 'a -> int;
Return the current count of the stream elements, i.e. the number
of the stream elements discarded.
value count_unfrozen : t 'a -> int;
Return the number of unfrozen elements in the beginning of the
stream; useful to determine the position of a parsing error (longest
path).
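A small sketch of functional stream usage:
value s = Fstream.of_list [1; 2; 3];
value r =
  match Fstream.next s with
  [ Some (x, _) -> x
  | None -> 0 ]
;
(* r is 1; "s" itself is unchanged, so Fstream.count s is still 0 *)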
Backtracking parsers
type kont 'a 'b = [ K of unit -> option ('b * t 'a * kont 'a 'b) ];
The type of continuation of a backtracking parser.
type bp 'a 'b = t 'a -> option ('b * t 'a * kont 'a 'b);
The type of a backtracking parser.
value bcontinue : kont 'a 'b -> option ('b * t 'a * kont 'a 'b);
"bcontinue k" returns the next solution of a backtracking
parser.
value bparse_all : bp 'a 'b -> t 'a -> list 'b;
"bparse_all p strm" returns the list of all solutions of a
backtracking parser applied to a functional stream.
Pprintf module
Definitions for the "pprintf" statement.
This module contains types and functions for the "pprintf"
statement used by the syntax extension "pa_pprintf.cmo".
type pr_context = { ind : int; bef : string; aft : string;
dang : string };
Printing context.
"ind" : the current indentation
"bef" : what should be printed before, in the same line
"aft" : what should be printed after, in the same line
"dang" : the dangling token to know whether parentheses
are necessary
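A small sketch of a printing function written with "pprintf" (it assumes "pa_pprintf.cmo" is loaded; "pc" is a value of type "pr_context"):
(* prints "x + y" on one line if it fits, otherwise "x +" and "y" on
   two lines, the second one indented by 2 more spaces *)
value print_add pc (x, y) = pprintf pc "%s +@;%s" x y;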
value sprint_break :
int -> int -> pr_context -> (pr_context -> string) ->
(pr_context -> string) -> string;
"sprint_break nspaces offset pc f g" concatenates the two strings
returned by "f" and "g", either on one line, if
it fits without overflowing (see module "Pretty"), with
"nspaces" spaces between them, or on two lines with
"offset" spaces added to the indentation for the second
line. This function does not need to be called directly. It is
generated by the "pprintf" statement according to its
parameters when the format contains breaks, like "@;"
and "@ ".
value sprint_break_all :
bool -> pr_context -> (pr_context -> string) ->
list (int * int * (pr_context -> string)) -> string;
"sprint_break_all force_newlines pc f fl" concatenates all the
strings returned by "f" and by the functions of the list "fl", the
separators being given by the number of spaces and the offset, like
in the function "sprint_break". The function works as "all or
nothing", i.e. if the resulting string does not fit on the line,
all strings are printed on different lines (even if sub-parts could
fit on single lines). If the parameter "force_newlines" is
"True", all strings are printed on different lines and no
horizontal printing is tested. This function does not need to be
called directly. It is generated by the "pprintf"
statement according to its parameters when the format contains
parenthesized parts with "break all", like "@[<a>" and
"@]", or "@[<b>" and "@]".
Pretty module
Pretty printing on strings.
value horiz_vertic : (unit -> 'a) -> (unit -> 'a) -> 'a;
"horiz_vertic h v" first calls "h" to print
the data horizontally, i.e. without newlines. If the display
contains newlines or if its size exceeds the maximum line length
(see the variable "line_length" below), then the function
"h" stops and the function "v" is called, which
can print using several lines.
value sprintf : format 'a unit string -> 'a;
"sprintf fmt ..." formats some string like
"Printf.sprintf" does, except that, if it is called in
the context of the *first* function of "horiz_vertic"
above, it checks whether the resulting string has a chance to fit
on the line. If not, i.e. if it contains newlines or if its length
is greater than the maximum line length (the variable
"line_length" below), the function gives
up (raising some internal exception). Otherwise the built string
is returned. "sprintf" behaves like
"Printf.sprintf" if it is called in the context of the
*second* function of "horiz_vertic" or without context at
all.
value line_length : ref int;
"line_length" is the maximum length (in characters) of
the line. Default = 78. Can be set to any other value before
printing.
value horizontally : unit -> bool;
"horizontally ()" tells whether the current printing context is
horizontal.
Deprecated modules Stdpp and Token
The modules "Stdpp" and "Token" have been
deprecated since version 5.00. The module "Stdpp" was
renamed "Ploc" and most of its variables and types were also
renamed. The module "Token" was renamed "Plexing".
Backward compatibility is assured. See the files
"stdpp.mli" and "token.mli" in the Camlp5
distribution to convert from old to new names, if any. After
several versions or years, the modules "Stdpp" and
"Token" will disappear from Camlp5.
Camlp5 documentation news
Mon Sep 17 15:10:47 CEST 2007
http://pauillac.inria.fr/~ddr/camlp5/doc/html/
Macros
Added ability to use IFDEF and IFNDEF for record labels declarations.
Wed Dec 31 11:51:33 CET 2008
../macros.html
Plexer
Added flag "Plexer.utf8_lexing" to specify that the lexer
built by "Plexer.gmake ()" should take utf8 punctuation
characters into account.
Mon Dec 24 11:24:34 CET 2007
../opretty.html
quotation example
Fixed example in chapter "Quotations" about lambda terms. Did not
work because using an old version of the "lexer" syntax extension.
Fri Dec 28 23:05:19 CET 2007
../quot.html
pprintf
Added chapter about the new statement "pprintf".
Fri Dec 28 23:05:19 CET 2007
../pprintf.html
Grammar.Entry.parse_all
Added function "Grammar.Entry.parse_all".
Thu Dec 27 20:42:30 CET 2007
../library.html
Pretty print
Added function "Pretty.horizontally" returning the fact that
the current print is horizontal or not. Added function
"Prtools.hvlistl" using "hlistl" or "vlistl" according to
the fact that the current print is horizontal or not.
Mon Dec 24 11:24:34 CET 2007
../opretty.html
Located exceptions
Explained in a new section "Located errors" how to put locations
on exceptions raised in quotation expanders.
Sun Dec 16 18:18:08 CET 2007
../quot.html
Backtracking example
Added an example of backtracking.
Fri Nov 30 12:59:11 CET 2007
../bparser.html
Fstream and Grammar library modules
In Fstream module, added section about backtracking parsers.
Grammar module, added section about parsing algorithm. They
are new features of Camlp5 5.04.
Sun Nov 25 08:23:25 CET 2007
../library.html
Backtracking parsers
Added chapter about backtracking parsers, a new feature of
Camlp5 5.04.
Sun Nov 25 03:47:32 CET 2007
../bparsers.html
mkcamlp5
Added paragraph about new commands mkcamlp5 and mkcamlp5.opt.
Tue Nov 20 04:30:54 CET 2007
../commands.html
Lexers 2
Updated the chapter about lexers, the lexing buffer being now
provided in Camlp5 library, instead of having to be written
by the programmer.
Thu Oct 11 12:56:15 CEST 2007
../lexers.html
Lexers
Changed the chapter about lexers, their syntax being changed.
Thu Oct 11 12:56:15 CEST 2007
../lexers.html
Future work
Deleted section about the rewriting of the pretty print in
Scheme syntax (this is implemented). Added section about GLR
grammars.
Thu Oct 11 12:56:15 CEST 2007
../conclusion.html
Locations in quotations and antiquotations
Added detailed explanations about the problem of locations in
quotations and antiquotations.
Tue Oct 9 19:10:10 CEST 2007
../quot.html#a:Locations-in-quotations-and-antiquotations
Module Diff and Equilibrated pretty print
Added module Diff in Camlp5 library to compare two arrays (using
the algorithm of the Unix 'diff' command). Added flag 'E' in
the pretty print kit in revised syntax (pr_o.cmo) to allow displaying
equilibrated cases (all cases in a 'match' statement are cut if one
is cut, and the same equilibration in the 'if' statement).
Mon Oct 1 21:47:55 CEST 2007
../library.html
SLIST0...
Deleted section about internal extensions SLIST0, SLIST1, SOPT
and SFLAG, which are not yet supported.
Mon Oct 1 11:38:46 CEST 2007
../grammars.html
Syntax tree - strict mode - nodes without quotations
Added a section about types of nodes without quotations:
type_var, type_decl and class_infos.
Wed Sep 26 14:05:15 CEST 2007
../ast_strict.html
Syntax tree - strict mode - 2
Updated doc of 'patt', 'ctyp', 'str_item' and 'sig_item'.
Tue Sep 25 05:14:45 CEST 2007
../ast_strict.html
Version number
New version number will be 5.00 instead of 4.09, because the
AST quotations in user syntax is a major change.
Sat Sep 22 07:14:37 CEST 2007
../index.html
Syntax tree - strict mode - 1
General antiquoting will start with "_" (underscore) instead
of "a".
Fri Sep 21 21:06:25 CEST 2007
../ast_strict.html
Syntax tree - strict mode
Trying a different display for the "expr" quotation, the
display with a table being too wide.
Tue Sep 18 22:35:13 CEST 2007
../ast_strict.html
Syntax tree - 3
Improved the text of the description of the chapters about
syntax tree nodes in strict and in transitional modes. Added
a chapter (empty for the moment) where the quotation kit
"q_ast.cmo" (AST quotations in user syntax) will be specifically
described.
Tue Sep 18 16:16:23 CEST 2007
../ml_ast.html
Syntax tree - 2
Separated the "syntax tree in strict mode" and "syntax tree in
transitional mode" into two chapters.
Tue Sep 18 12:15:14 CEST 2007
../ml_ast.html
Syntax tree - 1
Added section about the syntax tree in "strict" mode, actually
almost a copy of the section in "transitional" mode.
Tue Sep 18 06:34:42 CEST 2007
../ml_ast.html
Future work
Changed intro sentence. Deleted section "extensible syntax
tree quotations" (implemented). Added section "rewriting
pretty printer in Scheme syntax".
Tue Sep 18 03:51:21 CEST 2007
../conclusion.html
Syntax tree
Changed structure of this chapter. Added introduction to explain
the general differences between "transitional" and "strict" modes.
Put the rest inside a section "syntax tree in transitional mode".
Added a section "syntax tree in strict mode" (to be completed).
Mon Sep 17 15:10:47 CEST 2007
../ml_ast.html
Library - module Ploc
Explained that this module also contains a pervasive type and function.
Commented the pervasive type Ploc.vala and added the pervasive
function Ploc.call_with.
Mon Sep 17 11:41:44 CEST 2007
../library.html
Transitional and Strict - 1
Added more explanations about the two modes, and a short
example of what the syntax tree quotation "q_ast.cmo" allows
to do.
Mon Sep 17 06:29:44 CEST 2007
../strict.html
Transitional and Strict
Added chapter about "strict" and "transitional", allowing to
choose between backward compatibility and a new feature: the
syntax tree quotations in user syntax ("q_ast.cmo").
Mon Sep 17 05:28:48 CEST 2007
../strict.html
Redefining OCaml syntax
Better than just syntax extensions, it is
possible to redefine the whole syntax of the language. For example,
to:
have a version whose keywords are translated in your native
language,
restrict the OCaml language,
interpret XML (or other languages) as OCaml source,
and so on...
Starting with an example
A way to start doing this is to take, in the Camlp5 sources, one of the
files "etc/pa_o.ml" or "meta/pa_r.ml". The first one defines
the standard OCaml syntax and the second one the revised syntax.
Let's say you want to take the normal syntax and make some
readjustments. You first make a copy of "etc/pa_o.ml" naming it,
e.g., "mysyntax.ml" (the example below works similarly if you
take "meta/pa_r.ml" instead):
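The compilation command is not reproduced above; since "pa_o.ml" is written in revised syntax, it is presumably along the lines of (the same pattern as the "sum.ml" example later in this document):
ocamlc -pp camlp5r -I +camlp5 -c mysyntax.ml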
This produces the file "mysyntax.cmo". Now you can compile one of
your files, e.g. "foo.ml", if written in this syntax, i.e. the
normal OCaml syntax if you made no changes in "mysyntax.ml":
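The command is not reproduced above; presumably it runs camlp5 with your parsing kit and a printing kit, for example:
ocamlc -pp "camlp5 ./mysyntax.cmo pr_dump.cmo" -c foo.ml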
If there are no changes in "mysyntax.ml" compared to "pa_o.ml", this is
just a compilation with the normal OCaml syntax. To make changes,
you can edit the file "mysyntax.ml" and recompile it. As an
exercise, try to translate some keywords into your native language
(or into another language if your native language is English).
By reading how the Camlp5 extensible
grammars and syntax tree work
(both are used in "pa_o.ml" and "pa_r.ml"), you can make more
complicated changes, or change everything, if you want.
A file for an OCaml syntax
This is what you can find in the files "pa_o.ml" and "pa_r.ml".
An OCaml syntax file uses the Camlp5 library
module Pcaml. All grammar entries are
defined there. The first thing is the reinitialization of the
grammar (which clears all tokens and defines a lexer) and of all grammar
entries, to be sure that no previously loaded grammar
remains.
If using the same lexer (provided in Camlp5 library
module Plexer), it is
done by:
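A sketch of what this reinitialization looks like (using the grammar "Pcaml.gram"):
Grammar.Unsafe.gram_reinit Pcaml.gram (Plexer.gmake ());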
The cleanup of all grammar entries is done by calls to the function
"Grammar.Unsafe.clear_entry". The main entries are Pcaml.interf, for
compiling an interface (a ".mli" file), and Pcaml.implem, for compiling
an implementation (a ".ml" file). All the other grammar entries you
want to use must be cleared as well:
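A sketch of these calls (only a few entries shown; the real files clear many more):
Grammar.Unsafe.clear_entry Pcaml.interf;
Grammar.Unsafe.clear_entry Pcaml.implem;
Grammar.Unsafe.clear_entry Pcaml.expr;
Grammar.Unsafe.clear_entry Pcaml.patt;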
Actually, the camlp5 command can compile the input file in other
ways than using the Camlp5 grammars. The variables "Pcaml.parse_interf"
and "Pcaml.parse_implem" are references to the functions called by
camlp5. By default, it is the Camlp5 grammar syntax, but to be sure
it keeps using it (if a previous load changed that), the following
statements are added:
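A sketch of these assignments (as found in "pa_o.ml" and "pa_r.ml", modulo the exact form):
Pcaml.parse_interf.val := Grammar.Entry.parse Pcaml.interf;
Pcaml.parse_implem.val := Grammar.Entry.parse Pcaml.implem;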
In the files "pa_o.ml" and "pa_r.ml", some local functions follow,
which are themselves followed by a call to the big statement
"EXTEND", the main statement of the Camlp5 extensible grammars
system.
A pretty print system is provided in the library module Pretty. It
allows one to pretty print data or programs. The Pretty module
contains:
The function "horiz_vertic" to specify how data must be printed.
The function "sprintf" to format strings.
The variable "line_length" which is a reference specifying the
maximum lines lengths.
Module description
horiz_vertic
The function "horiz_vertic" takes two functions as parameters. When
invoked, it calls its first function. If that function fails with
some specific internal error (that the function "sprintf"
below may raise), the second function is called.
The type of "horiz_vertic" is:
(unit -> 'a) -> (unit -> 'a) -> 'a
the horizontal function
The first function is said to be the "horizontal" function. It
tries to pretty print the data on a single line. In the context of
this function, if the strings built by the function "sprintf" (see
below) contain newlines or have lengths greater than "line_length",
the function fails (with an internal exception local to the
module).
the vertical function
In case of failure of the "horizontal function", the second
function of "horiz_vertic", the "vertical" function, is called. In
the context of that function, the "sprintf" function behaves like
the normal "sprintf" function of the OCaml library module
"Printf".
sprintf
The function "sprintf" works like its equivalent in the module
"Printf" of the OCaml library, and takes the same parameters. Its
difference is that if it is called in the context of the first
function (the "horizontal" function) of the function "horiz_vertic"
(above), all strings built by "sprintf" are checked for newlines or
length greater than the maximum line length. If either occurs, the
"sprintf" function fails and the horizontal function fails.
If "sprintf" is not in the context of the horizontal function, it
behaves like the usual "sprintf" function.
line_length
The variable "line_length" is a reference holding the maximum line
length of lines printed horizontally. Its default is 78. This can be
changed by the user before using "horiz_vertic".
horizontally
The call "horizontally ()" returns a boolean telling whether
the context is horizontal.
Example
Suppose you want to pretty print the XML code
"<li>something</li>". If the "something" is
short, you want to see:
<li>something</li>
If the "something" has several lines, you want to see that:
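<li>
  something
</li>
In a program, this alternative can be expressed with "horiz_vertic"; a sketch, assuming "something" is a string already built:
horiz_vertic
  (fun () -> sprintf "<li>%s</li>" something)
  (fun () -> sprintf "<li>\n  %s\n</li>" something)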
Notice that the "sprintf" above is the one of the
library Pretty.
Notice also that, in a program displaying XML code, this
"something" may contain other XML tags, and is therefore generally
the result of other pretty printing functions, and the program
should rather look like:
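A sketch, where "print" is a user function building the inner part with a given indentation string ("x" stands for the data to be printed):
horiz_vertic
  (fun () -> sprintf "<li>%s</li>" (print "" x))
  (fun () -> sprintf "<li>\n%s\n</li>" (print "  " x))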
Parts of this "something" can be printed horizontally and other
vertically using other calls to "horiz_vertic" in the user function
"print" above. But it is important to remark that if they are called
in the context of the first function parameter of "horiz_vertic"
above, only horizontal functions are accepted: the first failing
"horizontal" function triggers the failure of the horizontal pretty
printing.
Programming with Pretty
Hints
Just start with a call to "horiz_vertic".
As its first function, use "sprintf" just to concatenate the strings,
without putting any newlines or indentation, e.g. just using spaces
to separate pieces of data.
As its second function, consider how you want your data to be cut.
At the cutting point or points, add newlines. Notice that you
probably need to give the current indentation string as a parameter
of the called functions, because it needs to be taken into account
by the called "horizontal" functions.
In the example below, don't put the indentation in the sprintf
format directly, but give it as a parameter of your "print" function.
The "print" function could then look like this, supposing you print
"other" with the current indentation and "things"
with a new, shifted one:
value print ind something =
horiz_vertic
(fun () -> sprintf "%sother things..." ind)
(fun () -> sprintf "%sother\n%s things..." ind ind);
Supposing that "other" and "things" are the results of two other
functions, "print_other" and "print_things", your program could look
like:
value print ind (x, y) =
horiz_vertic
(fun () -> sprintf "%s%s %s" ind (print_other "" x) (print_things "" y))
(fun () -> sprintf "%s\n%s" (print_other ind x) (print_things (ind ^ " ") y));
How to cancel a horizontal print
If you want to prevent a pretty printing function from being called
in a horizontal context, constraining the pretty print to be on
several lines in the calling function, just do:
horiz_vertic
(fun () -> sprintf "\n")
(fun () -> ... (* your normal pretty print *))
In this case, the horizontal print always fails, due to the newline
character in the sprintf format.
Remarks
Kernel
The module "Pretty" is intended to be basic, a "kernel" module to
pretty print data. It presumes that the user takes care of the
indentation. Programs using "Pretty" are not as short as the ones
using "Format" of the OCaml library, but are more flexible. To
pretty print with a shorter syntax like in the OCaml module "Format"
(with the "@" convention), see statement
"pprintf" (which internally uses the
module "Pretty").
Strings vs Channels
In "Pretty", the pretty printing is done only on strings, not on
files. To pretty print files, just build the strings and print them
afterwards with the usual output functions. Notice that OCaml
allocates and frees strings quickly, and if pretty printed values
are not huge, which is generally the case, it is not a real problem,
memory sizes these days being more than enough for this job.
Strings or other types
The "horiz_vertic" function can return values of types other than
"string". For example, if you are interested only in the result of
horizontal context and not on the vertical one, it is perfectly
correct to write:
horiz_vertic
(fun () -> Some (sprintf "I hold on a single line"))
(fun () -> None)
Why raise exceptions?
One could ask why this pretty print system raises internal
exceptions. Why not simply write the pretty printing program like
this:
first build the data horizontally (without newlines)
if the string length is lower than the maximum line length,
return it
if not, build the string by adding newlines in the specific
places
This method works but is generally very slow (exponential in time)
because while printing horizontally, many useless strings are
built. If, for example, the final printed data holds on 50 lines,
tens of lines may be built uselessly again and again before the
overflowing is corrected.
This chapter describes the syntax and semantics of the extensible
grammars of Camlp5.
The extensible grammars are the most advanced parsing tool of
Camlp5. They apply to streams of characters using a lexer which has
to be previously defined by the programmer. In Camlp5, the syntax of
the OCaml language is defined with extensible grammars, which makes
Camlp5 a bootstrapped system (it compiles its own features by
itself).
Getting started
The extensible grammars are a system to build grammar
entries which can be extended dynamically. A grammar entry is
an abstract value internally containing a stream parser. The type
of a grammar entry is "Grammar.Entry.e t"
where "t" is the type of the values returned by the
grammar entry.
To start with extensible grammars, it is necessary to build
a grammar, a value of type "Grammar.g", using the
function "Grammar.gcreate":
value g = Grammar.gcreate lexer;
where "lexer" is a lexer previously defined. See the
section explaining the interface with lexers. To begin with, it is
possible to use a lexer of the module "Plexer" provided by
Camlp5:
value g = Grammar.gcreate (Plexer.gmake ());
Each grammar entry is associated with a grammar. Only grammar
entries of the same grammar can call each other. To create a grammar
entry, one has to use the function "Grammar.Entry.create", which
takes the grammar as first parameter and a name as second parameter. This
name is used in case of syntax errors. For example:
value exp = Grammar.Entry.create g "expression";
To apply a grammar entry, the function
"Grammar.Entry.parse" can be used. Its first parameter is the
grammar entry, the second one a stream of characters:
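For example, a sketch of such a call (using the entry "exp" created above):
Grammar.Entry.parse exp (Stream.of_string "hello");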
Other statements, "GEXTEND", "DELETE_RULE",
"GDELETE_RULE" are also defined by the same syntax extension
kit. See further.
In the description above, only "EXTEND" and "END"
are new keywords (reserved words which cannot be used in variables,
constructors or module names). The other strings
(e.g. "GLOBAL", "LEVEL", "LIST0",
"LEFTA", etc.) are not reserved.
Semantics of the EXTEND statement
The EXTEND statement starts with the "EXTEND" keyword and ends
with the "END" keyword.
GLOBAL indicator
After the first keyword, it is possible to see the identifier
"GLOBAL" followed by a colon, a list of entries names and a
semicolon. It says that these entries correspond to visible
(previously defined) entry variables, in the context of the EXTEND
statement, the other ones being locally and silently defined
inside.
If an entry, which is extended in the EXTEND statement, is in the
GLOBAL list, but is not defined in the context of the EXTEND
statement, the OCaml compiler will fail with the error "unbound
value".
If there is no GLOBAL indicator, and an entry, which is extended
in the EXTEND statement, is not defined in the context of the EXTEND
statement, the OCaml compiler will also fail with the error "unbound
value".
Example:
value exp = Grammar.Entry.create g "exp";
EXTEND
GLOBAL: exp;
exp: [ [ x = foo; y = bar ] ];
foo: [ [ "foo" ] ];
bar: [ [ "bar" ] ];
END;
The entry "exp" is an existing variable (defined by value exp =
...). On the other hand, the entries "foo" and "bar" have not been
defined. Because of the GLOBAL indicator, the system defines them
locally.
Without the GLOBAL indicator, the three entries would have been
considered as global variables, therefore the OCaml compiler would
say "unbound variable" under the first undefined entry, "foo".
Entries list
Then the list of entry extensions follows. An entry extension
starts with the entry name followed by a colon. An entry may have
several levels corresponding to several stream parsers which call
one another (see further).
Optional position
After the colon, it is possible to specify where to insert the
defined levels:
The identifier "FIRST" (resp. "LAST")
indicates that the level must be inserted before (resp. after) all
possibly existing levels of the entry. They become their first
(resp. last) levels.
The identifier "BEFORE" (resp. "AFTER")
followed by a level label (a string) indicates that the levels
must be inserted before (resp. after) that level, if it exists. If
it does not exist, the extend statement fails at run time.
The identifier "LIKE" followed by a string indicates
that the first level defined in the extend statement must be
inserted in the first already existing level with a rule
containing this string as keyword or token name. For example,
"LIKE "match"" is the first level having "match"
as keyword. If there is no level with this string, the extend
statement fails at run time.
The identifier "LEVEL" followed by a level label
indicates that the first level defined in the extend statement
must be inserted at the given level, extending and modifying
it. The other levels defined in the statement are inserted after
this level, and before the possible levels following this
level. If there is no level with this label, the extend statement
fails at run time.
By default, if the entry has no level, the levels defined in the
statement are inserted in the entry. Otherwise the first defined
level is inserted at the first level of the entry, extending or
modifying it. The other levels are inserted afterwards (before the
possible second level which may previously exist in the entry).
Levels
After the optional "position", the level list follow. The
levels are separated by vertical bars, the whole list being between
brackets.
A level starts with an optional label, which corresponds to its
name. This label is useful to specify this level in case of future
extensions, using the position (see previous section) or
for possible direct calls to this specific level.
The level continues with an optional associativity indicator, which
can be:
LEFTA for left associativity (default),
RIGHTA for right associativity,
NONA for no associativity.
Rules
At last, the grammar rule list appears. The rules are
separated by vertical bars, the whole list being between brackets.
A rule looks like a match case in the "match" statement or
a parser case in the "parser" statement: a list of psymbols
(see next paragraph) separated by semicolons, followed by a right
arrow and an expression, the semantic action. Actually, the right
arrow and expression are optional: in this case, it is equivalent to
an expression which would be the unit "()" constructor.
A psymbol is either a pattern followed by the equal sign and a
symbol, or a symbol alone. It corresponds to a test of this symbol,
whose value is bound to the pattern, if any.
Symbols
A symbol is an item in a grammar rule. It is either:
a keyword (a string): the input must match this keyword,
a token name (an identifier starting with an uppercase
character), optionally followed by a string: the input must match
this token (any value if no string, or that string if a string
follows the token name), the list of the available tokens
depending on the associated lexer (the list of tokens available
with "Plexer.gmake ()" is: LIDENT, UIDENT, TILDEIDENT,
TILDEIDENTCOLON, QUESTIONIDENT, INT, INT_l, INT_L, INT_n, FLOAT,
CHAR, STRING, QUOTATION, ANTIQUOT and EOI; other lexers may
propose other lists of tokens),
an entry name, which corresponds to a call to this entry,
an entry name followed by the identifier "LEVEL" and a
level label, which corresponds to the call to this entry at that
level,
the identifier "SELF" which is a recursive call to the
present entry, according to the associativity (i.e. it may be a
call at the current level, to the next level, or to the top level
of the entry): "SELF" is equivalent to the name of the
entry itself,
the identifier "NEXT", which is a call to the next level
of the current entry,
a left brace, followed by a list of rules separated by vertical
bars, and a right brace: equivalent to a call to an entry, with
these rules, inlined,
a meta symbol (see further),
a symbol between parentheses.
The syntactic analysis follows the list of symbols. If it fails,
depending on the first items of the rule (see the section about the
kind of grammars recognized):
the parsing may fail by raising the exception
"Stream.Error"
the parsing may continue with the next rule.
Meta symbols
Extra symbols exist, allowing to manipulate lists or optional
symbols (an example is given after the list). They are:
LIST0 followed by a symbol: this is a list of this symbol,
possibly empty,
LIST0 followed by a symbol, SEP and another symbol, and optional
OPT_SEP: this is a list, possibly empty, of the first symbol
separated by the second one, possibly ended with the separator if
OPT_SEP is present,
LIST1 followed by a symbol: this is a list of this symbol,
with at least one element,
LIST1 followed by a symbol, SEP and another symbol, and optional
OPT_SEP: this is a list, with at least one element, of the first
symbol separated by the second one, possibly ended with the
separator if OPT_SEP is present,
OPT followed by a symbol: equivalent to "this symbol or
nothing" returning a value of type "option".
FLAG followed by a symbol: equivalent to "this symbol or
nothing", returning a boolean.
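As an example, here is a sketch of a rule using some of these meta symbols (the grammar "g" and entry "intlist" are hypothetical; "pa_extend.cmo" must be loaded):
#load "pa_extend.cmo";
value g = Grammar.gcreate (Plexer.gmake ());
value intlist = Grammar.Entry.create g "intlist";
EXTEND
  intlist:
    [ [ l = LIST0 INT SEP ","; semi = FLAG ";" -> (l, semi) ] ]
  ;
END;
(* parses e.g. "1, 2, 3;" into (["1"; "2"; "3"], True) *)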
The V meta symbol
The V meta symbol is intended to allow antiquotations while using
the syntax tree quotation
kit q_ast.cmo. It works only in
strict mode. In transitional mode, it is just equivalent to its
symbol parameter.
Antiquotation kind
The antiquotation kind is the optional identifier between the
starting "$" (dollar) and the ":" (colon) in a
quotation of syntax tree (see the
chapter syntax tree).
The optional list of strings following the "V" meta symbol and its
symbol parameter gives the allowed antiquotations kinds.
By default, this string list, i.e. the available antiquotation
kinds, is:
["flag"] for FLAG
["list"] for LIST0 and LIST1
["opt"] for OPT
For example, the symbol:
V (FLAG "rec")
is like "FLAG" while normally parsing, allowing to parse the keyword
"rec". While using it in quotations, it also allows parsing
the keyword "rec" but, moreover, the antiquotation
"$flag:..$", where ".." is an expression or a pattern
depending on the position of the quotation.
There are also default antiquotations kinds for the tokens used in
the OCaml language predefined parsers "pa_r.cmo" (revised
syntax) and "pa_o.cmo" (normal syntax), actually all
parsers using the provided lexer "Plexer" (see the
chapter Library). They are:
["chr"] for CHAR
["flo"] for FLOAT
["int"] for INT
["int32"] for INT_l
["int64"] for INT_L
["nativeint"] for INT_n
["lid"] for LIDENT
["str"] for STRING
["uid"] for UIDENT
It is also possible to use the "V" meta symbol over non-terminals
(grammars entries), but there is no default antiquotation kind. For
example, while parsing a quotation, the symbol:
V foo "bar" "oops"
corresponds to either a call to the grammar entry "foo",
or to the antiquotations "$bar:...$" or
"$oops:...$".
Type
The type of the value returned by a V meta symbol is:
in transitional mode, the type of its symbol parameter,
in strict mode, "Ploc.vala t", where "t" is
the type of its symbol parameter.
In strict mode, if the symbol parameter is found, whose value is,
say, "x", the result is "Ploc.VaVal x". If an
antiquotation is found the result is "Ploc.VaAnt s" where
"s" is some string containing the antiquotation text and
some other internal information.
Rules insertion
Remember that "EXTEND" is a statement, not a declaration:
the rules are added in the entries at run time. Each rule is
internally inserted in a tree, allowing the left factorization of the
rule. For example, with this list of rules (borrowed from the Camlp5
sources):
"method"; "private"; "virtual"; l = label; ":"; t = poly_type
"method"; "virtual"; "private"; l = label; ":"; t = poly_type
"method"; "virtual"; l = label; ":"; t = poly_type
"method"; "private"; l = label; ":"; t = poly_type; "="; e = expr
"method"; "private"; l = label; sb = fun_binding
"method"; l = label; ":"; t = poly_type; "="; e = expr
"method"; l = label; sb = fun_binding
the rules are inserted in a tree and the result looks like:
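The original figure is not reproduced here; the left factorization of the rules above gives a tree roughly of this shape (an ASCII sketch, for illustration only):
"method"
  |-- "private"
  |     |-- "virtual" -- label -- ":" -- poly_type
  |     |-- label --+-- ":" -- poly_type -- "=" -- expr
  |                 |-- fun_binding
  |-- "virtual"
  |     |-- "private" -- label -- ":" -- poly_type
  |     |-- label -- ":" -- poly_type
  |-- label --+-- ":" -- poly_type -- "=" -- expr
              |-- fun_binding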
This tree is built as long as rules are inserted. When used, by
applying the function "Grammar.Entry.parse" to the current
entry, the input is matched with that tree, starting from the tree
root, descending on it as long as the parsing advances.
There is a different tree by entry level.
Semantic action
The semantic action, i.e. the expression following the right arrow
in rules, contains in its environment:
the variables bound by the patterns of the symbols found in the
rules,
the specific variable "loc" which contain the location
of the whole rule in the source.
The location is an abstract type defined in the module
"Ploc" of Camlp5.
It is possible to change the name of this variable by using the option
"-loc" of Camlp5. For example, compiling a file like this:
camlp5r -loc foobar file.ml
the variable name, for the location will be "foobar"
instead of "loc".
The DELETE_RULE statement
The "DELETE_RULE" statement is also added in the
expressions of the OCaml language when the syntax extension kit
"pa_extend.cmo" is loaded. Its syntax is:
expression ::= delete-rule
delete-rule ::= "DELETE_RULE" delete-rule-body "END"
delete-rule-body ::= entry-name ":" symbols
symbols ::= symbol symbols
| symbol
See the syntax of the EXTEND statement for the meaning of the syntax
entries not defined above.
The entry is scanned for a rule matching the giving symbol
list. When found, the rule is removed. If no rule is found, the
exception "Not_found" is raised.
Extensions FOLD0 and FOLD1
When loading "pa_extfold.cmo" after
"pa_extend.cmo", the entry "symbol" of the EXTEND
statement is extended with what is named the fold
iterators, like this:
symbol ::= "FOLD0" simple_expr simple_expr symbol
| "FOLD1" simple_expr simple_expr symbol
| "FOLD0" simple_expr simple_expr symbol "SEP" symbol
| "FOLD1" simple_expr simple_expr symbol "SEP" symbol
simple_expr ::= expr (level "simple")
Like their equivalent with the lists iterators: "LIST0",
"LIST1", "LIST0SEP", "LIST1SEP", they
read a sequence of symbols, possibly with the separators, but
instead of building the list of these symbols, apply a fold function
to each symbol, starting at the second "expr" (which must be an
expression node) and continuing with the first "expr" (which must be
a function taking two expressions and returning a new
expression).
The list iterators can be seen almost as a specific case of these
fold iterators where the initial "expr" would be:
<:expr< [] >>
and the fold function would be:
fun e1 e2 -> <:expr< [$e1$ :: $e2$ ] >>
except that, implemented like that, they would return the list in
reverse order.
Actually, a program using them can be written with the lists
iterators with the semantic action applying the function
"List.fold_left" to the returned list, except that with the
fold iterators, this operation is done as long as the symbols
are read on the input, no intermediate list being built.
Example, file "sum.ml":
#load "pa_extend.cmo";
#load "pa_extfold.cmo";
#load "q_MLast.cmo";
let loc = Ploc.dummy in
EXTEND
Pcaml.expr:
[ [ "sum";
e =
FOLD0 (fun e1 e2 -> <:expr< $e2$ + $e1$ >>) <:expr< 0 >>
Pcaml.expr SEP ";";
"end" -> e ] ]
;
END;
which can be compiled like this:
ocamlc -pp camlp5r -I +camlp5 -c sum.ml
and tested:
ocaml -I +camlp5 camlp5r.cma sum.cmo
Objective Caml version ...
Camlp5 Parsing version ...
# sum 3;4;5 end;
- : int = 12
Grammar machinery
We explain here the detail of the mechanism of the parsing of an
entry.
Start and Continue
At each entry level, the rules are separated into two trees:
The tree of the rules not starting with the current entry
name nor by "SELF".
The tree of the rules starting with the current entry name or by
the identifier "SELF", this symbol not being included in
the tree.
They determine two functions:
The function named "start", analyzing the first tree.
The function named "continue", taking, as parameter, a value
previously parsed, and analyzing the second tree.
A call to an entry, using "Grammar.Entry.parse", corresponds
to a call to the "start" function of the first level of the
entry.
The "start" function tries its associated tree. If it works, it
calls the "continue" function of the same level, giving the result
of "start" as parameter. If this "continue" function fails, this
parameter is simply returned. If the "start" function fails, the
"start" function of the next level is tested. If there is no more
levels, the parsing fails.
The "continue" function first tries the "continue" function of the
next level. If it fails, or if it is the last level, it tries its
associated tree, then calls itself again, giving the result as
parameter. If its associated tree fails, it returns its extra
parameter.
Associativity
While testing the tree, there is a special case for rules ending
with SELF or with the current entry name. For this last symbol,
there is a call to the "start" function: of the current level if the
level is right associative, or of the next level otherwise.
There is no behaviour difference between left and non associative,
because, in case of syntax error, the system attempts to recover the
error by applying the "continue" function of the previous symbol (if
this symbol is a call to an entry).
When a SELF or the current entry name is encountered in the middle
of the rule (i.e. if it is not the last symbol), there is a call to
the "start" function of the first level of the current entry.
Example. Let us consider the following grammar:
EXTEND
expr:
[ "minus" LEFTA
[ x = SELF; "-"; y = SELF -> x -. y ]
| "power" RIGHTA
[ x = SELF; "**"; y = SELF -> x ** y ]
| "simple"
[ "("; x = SELF; ")" -> x
| x = INT -> float_of_string x ] ]
;
END
The left "SELF"s of the two levels "minus" and "power" correspond
to a call to the next level. In the level "minus", the right "SELF"
also, and the left associativity is treated by the fact that the
"continue" function is called (starting with the keyword "-" since
the left "SELF" is not part of the tree). On the other hand, for the
level "power", the right "SELF" corresponds to a call to the current
level, i.e. the level "power" again. At end, the "SELF" between
parentheses of the level "simple" correspond to a call to the first
level, namely "minus" in this grammar.
Parsing algorithm
By default, the grammar uses predictive parsing, i.e. recursive
descent parsing without backtracking, with some nuances due to the
improvements (error recovery and rules starting with tokens)
described in the next sections.
However, it is possible to change the parsing algorithm, by calling
the function "Grammar.set_algorithm". The possible values
include:
Grammar.Predictive
internally using normal (imperative) stream parsers, with
a predictive (recursive descent without backtracking)
algorithm.
Grammar.Backtracking
internally using backtracking parsers, testing all possibilities.
Grammar.DefaultAlgorithm
the parsing algorithm is determined by the environment variable
"CAMLP5PARAM". If this environment variable exists and contains
"b", the parsing algorithm is "backtracking". Otherwise it is
"predictive".
An interesting function, when using the backtracking algorithm, is
"Grammar.Entry.parse_all" which returns all solutions of a
given input.
See details in the chapter Library,
section "Grammar module".
Errors and recovery
In extensible grammars, the exceptions are encapsulated with the
exception "Ploc.Exc" giving the location of the error together with
the exception itself.
If the parsing algorithm is "Grammar.Predictive", the
system internally uses stream
parsers. Two exceptions may happen: "Stream.Failure" or
"Stream.Error". "Stream.Failure" indicates that the parsing just
could not start. "Stream.Error" indicates that the parsing started
but failed further.
With this algorithm, when the first symbol of a rule has been
accepted, all the symbols of the same rule must be accepted,
otherwise the exception "Stream.Error" is raised.
If the parsing algorithm is "Grammar.Backtracking", the
system internally uses backtracking
parsers. If no solution is found, the exception
"Stream.Error" is raised and the location of the error is
the location of the last unfrozen token, i.e. where the stream
advanced the farthest.
In extensible grammars, unlike stream parsers, before raising the
"Stream.Error" exception, the system attempts to recover from the
error by the following trick: if the previous symbol of the rule was
a call to another entry, the system calls the "continue" function of
that entry, which may resolve the problem.
Tokens starting rules
Another improvement (other than error recovery) is that when a rule
starts with several tokens and/or keywords, all these tokens and
keywords are tested at once, and the possible "Stream.Error" can
only happen from the symbol following them, if any.
To create a grammar, the function "Grammar.gcreate" must
be called, with a lexer as parameter.
A simple solution, as possible lexer, is the predefined lexer built
by "Plexer.gmake ()", the lexer used for the OCaml grammar of
Camlp5. In this case, you can just pass it as parameter to
"Grammar.gcreate" and it is not necessary to read this
section.
The section first introduces the notion of "token patterns" which
are the way the tokens and keywords symbols in the EXTEND statement
are represented. Then follows the description of the type of the
parameter of "Grammar.gcreate".
Token patterns
A token pattern is a value of the type defined like this:
type pattern = (string * string);
This type represents values of the token and keywords symbols in
the grammar rules.
For a token symbol in the grammar rules, the first string is the
token constructor name (starting with an uppercase character), the
second string indicates whether the match is "any" (the empty
string) or some specific value of the token (a non-empty
string).
For a keyword symbol, the first string is empty and the second
string is the keyword itself.
As examples, different symbols and keywords are represented by the following
couples of strings:
the keyword "for" is represented by ("", "for"),
the keyword "=" by ("", "="),
the keyword "to" by ("", "to")),
and the token symbol LIDENT by ("LIDENT", "").
The symbol UIDENT "Foo" in a rule would be represented
by the token pattern:
("UIDENT", "Foo")
Notice that the symbol "SELF" is a specific symbol of the
EXTEND syntax: it does not correspond to a token pattern and is
represented differently. A token constructor name must not belong to
the specific symbols: SELF, NEXT, LIST0, LIST1, OPT and FLAG.
The lexer record
The type of the parameter of the function
"Grammar.gcreate" is "lexer", defined in the
module "Plexing". It is a record type with the following
fields:
tok_func
The lexer function itself. It takes a character stream as parameter
and returns a couple containing: a token stream (the tokens being
represented by a couple of strings), and a location function.
The location function is a function taking, as parameter, an integer
corresponding to a token number in the stream (starting from zero),
and returning the location of this token in the source. This is
important to get good locations in the semantic actions of the
grammar rules.
Notice that, despite the lexer taking a character stream as
parameter, it is not mandatory to use the stream parsers technology
to write the lexer. What is important is that it does the job.
tok_using
Is a function of type:
pattern -> unit
The parameter of this function is the representation of a token
symbol or a keyword symbol in grammar rules. See the section about
token patterns.
This function is called for each token symbol and each keyword
encountered in the grammar rules of the EXTEND statement. Its goal
is to allow the lexer to check that the tokens and keywords do
respect the lexer rules. It checks that the tokens exist and are not
misspelled. It can also be used to enter the keywords in the lexer
keyword tables.
It is possible to set it to a function that does nothing, but then
the correctness of tokens is not checked.
In case of error, the function must raise the exception
"Plexing.Error" with an error message as parameter.
tok_removing
Is a function of type:
pattern -> unit
It is possibly called by the DELETE_RULE statement for tokens and
keywords no longer used in the grammar. The grammar system maintains
a number of usages of all tokens and keywords and calls this
function only when this number reaches zero. This can be interesting
for keywords: the lexer can remove them from its tables.
tok_match
Is a function of type:
pattern -> ((string * string) -> string)
The function tells how a token of the input stream is matched
against a token pattern. Both are represented by a couple of
strings.
This function takes a token pattern as parameter and returns a
function matching a token, returning the matched string or raising
the exception "Stream.Failure" if the token does not
match.
Notice that, for efficiency, it is necessary to write this function
as a match of token patterns returning, for each case, the function
which matches the token, not a function matching the token
pattern and the token together and returning a string for each
case.
An acceptable function is provided in the module "Plexing"
and is named "default_match". Its code looks like this:
value default_match =
fun
[ (p_con, "") ->
fun (con, prm) -> if con = p_con then prm else raise Stream.Failure
| (p_con, p_prm) ->
fun (con, prm) ->
if con = p_con && prm = p_prm then prm else raise Stream.Failure ]
;
tok_text
Is a function of type:
pattern -> string
Designed for error messages, it takes a token pattern as parameter
and returns the string giving its name.
It is possible to use the predefined function "lexer_text"
of the Plexing module. This function just returns the name of the
token pattern constructor and its parameter if any.
For example, with this default function, the token symbol IDENT
would be written as IDENT in error message (e.g. "IDENT expected").
The "text" function may decide to print it differently, e.g., as
"identifier".
tok_comm
Is a mutable field of type:
option (list location)
It asks the lexer (the lexer function should do it) to record the
locations of the comments in the program. Setting this field to
"None" indicates that the lexer must not record them. Setting it to
"Some []" indicated that the lexer must put the comments location
list in the field, which is mutable.
Minimalist version
If a lexer has been written, named "lexer", here is a
minimalist version of the value suitable as parameter to
"Grammar.gcreate":
Functorial interface
The normal interface for grammars described in the previous sections
has two drawbacks:
First, the type of tokens of the lexers must be "(string *
string)"
Second, since the entry type has no parameter to specify the
grammar it is bound to, there is no static check that entries are
compatible, i.e. belong to the same grammar. The check is done at
run time.
The functorial interface resolves these two problems. The functor
takes, as parameter, a module where the token type has to be defined,
together with the lexer returning streams of tokens of this
type. The resulting module defines entries that are compatible with
one another, and this is checked by the OCaml type checker.
The syntax extension must be done with the statement GEXTEND, instead
of EXTEND, and deletion by GDELETE_RULE instead of DELETE_RULE.
The lexer type
In the section about the interface with the lexer, we presented the
"Plexing.lexer" type as a record without type
parameter. Actually, this type is defined as:
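Its shape is roughly the following (a reconstruction from the field descriptions of the previous sections; "location_function" stands for the token-number-to-location function, and the exact declaration should be checked in the "Plexing" module):
type lexer 'te =
  { tok_func : Stream.t char -> (Stream.t 'te * location_function);
    tok_using : pattern -> unit;
    tok_removing : pattern -> unit;
    tok_match : pattern -> 'te -> string;
    tok_text : pattern -> string;
    tok_comm : mutable option (list Ploc.t) }
;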
where the type parameter is the type of the token, which can be any
type (possibly different from "(string * string)"), provided the
lexer function ("tok_func") returns a stream of this token type and
the match function ("tok_match") indicates how to match values of
this token type against the token patterns (which remain defined as
"(string * string)").
Here is an example of a user token type and the associated match
function:
type mytoken =
[ Ident of string
| Int of int
| Comma | Equal
| Keyw of string ]
;
value mymatch =
fun
[ ("IDENT", "") ->
fun [ Ident s -> s | _ -> raise Stream.Failure ]
| ("INT", "") ->
fun [ Int i -> string_of_int i | _ -> raise Stream.Failure ]
| ("", ",") ->
fun [ Comma -> "" | _ -> raise Stream.Failure ]
| ("", "=") ->
fun [ Equal -> "" | _ -> raise Stream.Failure ]
| ("", s) ->
fun
[ Keyw k -> if k = s then "" else raise Stream.Failure
| _ -> raise Stream.Failure ]
| _ -> raise (Plexing.Error "bad token in match function") ]
;
The functor parameter
The type of the functor parameter is defined as:
module type GLexerType =
sig
type te = 'x;
value lexer : Plexing.lexer te;
end;
The token type must be specified (type "te"), together with the
lexer, which has the lexer type defined above; the record fields are
those described in the section about the interface with the lexer,
but with a general token type.
The resulting grammar module
Once a module of type "GLexerType" has been built
(previous section), it is possible to create a grammar module by
applying the functor "Grammar.GMake". For example:
module MyGram = Grammar.GMake MyLexer;
Notice that the function "Entry.parse" of this resulting
module does not take a character stream as parameter, but a value of
type "parsable". This function is equivalent to the
function "parse_parsable" of the non functorial
interface. In short, the parsing of some character stream
"cs" by some entry "e" of the example grammar
above, must be done by:
MyGram.Entry.parse e (MyGram.parsable cs)
instead of:
MyGram.Entry.parse e cs
GEXTEND and GDELETE_RULE
The "GEXTEND" and "GDELETE_RULE" statements are
also added in the expressions of the OCaml language when the syntax
extension kit "pa_extend.cmo" is loaded. They must be used
for grammars defined with the functorial interface. Their syntax
is the same as the one of the EXTEND and DELETE_RULE statements,
except that the body starts with the grammar module name
(e.g. "GEXTEND MyGram ... END" and "GDELETE_RULE MyGram ... END").
See the syntax of the EXTEND statement for the meaning of the syntax
entries.
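For example, with the "MyGram" module built in the previous section, an extension could look like this (a sketch; it assumes an entry created with "MyGram.Entry.create" and tokens accepted by the "mymatch" function above):
value e = MyGram.Entry.create "expression";
GEXTEND MyGram
  e: [ [ i = IDENT; "="; j = INT -> (i, int_of_string j) ] ];
END;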
An example: arithmetic calculator
Here is a small calculator of expressions. They are given as
parameters of the command.
File "calc.ml":
#load "pa_extend.cmo";
value g = Grammar.gcreate (Plexer.gmake ());
value e = Grammar.Entry.create g "expression";
EXTEND
e:
[ [ x = e; "+"; y = e -> x + y
| x = e; "-"; y = e -> x - y ]
| [ x = e; "*"; y = e -> x * y
| x = e; "/"; y = e -> x / y ]
| [ x = INT -> int_of_string x
| "("; x = e; ")" -> x ] ]
;
END;
open Printf;
for i = 1 to Array.length Sys.argv - 1 do {
let r = Grammar.Entry.parse e (Stream.of_string Sys.argv.(i)) in
printf "%s = %d\n" Sys.argv.(i) r;
flush stdout;
};
The link needs the library "gramlib.cma" provided with Camlp5:
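presumably with a command along these lines:
ocamlc -I +camlp5 gramlib.cma calc.cmo -o calc
A session could then look like this (illustrative):
./calc "2+3*4"
2+3*4 = 14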
The ideas behind Camlp5 were expressed in the 1990s by Michel
Mauny. In 1996, Daniel de Rauglaudre implemented the first version
named Camlp4 (the four "p" standing for
"Pre-Processor-Pretty-Printer"). In 2002, Camlp4 was maintained by
Michel Mauny, and later extended by Nicolas Pouillard, with
different basic ideas, introducing some incompatibilities. In
2006, Daniel de Rauglaudre restarted this work, renaming it
Camlp5.
Since version 5.00, Camlp5 has been able to be installed in two
modes: the transitional mode and the strict
mode. When Camlp5 is installed, it works with only one of these
modes (the two modes indeed contain different definitions of some
interfaces and are incompatible with one another). The user must
choose in which mode he wants to use Camlp5.
This notion has been introduced to ensure backward compatibility of
the Camlp5 syntax tree, together with the usage of a new quotation
kit "q_ast.cmo", which allows to use Camlp5 syntax tree
quotations in user syntax (with all its possible extensions).
A short example of these syntax tree quotations:
If the syntax of the extensible
grammars has been added, it is possible to write things
like:
<:expr< EXTEND a: [ [ c = d -> $e$ ] ]; END >>;
representing the syntax tree of this statement: this is not
possible with the classical quotation kit "q_MLast.cmo"
because, with it, quotations must be written in revised syntax
only, without syntax extensions.
Here are the differences between the two modes:
Transitional
Compatibility
The syntax tree is fully compatible with the previous
versions of Camlp5; no changes have to be made in the users'
programs.
Quotation kit "q_ast.cmo"
The antiquotations are not available: when used, a syntax
error message is displayed.
Strict
Compatibility
The syntax tree is different, users' programs may have to be
modified, but not necessarily.
Quotation kit "q_ast.cmo"
All antiquotations are available.
In strict mode, the programs are more likely to be compatible
with the previous versions if they use syntax tree quotations rather
than syntax tree nodes. A solution is therefore to change the
expressions and patterns using nodes into expressions and patterns
using quotations (which is backward compatible).
Which mode is installed?
To determine the mode of an installed version of Camlp5, type:
camlp5 -pmode
Selecting mode when compiling Camlp5
When compiling Camlp5 from source, the mode must first be selected
at configuration time. The configure script must be run
with one of these options:
./configure -strict
./configure -transitional
The default is "transitional", i.e. without option, the sources are
compiled in transitional mode.
The first parsing tool, the stream parsers, is the elementary
system. It is pure syntactic sugar, i.e. the code is directly
converted into basic OCaml statements: essentially functions,
pattern matching and "try" constructs. A stream parser is just a
function. But the system manages neither associativity nor parsing
levels. Left recursion results in infinite loops, just like a
function whose first action is a call to itself.
The second parsing tool, the extensible grammars, is more
sophisticated. A grammar written with them is more readable, and
looks like grammars written with tools like "yacc". They take care of
associativity, left recursion, and levels of parsing. They are
dynamically extensible, which allows the syntax extensions that
Camlp5 provides for the OCaml syntax.
In both cases, the input data are streams.
Camlp5 also provides:
a pretty printing module
extensible printers
The next sections give an overview of the parsing and printing
tools.
Stream parsers
The stream parsers form a system of recursive descent
parsing. Streams are actually lazy lists. At each step, the head of
the list is compared against a stream pattern. There are
three kinds of stream parsers:
The imperative stream parsers, where
the elements are removed from the stream as they are
parsed. Parsers return either:
A value, in case of success,
The exception "Stream.Failure" when the parser does
not apply and no elements have been removed from the stream,
indicating that, possibly, other parsers may apply,
The exception "Stream.Error" when the parser does
not apply, but one or several elements have been removed from
the stream, indicating that nothing can be done to recover from
the error (see the sketch after this list).
The functional stream parsers
where the elements are not removed from the stream during the
parsing. These parsers return a value of type "option", i.e.
either:
"Some" a value and the remaining stream, in case of
success,
"None", in case of failure.
The backtracking stream parsers
which are like the functional stream parsers but with a backtracking
algorithm, testing all possibilities. These parsers also return a
value of type "option", but a different one from that of the
functional stream parsers, i.e. either:
"Some" a value, the remaining stream and a continuation, in
case of success,
"None", in case of failure.
The differences are about:
Syntax errors: in the imperative version,
the location of the error is clear, it is at the current position
of the stream, and the system provides a specific error message
(typically, that some "element" was "expected"). On the other
hand, in the functional and backtracking versions, the position is
not clear since the parser just returns "None" and the initial
stream is unaffected. The only solution to know where the error
happened is to analyze that stream to see how many elements have
been unfrozen. No clear error message is available, just "syntax error"
(but this could be improved in a future version).
Power: in the imperative version, when a
rule raises the exception "Stream.Error", the parsing
cannot continue. In the functional version, the parsing can
continue by analyzing the next rule with the initial unaffected
stream: this is limited backtrack. In the backtracking
version, more powerful, the parsing continues by analyzing the
next case of the previous symbol of the rule; moreover it is
possible to get the list of all possible solutions.
Neatness: functional streams are neater,
just like functional programming is neater than imperative
programming.
The imperative parsers implement what is called "predictive
parsing", i.e. recursive descent parsing without backtracking.
In the imperative version, there also exist
lexers, a shorter syntax when the stream
elements are of the specific type 'char'.
Extensible grammars
Extensible grammars manipulate grammar entries. Grammar
entries are abstract values internally containing mutable stream
parsers. When a grammar entry is created, its internal parser is
empty, i.e. it always fails when used. A specific syntactic
construction, with the keyword "EXTEND" allows one to
extend grammar entries with new grammar rules.
In contrast to stream parsers, grammar entries manage
associativity, left factorization, and levels. Moreover, these
grammars allow optional calls, lists and lists with separators. They
are not, however, functions and hence cannot have parameters.
Since the internal system is stream parsers, extensible grammars
use recursive descent parsing.
The parsers of the OCaml language in Camlp5 are written with
extensible grammars.
Pretty module
The "Pretty" module is an original tool allowing control
over the displaying of lines. The user must specify two functions
where:
the data is printed on a single line
the data is printed on several lines
The system first tries to print on a single line. At any time, if
the line overflows, i.e. if its size is greater than some "line
length" specified in the module interface, or if it contains
newlines, the function is aborted and control is given to the second
function, to print on several lines.
This is a basic, but powerful, system. It supposes that the programmer
takes care of the current indentation, and the beginning and the end of
its lines.
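For instance, with the module's "horiz_vertic" and "sprintf" functions (a sketch assuming this two-function interface; check the "Pretty" interface for the exact signatures):
(* try to print the pair on a single line; if that line overflows,
   fall back to the multi-line version, indented by "ind" spaces *)
value print_pair ind a b =
  Pretty.horiz_vertic
    (fun () -> Pretty.sprintf "(%s, %s)" a b)
    (fun () -> Pretty.sprintf "(%s,\n%s%s)" a (String.make ind ' ') b)
;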
The module will be extended in the future to hide the management of
indentations and line continuations, and by supplying functions
combining the two cases above, in which the programmer can specify
the possible places where newlines can be inserted.
Extensible printers
The extensible printers are symmetric to the extensible grammars.
The extensible grammars take syntax rules and return syntax trees.
The extensible printers are actually extensible functions taking
syntax trees as parameters and returning the pretty printed
statements in strings.
The extensible printers can have printing levels, just like
grammars have parsing levels, and it is possible to take the
associativity into account through provided functions calling either
the current level or the next level.
The printers of the OCaml language are written with extensible
printers.
All about language parsing entries, language printing functions,
quotation management at parsing time, extensible directives, extensible
options, and generalities about Camlp5.
Language parsing
Main parsing functions
The two functions below are called when parsing an interface (.mli
file) or an implementation (.ml file) to build the syntax tree; the
returned list contains the phrases (signature items or structure
items) and their locations; the boolean tells whether the parser has
encountered a directive; in this case, since the directive may
change the syntax, the parsing stops, the directive is evaluated,
and this function is called again. These functions are
references, because they can be changed to use another technology
than the Camlp5 extended grammars. By default, they use the grammar
entries [implem] and [interf] defined below.
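Their declarations are presumably the following (reconstructed from the description above; check the "Pcaml" interface for the exact types):
value parse_interf :
  ref (Stream.t char -> (list (MLast.sig_item * MLast.loc) * bool));
Function called when parsing an interface (".mli") file
value parse_implem :
  ref (Stream.t char -> (list (MLast.str_item * MLast.loc) * bool));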
Function called when parsing an implementation (".ml") file
Grammar
value gram : Grammar.g;
Grammar variable of the language.
Entries
Grammar entries which return syntax
trees. These are set by the parsing kit of the current syntax,
through the statement EXTEND. They
are usable by other possible user syntax extensions.
value expr : Grammar.Entry.e MLast.expr;
Expressions.
value patt : Grammar.Entry.e MLast.patt;
Patterns.
value ctyp : Grammar.Entry.e MLast.ctyp;
Types.
value sig_item : Grammar.Entry.e MLast.sig_item;
Signature items, i.e. items between "sig" and
"end", or inside an interface (".mli") file.
value str_item : Grammar.Entry.e MLast.str_item;
Structure items, i.e. items between "struct" and
"end", or inside an implementation (".ml") file.
value module_type : Grammar.Entry.e MLast.module_type;
Module types, e.g. signatures, functors, identifiers.
value module_expr : Grammar.Entry.e MLast.module_expr;
Module expressions, e.g. structures, functors, identifiers.
value let_binding : Grammar.Entry.e (MLast.patt *
MLast.expr);
Specific entry for the "let binding", i.e. the association
"let pattern = expression".
value type_declaration : Grammar.Entry.e
MLast.type_decl;
Specific entry for the "type declaration", i.e. the association
"type name = type-expression"
value class_sig_item : Grammar.Entry.e
MLast.class_sig_item;
Class signature items, i.e. items of class objects types.
value class_str_item : Grammar.Entry.e
MLast.class_str_item;
Class structure items, i.e. items of class objects.
value class_type : Grammar.Entry.e MLast.class_type;
Class types, e.g. object types, class types functions,
identifiers.
value class_expr : Grammar.Entry.e MLast.class_expr;
Class expressions, e.g. objects, class functions, identifiers.
value interf : Grammar.Entry.e (list (MLast.sig_item * MLast.loc) * bool);
Interface, i.e. files with extension ".mli". The location is the
top of the tree. The boolean says whether the parsing stopped
because of the presence of a directive (which potentially could
change the syntax).
value implem : Grammar.Entry.e (list (MLast.str_item * MLast.loc) * bool);
Implementation, i.e. files with extension ".ml". Same remark
about the location and the boolean.
value top_phrase : Grammar.Entry.e (option
MLast.str_item);
Phrases of the OCaml interactive toplevel. Return "None" in case
of end of file.
value use_file : Grammar.Entry.e (list MLast.str_item *
bool);
Phrases in files included by the directive "#use". The
boolean indicates whether the parsing stopped because of a
directive (as for "interf" above).
Language printing
Main printing functions
The two functions below are called when printing an interface (.mli
file) or an implementation (.ml file) from the syntax tree; the list
is the result of the corresponding parsing function. These
functions are references, to allow using other technologies than the
Camlp5 extended printers.
value print_interf :
ref (list (MLast.sig_item * MLast.loc) -> unit);
Function called when printing an interface (".mli") file
value print_implem :
ref (list (MLast.str_item * MLast.loc) -> unit);
Function called when printing an implementation (".ml") file
By default, these functions fail. The printer kit
"pr_dump.cmo" (loaded by most Camlp5 commands) sets them to
functions dumping the syntax tree in binary (for the OCaml
compiler). The pretty printer kits, such as "pr_r.cmo" and
"pr_o.cmo" set them to functions calling the predefined
printers (see next section).
Printers
Printers taking syntax trees as
parameters and returning pretty printed strings. These are set by
the printing kits, through the
statement EXTEND_PRINTER. They are
usable by other possible user printing extensions.
value pr_expr : Eprinter.t MLast.expr;
Expressions.
value pr_patt : Eprinter.t MLast.patt;
Patterns.
value pr_ctyp : Eprinter.t MLast.ctyp;
Types.
value pr_sig_item : Eprinter.t MLast.sig_item;
Signature items, i.e. items between "sig" and
"end", or inside an interface (".mli") file.
value pr_str_item : Eprinter.t MLast.str_item;
Structure items, i.e. items between "struct" and
"end", or inside an implementation (".ml") file.
value pr_module_type : Eprinter.t MLast.module_type;
Module types, e.g. signatures, functors, identifiers.
value pr_module_expr : Eprinter.t MLast.module_expr;
Module expressions, e.g. structures, functors, identifiers.
value pr_class_sig_item : Eprinter.t
MLast.class_sig_item;
Class signature items, i.e. items of class objects types.
value pr_class_str_item : Eprinter.t
MLast.class_str_item;
Class structure items, i.e. items of class objects.
value pr_class_type : Eprinter.t MLast.class_type;
Class types, e.g. object types, class types functions,
identifiers.
value pr_class_expr : Eprinter.t MLast.class_expr;
Class expressions, e.g. objects, class functions, identifiers.
Quotation management
value handle_expr_quotation : MLast.loc -> (string * string)
-> MLast.expr;
Called in the semantic actions of the rules parsing a quotation
in position of expression.
value handle_patt_quotation : MLast.loc -> (string * string)
-> MLast.patt;
Called in the semantic actions of the rules parsing a quotation
in position of pattern.
value quotation_dump_file : ref (option string);
"Pcaml.quotation_dump_file" optionally tells the
compiler to dump the result of an expander (of kind "generating a
string") if this result is syntactically incorrect. If
"None" (default), this result is not dumped. If "Some
fname", the result is dumped in the file "fname".
The same effect can be done with the option "-QD" of
Camlp5 commands.
value quotation_location : unit -> Ploc.t;
While expanding a quotation, returns the location of the
quotation text (between the quotation quotes) in the source;
raises "Failure" if not in the context of a quotation
expander.
Extensible directives and options
type directive_fun = option MLast.expr -> unit;
The type of functions called to treat a directive with its
syntactic parameter. Directives act by side effect.
value add_directive : string -> directive_fun -> unit;
Add a new directive.
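For example, a hypothetical "hello" directive could be added like this (a sketch; the quotation pattern requires loading "q_MLast.cmo"):
#load "q_MLast.cmo";
Pcaml.add_directive "hello"
  (fun
   [ Some <:expr< $str:s$ >> -> print_endline ("hello, " ^ s)
   | _ -> print_endline "hello" ]);
A source file parsed with such a kit loaded could then contain the directive "#hello "world";".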
value find_directive : string -> directive_fun;
Find the function associated with a directive. Raises
"Not_found" if the directive does not exists.
value add_option : string -> Arg.spec -> string -> unit;
Add an option to the command line of the Camlp5 command.
Equalities over syntax trees
These equalities skip the locations.
value eq_expr : MLast.expr -> MLast.expr -> bool;
value eq_patt : MLast.patt -> MLast.patt -> bool;
value eq_ctyp : MLast.ctyp -> MLast.ctyp -> bool;
value eq_str_item : MLast.str_item -> MLast.str_item -> bool;
value eq_sig_item : MLast.sig_item -> MLast.sig_item -> bool;
value eq_module_expr : MLast.module_expr -> MLast.module_expr -> bool;
value eq_module_type : MLast.module_type -> MLast.module_type -> bool;
value eq_class_sig_item : MLast.class_sig_item -> MLast.class_sig_item -> bool;
value eq_class_str_item : MLast.class_str_item -> MLast.class_str_item -> bool;
value eq_class_type : MLast.class_type -> MLast.class_type -> bool;
value eq_class_expr : MLast.class_expr -> MLast.class_expr -> bool;
Generalities
value version : string;
The current version of Camlp5.
value syntax_name : ref string;
The name of the current syntax. Set by the loaded syntax kit.
value input_file : ref string;
The file currently being parsed.
value output_file : ref (option string);
The output file, stdout if None (default).
value no_constructors_arity : ref bool;
True if the current syntax does not generate constructor arity,
which is the case of the normal syntax, and not of the revised
one. This has an impact when converting Camlp5 syntax tree into
OCaml compiler syntax tree.
Quotations are a syntax extension in Camlp5 to build
expressions and patterns in any syntax independent from the one of
OCaml. Quotations are expanded, i.e. transformed, at parse
time to produce normal syntax trees, like the rest of the program.
Quotation expanders are normal OCaml functions writable by
any programmer.
The aim of quotations is to use concrete syntax for manipulating
abstract values. That makes programs easier to write, read, modify,
and understand. The drawback is that quotations are linguistically isolated from the
rest of the program, in contrast to syntax
extensions, which are included in the language.
Introduction
A quotation is syntactically enclosed by specific quotes formed by
less (<) and greater (>) signs. Namely:
starting with either "<<" or "<:ident<"
where "ident" is the quotation name,
ending with ">>"
Examples:
<< \x.x x >>
<:foo< hello, world >>
<:bar< @#$%;* >>
The text between these particular parentheses can be any text. It
may contain nested quotations, and the characters "<",
">" and "\" can be escaped with
"\". Notice that double quotes, parentheses and OCaml
comments do not necessarily have to be balanced inside them.
As far as the lexer is concerned, a quotation is just a kind of
string.
Quotation expander
Quotations are treated at parse time. Each quotation name is
associated with a quotation expander, a function
transforming the content of the quotation into a syntax tree. There
are actually two expanding functions, depending on whether the
quotation is in the context of an expression or if it is in the
context of a pattern.
If a quotation has no associated quotation expander, a parsing
error is displayed and the compilation fails.
The quotation expander, or, rather, expanders, are functions taking
the quotation string as parameter and returning a syntax tree. There
is no constraint about which parsing technology is used. It can be
stream
parsers, extensible grammars, string
scanning, ocamllex and yacc, or anything else.
To build syntax trees, Camlp5 provides an easy way to build them
(see the chapter about them): it is possible
to build abstract syntax through concrete syntax using,
precisely... quotations.
Defining a quotation
By syntax tree
To define a quotation, it is necessary to program the quotation
expanders and to, finally, end the source code with a call to:
Quotation.add name (Quotation.ExAst (f_expr, f_patt));
where "name" is the name of the quotation, and
"f_expr" and "f_patt" the respective quotations
expanders for expressions and patterns.
After compilation of the source file (without linking, i.e.
using option "-c" of the OCaml compiler), an object file is created
(ending with ".cmo"), which can be used as syntax
extension kit of Camlp5.
By string
There is another way to program the expander: a single
function which returns, not a syntax tree, but a string which is
parsed, afterwards, by the system. This function takes a boolean as
first parameter telling whether the quotation is in position of
expression (True) or in position of a pattern (False).
Used that way, the source file must end with:
Quotation.add name (Quotation.ExStr f);
where "f" is that quotation expander. The advantage of
this second approach is that it is simple to understand and use.
The drawback is that there is no way to specify different source
locations for different parts of the quotation (which may be
important in semantic error messages).
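For instance, a hypothetical quotation "twice" could be programmed this way (a sketch; the ignored first parameter is the boolean telling whether the quotation is in expression or pattern position):
value f _ s = "(" ^ s ^ ") * 2";
Quotation.add "twice" (Quotation.ExStr f);
With this, "<:twice< 3 + 4 >>" in expression position expands to the source text "(3 + 4) * 2", which is then parsed by the system.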
Default quotation
It is possible to use some quotation without its name. Use for that
the variable "Quotation.default_quotation". For example,
ending a file by:
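presumably with an assignment like the one used in the lambda-term example later in this chapter:
Quotation.default.val := "foo";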
allows to use the quotation "foo" without its name, i.e.:
<< ... >>
instead of:
<:foo< ... >>
If several loaded files set the variable "Quotation.default", the
setting of the last loaded one applies.
Antiquotations
A quotation obeys its own rules of lexing and parsing. Its result is
a syntax tree, of type "Pcaml.expr" if the quotation is in the
context of an expression, or "Pcaml.patt" if the quotation is
in the context of a pattern.
It can be interesting to insert portions of expressions or
patterns of the enclosing context inside the quotations themselves. For that,
the syntax of the quotation must define a syntax
for antiquotation areas. It can be, for example:
A character introducing a variable: in this case the antiquotation
can just be a variable.
A couple of characters enclosing the antiquotations. For
example, in the predefined syntax tree
quotations, the antiquotations are enclosed with dollar
("$") signs.
In quotations, the locations in the resulting syntax tree are all
set to the location of the quotation itself (if this resulting tree
contains locations, they are overwritten with this
location). Consequently, if there are semantic (typing) errors, the
OCaml compiler will underline the entire quotation.
But in antiquotations, since they can be inserted in the resulting
syntax tree, it is interesting to keep their initial locations. For
that, the nodes:
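<:expr< $anti:...$ >> and <:patt< $anti:...$ >>
must be used (presumably these antiquotation nodes are the ones meant here; they are the ones used in the examples below and in the section about locations in quotations and antiquotations).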
Let us take an example, without this node, and with this specific
node.
Let us consider an elementary quotation system whose content is just
an antiquotation. This is just a school example, since the quotation
brackets are not necessary in this case. But we are going to see how
the source code is underlined in error messages.
Example without antiquotation node
The code for this quotation is (file "qa.ml"):
#load "q_MLast.cmo";
let expr s = Grammar.Entry.parse Pcaml.expr (Stream.of_string s) in
Quotation.add "a" (Quotation.ExAst (expr, fun []));
The quotation expander "expr" just takes the string
parameter (the contents of the quotation), and returns the result
of the expression parser of the OCaml language.
Compilation:
ocamlc -pp camlp5r -I +camlp5 -c qa.ml
Let us test it in the toplevel with a voluntary typing error:
$ ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #load "qa.cmo";
# let x = "abc" and y = 25 in <:a< x ^ y >>;
Characters 28-41:
let x = "abc" and y = 25 in <:a< x ^ y >>;
^^^^^^^^^^^^^
This expression has type int but is here used with type string
We observe that the full quotation is underlined, although it concerns
only the variable "y".
Example with antiquotation node
Let us consider this second version (file "qb.ml"):
#load "q_MLast.cmo";
let expr s =
let ast = Grammar.Entry.parse Pcaml.expr (Stream.of_string s) in
let loc = Ploc.make 1 0 (0, String.length s) in
<:expr< $anti:ast$ >>
in
Quotation.add "b" (Quotation.ExAst (expr, fun []));
As above, the quotation expander "expr" takes the string
parameter (the contents of the quotation) and applies the expression
parser of the OCaml language. Its result, instead of being
returned as it is, is enclosed with the antiquotation node. (The
location built is the location of the whole string.)
Compilation:
ocamlc -pp camlp5r -I +camlp5 -c qb.ml
Now the same test gives:
$ ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #load "qb.cmo";
# let x = "abc" and y = 25 in <:b< x ^ y >>;
Characters 37-38:
let x = "abc" and y = 25 in <:b< x ^ y >>;
^
This expression has type int but is here used with type string
Notice that, now, only the variable "y" is underlined.
In conclusion
In the resulting tree of the quotation expander:
only portions of this tree, which are sons of the expr and patt
antiquotation nodes, have the right location they have in the
quotation (provided the quotation expander gives it the right
location of the antiquotation in the quotation),
the other nodes inherit, as location, the location of the full
quotation.
Locations in quotations and antiquotations
This section explains in details the problem of locations in
quotations and antiquotations. It is designed for programmers of
quotation expanders.
Locations are the difficult point in quotations and antiquotations.
If they are not set correctly, error messages may highlight wrong parts
of the source.
The locations are controlled:
for syntax errors: by the exception "Ploc.Exc", raised
by the function "Ploc.raise",
for typing errors, by the syntax tree nodes
"<:expr< $anti:...$ >>" and
"<:meta< $anti:...$ >>".
If the quotation expander never uses them, all syntax and typing errors
highlight the whole quotation.
Remark that in extensible grammars,
syntax errors are automatically enclosed by the exception
"Ploc.Exc". Therefore, if the quotation is parsed by an
extensible grammar entry, this exception can be raised.
In the syntax tree nodes
"<:expr< $anti:...$ >>" and
"<:meta< $anti:...$ >>", the location is
indicated by the implicit variable named "loc". Their usage
is therefore something like:
let loc = ...computation of the location... in
<:expr< $anti:...$ >>
In the quotation
All locations must be computed relatively to the quotation
string. The quotation string is the string between
"<<" or "<:name<" and ">>"
(excluded), the first character of this string being at location
zero.
The quotation system automatically shifts all locations with the
location of the quotation: the programmer of the quotation expander
does not therefore need to care about where the quotation appears
in the source.
In antiquotations
In antiquotations, it is important to control how the antiquotation
string is parsed. For example, if the function parsing the antiquotation
string raises "Ploc.Exc", the location of this exception must
be shifted with the location of the antiquotation in the quotation.
For example, let us suppose that the source contains:
<< abc^ijk^(xyz) >>
where the antiquotation is specified between the caret ("^")
characters. The antiquotation string is "ijk". It can be built
in the quotation expander by:
<:expr< ijk >>
If used just like this, without the "<:expr< $anti:x$ >>",
in case of typing error (for example if the variable "ijk" is
unbound), the OCaml error message will be:
<< abc^ijk^(xyz) >>
^^^^^^^^^^^^^^^^^^^
Unbound value ijk
To put a location to "ijk", since its location in the quotation
is "(5, 8)" (the "i" being the 5th characater of the
quotation string, starting at zero), the quotation expander can build
it like this:
let e = <:expr< ijk >> in
let loc = Ploc.make_unlined (5, 8) in
<:expr< $anti:e$ >>
In this case, the possible typing error message will be:
<< abc^ijk^(xyz) >>
^^^
Unbound value ijk
This case is simple, since the antiquotation is just an identifier,
and there is no parser computing it.
If the antiquotation has to be parsed, for example if it is a
complicated expression, there are two points to care about:
First, the syntax error messages. If the parser of the
antiquotation raises "Ploc.Exc", its location is relative
to the antiquotation. It must therefore be shifted to correspond to
a location in the quotation. If "f" is the parsing
function and "sh" the shift of the antiquotation
in the quotation (whose value is "5" in the
example), the code must be something like:
try f () with
[ Ploc.Exc loc exc -> Ploc.raise (Ploc.shift sh loc) exc ]
Second, the typing error messages. Here, the above code with
"<:expr< $anti:e$ >>" can apply to the
resulting tree.
The complete code, taking the possible syntax error messages and
the possible typing error messages into account, can be (where
"len" is the antiquotation length):
let e =
try f () with
[ Ploc.Exc loc exc -> Ploc.raise (Ploc.shift sh loc) exc ]
in
let loc = Ploc.make_unlined (sh, sh + len) in
<:expr< $anti:e$ >>
Located errors
If the quotation expander raises an exception, by default, the
whole quotation is underlined:
$ cat foo.ml
#load "q_MLast.cmo";
let expr s = raise (Failure "hello") in
Quotation.add "a" (Quotation.ExAst (expr, fun []));
$ ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #use "foo.ml";
- : unit = ()
# <:a< good morning >>;
Toplevel input:
# <:a< good morning >>;
^^^^^^^^^^^^^^^^^^^^
While expanding quotation "a":
Failure: hello
To specify a location of the exception, use the function
"Ploc.raise" instead of "raise". In this
example, let us suppose that we want only the characters
5 to 7 to be underlined. This can be done like this:
$ cat foo.ml
#load "q_MLast.cmo";
let expr s = Ploc.raise (Ploc.make 1 0 (5, 7)) (Failure "hello") in
Quotation.add "a" (Quotation.ExAst (expr, fun []));
$ ledit ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #use "foo.ml";
- : unit = ()
# <:a< good morning >>;
Toplevel input:
# <:a< good morning >>;
^^
While expanding quotation "a":
Failure: hello
The Quotation module
type expander =
[ ExStr of bool -> string -> string
| ExAst of ((string -> MLast.expr) * (string -> MLast.patt)) ]
;
Is the type for quotation expander kinds:
"Quotation.ExStr exp" corresponds to an expander
"exp" returning a string which is parsed by the
system to create a syntax tree. Its boolean parameter tells
whether the quotation is in position of an expression (True)
or in position of a pattern (False). Quotation expanders
created this way may work for some particular OCaml syntax,
and not for another one (e.g. may work when used with revised
syntax and not when used with normal syntax, and
conversely).
"Quotation.ExAst (expr_exp, patt_exp)" corresponds
to expanders returning syntax trees, which therefore do not need
to be parsed afterwards. The function "expr_exp" is
called when the quotation is in position of an expression, and
"patt_exp" when the quotation is in position of a
pattern. Quotation expanders created this way are independent
from the enclosing syntax.
value add : string -> expander -> unit;
"Quotation.add name exp" adds the quotation "name"
associated with the expander "exp".
value find : string -> expander;
"Quotation.find name" returns the quotation expander of
the given name.
value default : ref string;
The name of the default quotation: it is possible to use this
quotation between "<<" and ">>"
without having to specify its name.
value translate : ref (string -> string);
Function translating quotation names; default = identity. Used
in the predefined quotation "q_phony.cmo". See below.
Some other useful functions for quotations are defined in the
module "Pcaml". See the chapter "The
Pcaml module", section "Quotation management".
Predefined quotations
q_MLast.cmo
This extension kit adds quotations of OCaml syntax trees, allowing to
use concrete syntax to represent abstract syntax. See the chapter
Syntax tree.
q_ast.cmo
As with the previous quotation, this extension kit adds quotations of
OCaml syntax trees, but in the current user syntax with all
extensions, the previous one being restricted to revised syntax
without extension. See the chapters
Syntax tree
and Syntax tree quotations in user
syntax.
q_phony.cmo
This extension kit is designed for pretty printing and must be loaded
after a language pretty printing kit (in normal or in revised syntax).
It prevents the expansion of quotations, transforming them
into variables. The pretty printing then keeps the initial (source) form.
The macros (extension
"pa_macro.cmo") are also displayed in their initial form,
instead of expanded.
A full example: lambda terms
This example allows to represent lambda terms by a concrete syntax
and to combine them using antiquotations.
A lambda term is defined like this:
type term =
[ Lam of string and term
| App of term and term
| Var of string ]
;
Examples:
value fst = Lam "x" (Lam "y" (Var "x"));
value snd = Lam "x" (Lam "y" (Var "y"));
value delta = Lam "x" (App (Var "x") (Var "x"));
value omega = App delta delta;
value comb_s =
Lam "x"
(Lamb "y"
(Lamb "z"
(App (App (Var "x") (Var "y")) (App (Var "x") (Var "z")))));
Since combinations of lambda terms may be complicated, the idea is to
represent them by quotations in concrete syntax. We want to be able to
write the examples above like this:
value fst = << \x.\y.x >>;
value snd = << \x.\y.y >>;
value delta = << \x.x x >>;
value omega = << ^delta ^delta >>;
value comb_s = << \x.\y.\z.(x y)(x z) >>;
which is a classic representation of lambda terms.
Notice, in the definition of "omega", the use of the
caret ("^") sign to specify antiquotations. Notice also the
simplicity of the representation of the expression defining
"comb_s".
Here is the code of the quotation expander, term.ml. The expander
uses the extensible grammars. It has its
own lexer (using the stream lexers)
because the lexer of OCaml programs ("Plexer.gmake ()")
cannot recognize the backslashes alone.
Lexer
(* lexer *)
#load "pa_lexer.cmo";
value rec ident =
lexer
[ [ 'a'-'z' | 'A'-'Z' | '0'-'9' | '-' | '_' | '\128'-'\255' ]
ident!
| ]
;
value empty _ = parser [: _ = Stream.empty :] -> [];
value rec next_tok =
lexer
[ "\\" -> ("BSLASH", "")
| "^" -> ("CARET", "")
| 'a'-'z' ident! -> ("IDENT", $buf)
| "(" -> ("", "(")
| ")" -> ("", ")")
| "." -> ("", ".")
| empty -> ("EOS", "")
| -> raise (Stream.Error "lexing error: bad character") ]
;
value rec skip_spaces =
lexer
[ " "/ skip_spaces!
| "\n"/ skip_spaces!
| "\r"/ skip_spaces! | ]
;
value record_loc loct i (bp, ep) = do {
if i >= Array.length loct.val then do {
let newt =
Array.init (2 * Array.length loct.val + 1)
(fun i ->
if i < Array.length loct.val then loct.val.(i)
else Ploc.dummy)
in
loct.val := newt;
}
else ();
loct.val.(i) := Ploc.make_unlined (bp, ep)
};
value lex_func cs =
let loct = ref [| |] in
let ts =
Stream.from
(fun i -> do {
ignore (skip_spaces $empty cs : Plexing.Lexbuf.t);
let bp = Stream.count cs in
let r = next_tok $empty cs in
let ep = Stream.count cs in
record_loc loct i (bp, ep);
Some r
})
in
let locf i =
if i < Array.length loct.val then loct.val.(i) else Ploc.dummy
in
(ts, locf)
;
value lam_lex =
{Plexing.tok_func = lex_func;
Plexing.tok_using _ = (); Plexing.tok_removing _ = ();
Plexing.tok_match = Plexing.default_match;
Plexing.tok_text = Plexing.lexer_text;
Plexing.tok_comm = None}
;
Parser
(* parser *)
#load "pa_extend.cmo";
#load "q_MLast.cmo";
value g = Grammar.gcreate lam_lex;
value expr_term_eos = Grammar.Entry.create g "term";
value patt_term_eos = Grammar.Entry.create g "term";
EXTEND
GLOBAL: expr_term_eos patt_term_eos;
expr_term_eos:
[ [ x = expr_term; EOS -> x ] ]
;
expr_term:
[ [ BSLASH; i = IDENT; "."; t = SELF -> <:expr< Lam $str:i$ $t$ >> ]
| [ x = SELF; y = SELF -> <:expr< App $x$ $y$ >> ]
| [ i = IDENT -> <:expr< Var $str:i$ >>
| CARET; r = expr_antiquot -> r
| "("; t = SELF; ")" -> t ] ]
;
expr_antiquot:
[ [ i = IDENT ->
let r =
let loc = Ploc.make_unlined (0, String.length i) in
<:expr< $lid:i$ >>
in
<:expr< $anti:r$ >> ] ]
;
patt_term_eos:
[ [ x = patt_term; EOS -> x ] ]
;
patt_term:
[ [ BSLASH; i = IDENT; "."; t = SELF -> <:patt< Lam $str:i$ $t$ >> ]
| [ x = SELF; y = SELF -> <:patt< App $x$ $y$ >> ]
| [ i = IDENT -> <:patt< Var $str:i$ >>
| CARET; r = patt_antiquot -> r
| "("; t = SELF; ")" -> t ] ]
;
patt_antiquot:
[ [ i = IDENT ->
let r =
let loc = Ploc.make_unlined (0, String.length i) in
<:patt< $lid:i$ >>
in
<:patt< $anti:r$ >> ] ]
;
END;
value expand_expr s =
Grammar.Entry.parse expr_term_eos (Stream.of_string s)
;
value expand_patt s =
Grammar.Entry.parse patt_term_eos (Stream.of_string s)
;
Quotation.add "term" (Quotation.ExAst (expand_expr, expand_patt));
Quotation.default.val := "term";
Compilation and test
Compilation:
ocamlc -pp camlp5r -I +camlp5 -c term.ml
Example, in the toplevel, including a semantic error, correctly
underlined, thanks to the antiquotation nodes:
$ ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #load "term.cmo";
# type term =
[ Lam of string and term
| App of term and term
| Var of string ]
;
type term =
[ Lam of string and term | App of term and term | Var of string ]
# value comb_s = << \x.\y.\z.(x y)(x z) >>;
value comb_s : term =
Lam "x"
(Lam "y"
(Lam "z" (App (App (Var "x") (Var "y")) (App (Var "x") (Var "z")))))
# value omega = << ^delta ^delta >>;
Characters 18-23:
value omega = << ^delta ^delta >>;
^^^^^
Unbound value delta
# value delta = << \x.x x >>;
value delta : term = Lam "x" (App (Var "x") (Var "x"))
# value omega = << ^delta ^delta >>;
value omega : term =
App (Lam "x" (App (Var "x") (Var "x")))
(Lam "x" (App (Var "x") (Var "x")))
Extensible functions allow the definition of pattern matching functions
which are extensible by adding new cases that are inserted
automatically at the proper place by comparing the patterns. The pattern
cases are ordered according to syntax trees representing them, the cases
with a "when" clause being inserted before the cases without "when".
Notice that extensible functions are functional: when extending a
function, a new function is returned.
The extensible functions are used in
the pretty printing system of Camlp5.
Syntax
The syntax of the extensible functions, available when loading
"pa_extfun.cmo", is an extension of the same syntax as the "match"
and "try" constructions.
Semantics
The statement "extend" defined by the syntax takes an extensible
function and return another extensible function with the new match
cases inserted at the proper place within the initial extensible
function.
Extensible functions are of type "Extfun.t a b", which
is an abstract type, where "a" and "b" are
respectively the type of the patterns and the type of the
expressions. It corresponds to a function of type "a ->
b".
The function "Extfun.apply" takes an extensible
function as parameter and returns a function which can be applied
like a normal function.
The value "Extfun.empty" is an empty extensible
function, of type "Extfun.t 'a 'b". When applied with
"Extfun.apply" and a parameter, it raises the exception
"Extfun.Failure" whatever the parameter.
For debugging, it is possible to use the function
"Extfun.print" which displays the match cases of the
extensible functions. (Only the patterns are displayed in
clear text, the associated expressions are not.)
The match cases are inserted according to the following rules:
The match cases are inserted in the order they are defined in
the syntax "extend"
A match case pattern with "when" is inserted before a match case
pattern without "when".
Two match case patterns, both with "when" or both without "when",
are inserted according to the alphabetic order of some internal
syntax tree of the patterns, where the names of bound variables are
not taken into account.
If two match case patterns without "when" have the same
internal syntax tree, the initial one is silently
removed.
If two match case patterns with "when" have the same internal
syntax tree, the new one is inserted before the old
one.
Notice that the identifiers "DEFINE", "UNDEF",
"IFDEF", "IFNDEF", "ELSE",
"END", "OR", "AND" and "NOT" are
new keywords (they cannot be used as identifiers of constructors or
modules).
However, the identifiers "__FILE__" and
"__LOCATION__" and the new defined macro names are not new
identifiers.
Added command options
The parsing kit "pa_macro.cmo" also add two options usable
in all Camlp5 commands:
-D uident
Define the uident in question, as a DEFINE (without parameter) in
the code would do.
-U uident
Undefine the uident in question, as an UNDEF in the code would
do.
-defined
Print the defined macros and exit.
Semantics
The statement "DEFINE" defines a new macro with optional
parameters and an optional value. The macro name must start with an
uppercase letter.
The test of a macro can be done either:
in structure items
in signature items
in expressions
in patterns
in a constructor declaration
in a match case
using the statement "IFDEF". Its non-existence can be
tested by "IFNDEF". In expressions and patterns, the
"ELSE" part is required, not in structure items.
The expression behind the "IFDEF" or the "IFNDEF"
statement may use the operators "OR", "AND" and
"NOT" and contain parentheses.
Notice that in an "IFDEF" where the value is True
(resp. False), the "ELSE" (resp "THEN") part does
not need to be semantically correct (well typed), since it does not
appear in the resulting syntax tree. Same for "IFNDEF" and
for possible macros parameters which are not used in the
associated expression.
If a macro is defined twice, its first version is lost.
The statement "UNDEF" removes a macro definition.
When associated with a value, the "DEFINE" macro acts like
a variable (or like a function call if it has parameters), except that
the parameters are evaluated at parse time and can also be used
in pattern position. Notice that this is a way to define constants
by name in patterns. For example:
DEFINE WW1 = 1914;
DEFINE WW2 = 1939;
value war_or_year =
fun
[ WW1 -> "world war I"
| WW2 -> "world war II"
| _ -> "not a war" ]
;
In the definition of a macro, if the expression contains an
evaluation, the evaluation is not done by Camlp5 but just
transmitted as code. In this case, it does not work in pattern
position. Example in the toplevel:
# DEFINE PLUS(x, y) = x + y;
# PLUS(3, 4);
- : int = 7
# fun [ PLUS(3, 4) -> () ];
Toplevel input:
# fun [ PLUS(3, 4) -> () ];
^^^^^^^^^^
Failure: this is not a constructor, it cannot be applied in a pattern
On the other hand, if the expression does not contain evaluation,
this is possible:
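presumably, something like:
# DEFINE PAIR(x, y) = (x, y);
# fun [ PAIR(3, 4) -> True | _ -> False ];
- : (int * int) -> bool = <fun>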
The macro "__FILE__" is replaced by the current compiled source
file name. In the OCaml toplevel, its value is the empty string.
The macro "__LOCATION__" is replaced by the the current location
(two integers in number of characters from the beginning of the file,
starting at zero) of the macro itself.
In signatures, the macro definitions can return types which can be
used in type definitions.
In constructor declarations and in match cases, it is possible to
conditionally define some cases by "IFDEF" or
"IFNDEF". For example:
type t =
[ A of int
| IFNDEF FOO THEN
B of string
END
| C of bool ]
;
match x with
[ A i -> j
| IFNDEF FOO THEN
B s -> toto
END
| C b -> e ];
Predefined macros
The macro "CAMLP5" is always predefined.
The macro "OCAML_oversion" is predefined, where
"oversion" is the OCaml version the Camlp5 program has been
compiled with, where all characters but numbers are replaced by
underscores. For example, if using OCaml 3.09.3, the macro
"OCAML_3_09_3" is defined.
Moreover, for some Camlp5 versions (and all the versions
which follow them), the macro "CAMLP5_version" is defined,
where "version" is the Camlp5 version with all characters
but numbers replaced by underscores. For example, in version
4.02, the macro "CAMLP5_4_02" was defined and this
macro has remained defined in all versions of Camlp5 since 4.02.
To see which macros are predefined, type:
camlp5r pa_macro.cmo -defined
Abstract tree in concrete syntax
Syntax tree
In Camlp5, one often uses syntax trees. For example, in grammars of
the language (semantic actions), in pretty printing (as patterns),
in optimizing syntax code (typically stream parsers). Syntax trees
are mainly defined by sum types, one for each kind of tree:
"expr" for expressions, "patt" for patterns,
"ctyp" for types, "str_item" for structure items,
and so on. Each node corresponds to a possible value of this
type.
Transitional and Strict modes
Since version 5.00 of Camlp5, the definition of the syntax tree
differs according to the mode in which Camlp5 has been installed:
In transitional mode,
this definition is the same as in the previous Camlp5
versions.
In strict mode, many
constructor parameters have a type enclosed by the predefined type
"Ploc.vala".
The advantage of the transitional mode is that the abstract syntax
tree is fully compatible with previous versions of Camlp5. Its
drawback is that, when using the syntax tree
quotations in user syntax, it is not possible to use
antiquotations, a significant limitation.
In strict mode, the abstract syntax is not compatible with versions
of Camlp5 previous to 5.00. Most of the parameters of the
constructors are enclosed in the type "Ploc.vala", whose
aim is to allow nodes to be either of the type argument, or an
antiquotation. In this mode, the syntax tree quotations in user
syntax can be used, with the same power as the previous syntax tree
quotations provided by Camlp5.
Compatibility
As there is a problem of compatibility in strict mode, a good
solution, for the programmer, is to always write syntax trees using
quotations, which are backward compatible. See the chapter
about syntax tree in strict mode.
For example, if the program made a value of the syntax tree of the
"let" statement, like this:
ExLet loc rf pel e
In strict mode, to be equivalent, this expression should be
rewritten like this:
ExLet loc (Ploc.VaVal rf) (Ploc.VaVal pel) e
where "Ploc.VaVal" is a value of the type "vala"
defined in the module Ploc (see its
section "pervasives").
This necessary conversion is a drawback if the programmer wants
his programs to remain compilable with previous versions of
Camlp5.
The recommended solution is to always write this code with
quotations, namely, in this example, like this:
<:expr< let $flag:rf$ $list:pel$ in $e$ >>
The quotation expanders ensure that, in strict mode, the variable
"rf" is still of type "bool", and that the variable "pel"
is still of type "list (patt * expr)", by enclosing them in
"Ploc.VaVal".
In transitional mode, it is equivalent to the first form above. In
strict mode, it is equivalent to the second form. And the previous
versions of Camlp5 also recognize this form.
Two quotation expanders
Camlp5 provides two quotation expanders of syntax trees:
"q_MLast.cmo" and "q_ast.cmo".
Both allow writing syntax trees in concrete syntax as explained in
the previous section.
The first one, "q_MLast.cmo" requires that the contents of
the quotation be in revised syntax
without any syntax extension (even the stream
parsers). It works in transitional and in strict modes.
The second one, "q_ast.cmo", requires that the contents of
the quotation be in the current user syntax (normal, revised, lisp,
scheme, or other) and accepts all the syntax extensions added
to compile the program. It fully works only in strict mode. In
transitional mode, the antiquotations are not available.
Just some ideas... things to implement or to think about...
extensible lexers
Extensible lexers would be an interesting
extension. And possibly lexers using regular expressions
(extensible if possible, otherwise as a different module).
utf-8
The Camlp5 lexer for OCaml programs (module Plexer) allows utf-8
characters. Since utf-8 seems to have some success among unicode
formats, perhaps thinking about adding Greek characters and/or real
utf-8 arrows to the syntax (in particular in types) would be
interesting.
Directives in normal or revised syntax are statements at top level
or, more generally, signature items or structure items, which
stop the preprocessor to evaluate things that can change the
behaviour of the preprocessor, for example to add syntax, load
syntax extensions and so on. After the directive is evaluated,
the parsing resumes.
Directives begin with '#', followed by an identifier and,
optionally, an expression. They are usable in source files
and generally in the OCaml toplevel too.
Four predefined directives exist: #load, #directory, #option and
#use. It is also possible to add other directives. An example of
that is the parsing kit pa_pragma.cmo
which adds a new directive #pragma.
Predefined directives
The predefined directives are:
#load "name"
Loads an object file (OCaml cmo or cma file) into the core and
evaluates it. This is typically used in the OCaml toplevel
to add a syntax extension kit.
# #load "q_MLast.cmo";
# value loc = Ploc.dummy;
value loc : Ploc.t = <abstr>
# <:expr< fun x -> x >>;
- : MLast.expr =
MLast.ExFun <abstr>
(Ploc.VaVal
[(MLast.PaLid <abstr> (Ploc.VaVal "x"), Ploc.VaVal None,
MLast.ExLid <abstr> (Ploc.VaVal "x"))])
In a source file, the '#load' directive is equivalent to putting the
object file as a camlp5 parameter among the 'load options':
$ cat myfile.ml
#load "pa_extend.cmo";
value g = Grammar.gcreate (Plexer.gmake ());
value e = Grammar.Entry.create g "e";
EXTEND e: [[ i = INT -> i ]]; END;
$ ocamlc -pp camlp5r -I +camlp5 -c myfile.ml
which is equivalent to, without using '#load':
$ cat myfile2.ml
value g = Grammar.gcreate (Plexer.gmake ());
value e = Grammar.Entry.create g "e";
EXTEND e: [[ i = INT -> i ]]; END;
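In this case the kit would instead be given on the camlp5r command
line, for example (a sketch; shell quoting may vary):
$ ocamlc -pp "camlp5r pa_extend.cmo" -I +camlp5 -c myfile2.ml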
#directory "name"
Adds a new directory to the camlp5 path used when searching for
loaded files (with the directive #load above). This is equivalent
to the option '-I' of the camlp5 command. See the camlp5 man page.
#use "name"
Loads a source file name. Useful in the ocaml toplevel to test a
source file.
#option "option"
Adds an option as if it were added in camlp5 command line (to be
used in a source file, not in the ocaml toplevel). Implemented
only on options without an extra parameter.
For example, the syntax
kit pa_extend.cmo adds an option named
'-split_ext'. This can be viewed through the command:
camlp5r pa_extend.cmo -help
Thanks to the directive '#option', the following command in the
shell:
$ camlp5r pa_extend.cmo -split_ext file.ml
can be reduced to:
$ camlp5r file.ml
provided the file starts with:
#load "pa_extend.cmo";
#option "-split_ext";
User directives
It is possible to add any extra directive. The syntax
kit pa_pragma.cmo, for example, adds a
directive named '#pragma'.
A user syntax kit can add its directives using the function
"add_directive" of the module Pcaml.
It is possible to write OCaml programs with Scheme or Lisp
syntax. They are close to one another, both using parentheses
to identify and separate statements.
Common
The syntax extension kits are named "pa_scheme.cmo" and
"pa_lisp.cmo". The sources (same names ending with ".ml"
in the Camlp5 sources), are written in their own syntax. They are
boostrapped thanks to the versions being written in revised syntax and to
the Camlp5 pretty printing system.
In the OCaml toplevel, it is possible to use them by loading
"camlp5r.cma" first, then "pa_lisp.cmo" or
"pa_scheme.cmo" after:
The grammars of Scheme and Lisp are relatively simple, just reading
s-expressions. The syntax tree nodes are created in the semantic
actions. Because of this, these grammars are hardly extensible.
However, the syntax extension EXTEND ("pa_extend.cmo" in
extensible grammars) works for them:
only the semantic actions need to be written in the Scheme or Lisp
syntax. Stream parsers are also implemented.
Warning: these syntaxes are incomplete, but can be completed, if
Camlp5 users are interested.
The main command of Camlp5 is "camlp5". It is an OCaml
program in bytecode (compiled with ocamlc, not ocamlopt), able to
dynamically load OCaml object files (ending with ".cmo" and
".cma").
Most other Camlp5 commands derive from that one: they are the
command "camlp5" with some implicitly applied
parameters.
Two other commands are provided: "mkcamlp5" and
"mkcamlp5.opt". They allow to create camlp5 executables
with already loaded kits.
All commands have an option "-help" which displays all
possible command parameters and options. Notice that some parameters
(the parsing and printing kits) may add new options. For example,
the command:
camlp5 pr_r.cmo -help
prints more lines than just:
camlp5 -help
The first parameter ("load options") allows to specify parsing and
printing kits (".cmo" and ".cma" files) which are loaded inside the
"camlp5" core before any action. Other options may
follow.
Parsing and Printing Kits
Parsing kits
language parsing kits
pa_r.cmo
Revised syntax (without parsers).
pa_rp.cmo
Add revised syntax parsers.
pa_o.cmo
Normal syntax (without parsers). Option added:
-no_quot
don't parse quotations, allowing to use,
e.g. "<:>" as token.
pa_op.cmo
Add normal syntax parsers.
pa_oop.cmo
Add normal syntax parsers without code optimization.
pa_lexer.cmo
Add stream lexers.
extensible grammars
pa_extend.cmo
Add the EXTEND statement. Options added:
-split_ext
split EXTEND by functions to turn around a PowerPC problem.
-quotify
generate code for quotations (internally used to synchronize
q_MLast and pa_r)
-meta_action
undocumented (internally used for compiled version)
pa_extfold.cmo
Add the specific symbols FOLD0 and FOLD1 to the EXTEND
statement.
extensible functions and printers
pa_extfun.cmo
Add the extensible function ("extfun" statement).
pa_extprint.cmo
Add the EXTEND_PRINTER statement.
functional parsers
pa_fstream.cmo
Add the functional parsers ("fparser" statement) and the
backtracking parsers ("bparser" statement).
other languages
pa_lisp.cmo
Lisp syntax.
pa_scheme.cmo
Scheme syntax.
pa_sml.cmo
SML syntax.
other parsing kits
pa_lefteval.cmo
Add guarantee of left evaluation in functions calls.
pa_macro.cmo
Add macros. Options added:
-D <string>
define for IFDEF statement
-U <string>
undefine for IFDEF statement
-defined
print the defined macros and exit
pa_pragma.cmo
Add pragma directive: evaluations at parse time
Printing kits
language printing kits
pr_r.cmo
Display in revised syntax. Added options:
-flag <str>
Change pretty printing behaviour according to
"<str>":
A/a enable/disable all flags
C/c enable/disable comments in phrases
D/d enable/disable allowing expanding 'declare'
E/e enable/disable equilibrate cases
L/l enable/disable allowing printing 'let..in' horizontally
S/s enable/disable printing sequences beginners at end of lines
default setting is "aS".
-wflag <str>
Change displaying 'where' statements instead of 'let':
A/a enable/disable all flags
I/i enable/disable 'where' after 'in'
L/l enable/disable 'where' after 'let..='
M/m enable/disable 'where' after 'match' and 'try'
P/p enable/disable 'where' after left parenthesis
R/r enable/disable 'where' after 'record_field..='
S/s enable/disable 'where' in sequences
T/t enable/disable 'where' after 'then' or 'else'
V/v enable/disable 'where' after 'value..='
W/w enable/disable 'where' after '->'
default setting is "Ars".
-l <length>
Maximum line length for pretty printing (default 78)
-sep_src
Read source file for text between phrases (default).
-sep <string>
Use this string between phrases instead of reading source.
pr_ro.cmo
Add display objects, labels and variants in revised syntax.
pr_rp.cmo
Add display parsers with their (revised) syntax.
pr_o.cmo
Display in normal syntax. Added options:
-flag <str>
Change pretty printing behaviour according to
<str>:
A/a enable/disable all flags
C/c enable/disable comments in phrases
E/e enable/disable equilibrate cases
L/l enable/disable allowing printing 'let..in' horizontally
M/m enable/disable printing double semicolons
default setting is "Am".
-l <length>
Maximum line length for pretty printing (default 78)
-sep_src
Read source file for text between phrases (default).
-sep <string>
Use this string between phrases instead of reading source.
pr_op.cmo
Add displaying parsers with their (normal) syntax.
extensible parsers
pr_extend.cmo
Add the displaying of EXTEND statements in their initial
syntax. Option added:
-no_slist
Don't reconstruct SLIST, SOPT, SFLAG
extensible functions and printers
pr_extfun.cmo
Add displaying extensible functions ("extfun" statement) in
their initial syntax.
pr_extprint.cmo
Add displaying extensible printers ("EXTEND_PRINTER" statement)
in their initial syntax.
other language
pr_scheme.cmo
Display in Scheme syntax. Option added:
-l <length>
Maximum line length for pretty printing (default 78)
-sep <string>
Use this string between phrases instead of reading source.
pr_schemep.cmo
Add display parsers with their (Scheme) syntax.
other printing kits
pr_depend.cmo
Display dependencies. Option added:
-I dir
Add "dir" to the list of search directories.
pr_dump.cmo
Dump the syntax tree in binary (for the OCaml compiler)
pr_null.cmo
No output.
Quotations expanders
q_MLast.cmo
Syntax tree quotations. Define the quotations named: "expr",
"patt", "ctyp", "str_item", "sig_item", "module_type", "module_expr",
"class_type", "class_expr", "class_sig_item", "class_str_item",
"with_constr" and "poly_variant".
q_phony.cmo
Transform quotations into phony variables to be able to pretty
print the quotations in their initial form (not suitable for
compilation)
Commands
camlp5r
Shortcut for "camlp5 pa_r.cmo pa_rp.cmo pr_dump.cmo"
camlp5r.opt
Same as previous, but in native code instead of bytecode,
therefore faster. But not extensible: it is not possible to add
other parsing or printing kits, either in command arguments or
with the "load" directive inside sources. Suitable for compiling
sources not using other syntax extensions.
camlp5o
Shortcut for "camlp5 pa_o.cmo pa_op.cmo pr_dump.cmo"
camlp5o.opt
Same as previous, and like "camlp5r.opt", faster and
not extensible. Moreover, this has been produced by compilation of
Camlp5 grammars, resulting in a still faster executable.
camlp5sch
Shortcut for "camlp5 pa_scheme.cmo pr_dump.cmo"
mkcamlp5
creates camlp5 executables with almost the same options as
ocamlmktop. The interfaces to be visible must be explicitly added
on the command line as ".cmi" files. For example, to add the
OCaml module "str": "mkcamlp5 -custom str.cmi str.cma
-cclib -lstr -o camlp5str"
mkcamlp5.opt
creates camlp5 executables like mkcamlp5, except that
it is in native code, therefore faster, but not extensible; the
added kits must be cmx or cmxa files
Environment variable
When running a program using extensible grammars (in particular,
the camlp5 commands), the environment variable
"CAMLP5PARAM" is consulted. It sets the grammar parsing
algoritm parameters.
This variable must be a sequence of parameter specifications. A
parameter specification is a letter optionally followed by an = and
a value, with any separator. There are four possible parameters:
b
Set the backtrack algorithm as default.
t
Trace symbols (terminals and non-terminals) while parsing with
backtracking.
y
In backtracking, trace the advance in the input stream (number
of unfrozen tokens) and the possible stalling (number of
token tests).
l=value
Set the maximum stalling value.
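For example (a sketch; the values are purely illustrative), to
select the backtracking algorithm and set the maximum stalling
value to 100 for one compilation:
$ CAMLP5PARAM="b l=100" ocamlc -pp camlp5r -I +camlp5 -c foo.ml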
OCaml toplevel files
These object files can be loaded in the OCaml toplevel to make
Camlp5 parse the input. It is possible to load them either by
putting them as parameters of the toplevel, or by using the
directive "load". The option "-I +camlp5" (or "-I
`camlp5 -where`") must be added to the "ocaml" command
(the OCaml toplevel).
camlp5r.cma
Read phrases and display results in revised syntax
camlp5o.cma
Read phrases and display results in normal syntax
camlp5sch.cma
Read phrases in Scheme syntax
Library files
The Camlp5 library is named
"gramlib.cma" and its native code version is
"gramlib.cmxa". They contain the modules:
To understand the whole syntax in the examples given in this
chapter, it is good to understand this parsing tool (the extensible
grammars), but we shall try to give some minimal explanations to
allow the reader to follow them.
A syntax extension is an OCaml object file (ending with ".cmo" or
".cma") which is loaded inside Camlp5. The source of this file uses
calls to the specific statement EXTEND applied to entries defined in
the Camlp5 module "Pcaml".
Entries
The grammar of OCaml contains several entries, corresponding to the
major notions of the language, which are modifiable this way, and
even erasable. They are defined in this module "Pcaml".
Most important entries:
expr: the expressions.
patt: the patterns.
ctyp: the types.
str_item: the structure items, i.e. the items between
"struct" and "end", and the toplevel phrases in a ".ml" file.
sig_item: the signature items, i.e. the items between
"sig" and "end", and the toplevel phrases in a ".mli" file.
module_expr: the module expressions.
module_type: the module types.
Entries of object programming:
class_expr: the class expressions.
class_type: the class types.
class_str_item: the objects items.
class_sig_item: the objects types items.
Main entries of files and interactive toplevel parsing:
implem: the phrases that can be found in a ".ml" file.
interf: the phrases that can be found in a ".mli" file.
top_phrase: the phrases of the interactive toplevel.
use_file: the phrases that can be found in a file
included by the directive "use".
Extra useful entries also accessible:
let_binding: the bindings "pattern = expression" found in
the "let" statement.
type_declaration: the bindings "name = type" found in
the "type" statement.
Syntax tree quotations
A grammar rule is a list of rule symbols followed by the semantic
action, i.e. the result of the rule. This result is a syntax tree,
whose type is the type of the extended entry. The description of the
types of the syntax tree are in the Camlp5 module
"MLast".
There is however a simpler way to make values of these syntax tree
types: the system quotations (see chapters
about quotations
and syntax tree). With this system, it is
possible to represent syntax tree in concrete syntax, between
specific parentheses, namely "<<" and
">>", or between "<:name<" and
">>".
For example, the syntax node of the "if" statement is, normally:
MLast.ExIfe loc e1 e2 e3
where loc is the source location, and e1, e2, e3 are the
expressions constituting the if statement. With quotations, it is
possible to write it like this (which is strictly equivalent because
this is evaluated at parse time, not execution time):
<:expr< if $e1$ then $e2$ else $e3$ >>
With quotations, it is possible to build pieces of program as
complex as desired. See the chapter
about syntax trees.
An example : repeat..until
A classical extension is the creation of the "repeat" statement.
The "repeat" statement is like a "while" except that the loop is
executed at least one time and that the test is at the end of the
loop and is inverted. The equivalent of:
repeat x; y; z until c
is:
do {
x; y; z;
while not c do { x; y; z }
}
or, with a loop:
loop () where rec loop () = do {
x; y; z;
if c then () else loop ()
}
The code
This syntax extension could be written like this (see the detail of
syntax in the chapter about extensible
grammars and the syntax tree quotations in the chapter
about them):
#load "pa_extend.cmo";
#load "q_MLast.cmo";
open Pcaml;
EXTEND
expr:
[ [ "repeat"; el = LIST1 expr SEP ";"; "until"; c = expr ->
let el = el @ [<:expr< while not $c$ do { $list:el$ } >>] in
<:expr< do { $list:el$ } >> ] ]
;
END;
Alternatively, with the loop version:
#load "pa_extend.cmo";
#load "q_MLast.cmo";
open Pcaml;
EXTEND
expr:
[ [ "repeat"; el = LIST1 expr SEP ";"; "until"; c = expr ->
let el = el @ [<:expr< if $c$ then () else loop () >>] in
<:expr< loop () where rec loop () = do { $list:el$ } >> ] ]
;
END;
The first "#load" in the code (in both files) means that a
syntax extension has been used in the file, namely the "EXTEND"
statement. The second "#load" means that abstract syntax
tree quotations have been used, namely the
"<:expr< ... >>".
The quotation, found in the second version:
<:expr< loop () where rec loop () = do { $list:el$ } >>
is especially interesting. Written with abstract syntax tree, it would
be:
MLast.ExLet loc True
[(MLast.PaLid loc "loop",
MLast.ExFun loc [(MLast.PaUid loc "()", None, MLast.ExSeq loc el)])]
(MLast.ExApp loc (MLast.ExLid loc "loop") (MLast.ExUid loc "()"));
This shows the interest of writing abstract syntax trees with
quotations: it is easier to program and to understand.
Compilation
If the file "foo.ml" contains one of these versions, it is possible to
compile it like this:
ocamlc -pp camlp5r -I +camlp5 -c foo.ml
Notice that the ocamlc option "-c" means that we are interested
only in generating the object file "foo.cmo", not completing the
compilation by creating an executable. Anyway the link would not
work because of the use of modules specific to Camlp5.
Testing
In the OCaml toplevel
ocaml -I +camlp5 camlp5r.cma
Objective Caml version ...
Camlp5 Parsing version ...
# #load "foo.cmo";
# value x = ref 42;
value x : ref int = {val=42}
# repeat
print_int x.val; print_endline ""; x.val := x.val + 3
until x.val > 70;
42
45
48
51
54
57
60
63
66
69
- : unit = ()
In a file
The code, above, used in the toplevel, can be written in a file,
say "bar.ml":
#load "./foo.cmo";
value x = ref 42;
repeat
print_int x.val;
print_endline "";
x.val := x.val + 3
until x.val > 70;
with a subtle difference: the loaded file must be "./foo.cmo"
and not just "foo.cmo" because Camlp5 does not have, by default,
the current directory in its path.
The file can be compiled like this:
ocamlc -pp camlp5r bar.ml
or in native code:
ocamlopt -pp camlp5r bar.ml
And it is possible to check the resulting program by typing:
camlp5r pr_r.cmo bar.ml
whose displayed result is:
#load "./foo.cmo";
value x = ref 42;
do {
print_int x.val;
print_endline "";
x.val := x.val + 3;
while not (x.val > 70) do {
print_int x.val;
print_endline "";
x.val := x.val + 3
}
};
This chapter presents the Camlp5 syntax tree when Camlp5 is installed
in strict mode.
Introduction
This syntax tree is defined in the module "MLast" provided
by Camlp5. Each node corresponds to a syntactic entity of the
corresponding type.
For example, the syntax tree of the statement "if" can
be written:
MLast.ExIfe loc e1 e2 e3
where "loc" is the location in the source, and
"e1", "e2" and "e3" are
respectively the expression after the "if", the one after
the "then" and the one after the "else".
If a program needs to manipulate syntax trees, it can use the nodes
defined in the module "MLast". The programmer must know how
the concrete syntax is transformed into this abstract syntax.
A simpler solution is to use one of the quotation kits
"q_MLast.cmo" or "q_ast.cmo". Both propose
quotations which represent the abstract
syntax (the nodes of the module "MLast") into concrete
syntax with antiquotations to bind variables inside. The example
above can be written:
<:expr< if $e1$ then $e2$ else $e3$ >>
This representation is very interesting when one wants to
manipulate complicated syntax trees. Here is an example taken from
the Camlp5 sources themselves:
<:expr<
match try Some $f$ with [ Stream.Failure -> None ] with
[ Some $p$ -> $e$
| _ -> raise (Stream.Error $e2$) ]
>>
This example was in the position of a pattern. Written directly
with the constructors of the module "MLast", the equivalent
abstract syntax is less readable and much more complicated to build
and update.
Instead of thinking of "a syntax tree", the programmer can think of
"a piece of program".
Location
In all syntax tree nodes, the first parameter is the source
location of the node.
In expressions
When a quotation is in the context of an expression, the location
parameter is "loc" in the node and in all its possible
sub-nodes. Example: if we consider the quotation:
<:sig_item< value foo : int -> bool >>
This quotation, in a context of an expression, is equivalent
to:
MLast.SgVal loc (Ploc.VaVal "foo")
(MLast.TyArr loc (MLast.TyLid loc (Ploc.VaVal "int"))
(MLast.TyLid loc (Ploc.VaVal "bool")));
The name "loc" is predefined. However, it is possible to
change it, using the argument "-loc" of the Camlp5 shell
commands.
Consequently, if there is no variable "loc" defined in the
context of the quotation, or if it is not of the right type, a
semantic error occurs in the OCaml compiler ("Unbound value
loc").
Note that in the extensible grammars,
the variable "loc" is bound, in all semantic actions, to
the location of the rule.
If the created node has no location, the programmer can define a
variable named "loc" equal to "Ploc.dummy".
In patterns
When a quotation is in the context of a pattern, the location
parameter of all nodes and possible sub-nodes is set to the wildcard
("_"). The same example above:
However, it is possible to generate a binding of the variable
"loc" on the top node by writing a "colon" before the
"less" in the quotation. The same example:
Antiquotations
The expressions or patterns between dollar ($) characters are
called antiquotations. In opposition to quotations, which
have their own syntax rules, an antiquotation is an area in the syntax
of the enclosing context (expression or pattern). See the chapter
about quotations.
If a quotation is in the context of an expression, the
antiquotation must be an expression. It could be any expression,
including function calls. Examples:
value f e el = <:expr< [$e$ :: $loop False el$] >>;
value patt_list p pl = <:patt< ( $list:[p::pl]$) >>;
If a quotation is in the context of a pattern, the antiquotation
is a pattern. Any pattern is possible, including the wildcard
character ("_"). Examples:
fun [ <:expr< $lid:op$ $_$ $_$ >> -> op ]
match p with [ <:patt< $_$ | $_$ >> -> Some p ]
Two kinds of antiquotations
Preliminary remark
In strict mode, we remark that most parameters of the constructors
defined in the module "MLast" are of type "Ploc.vala". This type
is defined like this:
type vala 'a =
[ VaAnt of string
| VaVal of 'a ]
;
The type argument is the real type of the node. For example, a
value which is of type "bool" in transitional mode is frequently
represented, in strict mode, by a value of type "Ploc.vala bool".
The first case of the type "vala" corresponds to an
antiquotation in the concrete syntax. The second case corresponds
to a normal syntax situation, without antiquotation.
Example: in the "let" statement, the fact that it is "rec" or not
is represented by a boolean. This boolean is, in the syntax tree,
encapsulated with the type "Ploc.vala". The syntax tree of
the two following lines:
let x = y in z
let rec x = y in z
start with, respectively:
MLast.ExLet loc (Ploc.VaVal False)
... (* and so on *)
and:
MLast.ExLet loc (Ploc.VaVal True)
... (* and so on *)
The case "Ploc.VaAnt s" is internally used by the parsers
and by the quotation kit "q_ast.cmo" to record
antiquotation strings representing the expression or the patterns
having this value. For example, in this "let" statement:
MLast.ExLet loc (Ploc.VaAnt s)
... (* and so on *)
The contents of this "s" is internally handled. For
information, it contains the antiquotation string (kind included)
together with a representation of the location of the antiquotation
in the quotation. See the next section.
Antiquoting
To antiquote the fact that the "let" is with or without "rec" (a
flag of type boolean), there are two ways.
direct antiquoting
The first way, hiding the type "Ploc.vala", can be
written with the antiquotation kind "flag":
<:expr< let $flag:rf$ x = y in z >>
This corresponds to the syntax tree:
MLast.ExLet loc (Ploc.VaVal rf)
... (* and so on *)
And, therefore, the type of the variable "rf" is simply
"bool".
general antiquoting
The second way, introducing variables of type "Ploc.vala",
can be written with a kind prefixed by "_", namely here
"_flag":
<:expr< let $_flag:rf$ x = y in z >>
In that case, it corresponds to the syntax tree:
MLast.ExLet loc rf
... (* and so on *)
And, therefore, the type of the variable "rf" is now
"Ploc.vala bool".
Remarks
The first form of antiquotation ensures the compatibility with
previous versions of Camlp5. The syntax tree is not the
same, but the bound variables keep the same type.
All antiquotation kinds have these two forms: one with some name
(e.g. "flag") and one with the same name prefixed by "_"
(e.g. "_flag").
Nodes and Quotations
This section describes all nodes defined in the module "MLast" of
Camlp5 and how to write them with quotations. Notice that, inside
quotations, one is not restricted to these elementary cases, but
any complex value can be used, resulting on possibly complex combined
nodes.
The quotation forms are described here
in revised syntax (like the rest of
this document). In reality, it depends on which quotation kit is
loaded:
If "q_MLast.cmo" is used, the revised syntax is
mandatory: the quotations must be in that syntax without any
extension.
If "q_ast.cmo" is used, the quotation
syntax must be in the current user syntax with
all extensions added to compile the file.
Last remark: in the following tables, the variable names give
information about their types. The details can be found in the
distributed source file "mLast.mli".
Node used in the quotation expanders to tell, at the time of
conversion to the OCaml compiler syntax tree, that all locations of
the sub-tree are correctly located in the quotation. By default, in
quotations, the locations of all generated nodes are the location
of the whole quotation. This node allows an exception to
this rule, since we know that the antiquotation belongs to the
universe of the enclosing program. See the chapter
about quotations and, in particular, its
section about antiquotations.
(2)
Extra node internally used by the quotation kit
"q_ast.cmo" to build antiquotations of expressions.
Node internally used to specify a different file name applying to
the whole subtree. This is generated by the directive "use" and
used when converting to the OCaml syntax tree which needs the file
name in its location type.
(2)
Extra node internally used by the quotation kit
"q_ast.cmo" to build antiquotations of structure
items.
sig_item
Signature items, i.e. phrases in a ".mli" file or elements
inside "sig ... end".
- class
<:sig_item< class $list:lcict$ >> MLast.SgCls loc (Ploc.VaVal lcict)
<:sig_item< class $_list:lcict$ >> MLast.SgCls loc lcict
- class type
<:sig_item< class type $list:lcict$ >> MLast.SgClt loc (Ploc.VaVal lcict)
<:sig_item< class type $_list:lcict$ >> MLast.SgClt loc lcict
- declare
<:sig_item< declare $list:lsi$ end >> MLast.SgDcl loc (Ploc.VaVal lsi)
<:sig_item< declare $_list:lsi$ end >> MLast.SgDcl loc lsi
- directive
<:sig_item< # $lid:s$ >> MLast.SgDir loc (Ploc.VaVal s) (Ploc.VaVal None)
poly_variant
<:poly_variant< `$s$ of & $list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal True) (Ploc.VaVal lt)
<:poly_variant< `$s$ of & $_list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal True) lt
<:poly_variant< `$s$ of $list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal False) (Ploc.VaVal lt)
<:poly_variant< `$s$ of $_list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal False) lt
<:poly_variant< `$s$ of $flag:b$ $list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal b) (Ploc.VaVal lt)
<:poly_variant< `$s$ of $flag:b$ $_list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) (Ploc.VaVal b) lt
<:poly_variant< `$s$ of $_flag:b$ $list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) b (Ploc.VaVal lt)
<:poly_variant< `$s$ of $_flag:b$ $_list:lt$ >> MLast.PvTag loc (Ploc.VaVal s) b lt
<:poly_variant< `$_:s$ >> MLast.PvTag loc s (Ploc.VaVal True) (Ploc.VaVal [])
<:poly_variant< `$_:s$ of & $list:lt$ >> MLast.PvTag loc s (Ploc.VaVal True) (Ploc.VaVal lt)
<:poly_variant< `$_:s$ of & $_list:lt$ >> MLast.PvTag loc s (Ploc.VaVal True) lt
<:poly_variant< `$_:s$ of $list:lt$ >> MLast.PvTag loc s (Ploc.VaVal False) (Ploc.VaVal lt)
<:poly_variant< `$_:s$ of $_list:lt$ >> MLast.PvTag loc s (Ploc.VaVal False) lt
<:poly_variant< `$_:s$ of $flag:b$ $list:lt$ >> MLast.PvTag loc s (Ploc.VaVal b) (Ploc.VaVal lt)
<:poly_variant< `$_:s$ of $flag:b$ $_list:lt$ >> MLast.PvTag loc s (Ploc.VaVal b) lt
<:poly_variant< `$_:s$ of $_flag:b$ $list:lt$ >> MLast.PvTag loc s b (Ploc.VaVal lt)
<:poly_variant< `$_:s$ of $_flag:b$ $_list:lt$ >> MLast.PvTag loc s b lt
- type
<:poly_variant< $t$ >> MLast.PvInh loc t
Nodes without quotations
Some types defined in the AST tree module "MLast" don't
have an associated quotation. They are:
type_var
class_infos
type_var
The type "type_var" is defined as:
type type_var = (Ploc.vala string * (bool * bool));
The first boolean is "True" if the type variable is
prefixed by "+" ("plus" sign). The second boolean is
"True" if the type variable is prefixed by "-"
("minus" sign).
class_infos
The type "class_infos" is a record type parametrized with
a type variable. It is common to:
the "class declaration" ("class ..." as structure item),
the type variable being "class_expr",
the "class description" ("class ..." as signature item),
the type variable being "class_type",
the "class type declaration" ("class type ..."), the
type variable being "class_type".
It is defined as:
type class_infos 'a =
{ ciLoc : loc;
ciVir : Ploc.vala bool;
ciPrm : (loc * Ploc.vala (list type_var));
ciNam : Ploc.vala string;
ciExp : 'a }
;
The field "ciLoc" is the location of the whole definition.
The field "ciVir" tells whether the type is virtual or not.
The field "ciPrm" is the list of its possible parameters.
The field "ciNam" is the class identifier.
The field "ciExp" is the class definition, depending of its
kind.
The quotation kit "q_ast.cmo" allows to use syntax tree
quotations in user syntax. It fully works only in "strict" mode. In
"transitional" mode, there is no way to use antiquotations, which
restricts its utility.
If this kit is loaded, when a quotation of syntax tree is found,
the current OCaml language parser is called. Then, the resulting
tree is reified (except the antiquotations) to represent the
syntax tree of the syntax tree.
Antiquotations
The OCaml language parser used, and its possible extensions, must
have been built to allow the places of the antiquotations. The
symbols enclosed by the meta-symbol "V" (see the chapter
about extensible grammars, section
"symbols"), define where antiquotations can take place.
There is no need to specify antiquotations for the main types
defined in the AST tree module ("MLast"): "expr",
"patt", "expr", "str_item",
"sig_item", and so on. All syntax parts of these types are
automatically antiquotable.
For example, in Camlp5 sources, the grammar rule defining the
"let..in" statement is:
"let"; r = V (FLAG "rec") "flag" "opt";
l = V (LIST1 let_binding SEP "and"); "in"; x = expr
All symbols of these rules, except the keywords, are antiquotable:
The "rec" flag, because enclosed by the "V" meta symbol. The two
strings which follow it gives the possible antiquotation kinds:
"flag" (the normal antiquotation kind) and "opt" (kept by backward
compatibility, but not recommended). It is therefore possible to
antiquote it as: "$flag:...$" or "$opt:...$"
where the "..." is an expression or a pattern depending
on the position of the enclosing quotation
The binding list is also antiquotable, since it is also enclosed
by the "V" meta symbol. Its antiquotation kind is "list" (the
default when the meta symbol parameter is a list). It is therefore
possible to write "$list:...$" at the place of the
binding list.
The expression after the "in" is also antiquotable, because it
belongs to the main types defined in the module "MLast".
In that example, the variable "r" is of type "Ploc.vala
bool", the variable "l" of type "Ploc.vala (list
(patt * expr))" and the variable "x" of type
"MLast.expr".
The directive "#pragma" allows to evaluate expressions
at parse time, useful, for example, to test syntax extensions by the
statement EXTEND without having to compile it in a separate file.
To use it, add the syntax extension "pa_pragma.cmo" in
the Camlp5 command line. It adds the ability to use this directive.
As an example, let's add syntax for the statement 'repeat' and
use it immediately:
#pragma
EXTEND
GLOBAL: expr;
expr: LEVEL "top"
[ [ "repeat"; e1 = sequence; "until"; e2 = SELF ->
<:expr< do { $e1$; while not $e2$ do { $e1$ } } >> ] ]
;
sequence:
[ [ el = LIST1 expr_semi -> <:expr< do { $list:el$ } >> ] ]
;
expr_semi:
[ [ e = expr; ";" -> e ] ]
;
END;
let i = ref 1 in
repeat print_int i.val; print_endline ""; incr i; until i.val = 10;
The compilation of this example (naming it "foo.ml") can be done with
the command:
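A plausible invocation (a sketch only; the exact list of kits to
load may differ):
$ ocamlc -pp "camlp5r pa_extend.cmo q_MLast.cmo pa_pragma.cmo" -I +camlp5 -c foo.ml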
Backtracking parsers are close to functional parsers: they use the
same stream type, "Fstream.t", and their syntax is almost
identical, its introducing keyword being "bparser" instead
of "fparser".
The difference is that they are implemented with full
backtracking and that they return values of type
"option" of a triple: 1/ the value, 2/ the remaining stream
and 3/ a continuation.
Syntax
The syntax of backtracking parsers is added together with the
syntax of functional parsers, when the kit "pa_fstream.cmo" is
loaded. It is the same as the syntax of functional parsers, with
the introducing keyword "bparser" instead of "fparser".
The backtracking parsers, like classical parsers and functional
parsers, use a recursive descent algorithm. But:
If a stream pattern component does not match the current
position of the input stream, the control is given to the next
case of the stream pattern component before it. If it is the first
stream pattern component, the rule (the stream pattern) is left
and the next rule is tested.
For example, the following grammar:
E -> X Y
X -> a b | a
Y -> b
works, with the backtracking algorithm, for the input "a
b".
Parsing with the non-terminal "E", the non-terminal
"X" first accepts the input "a b" with its first
rule. Then when "Y" is called, the parsing fails since
nothing remains in the input stream.
In the rule "X Y" of the non-terminal "E", the
non-terminal "Y" having failed, the control is given the
the continuation of the non-terminal "X". This continuation
is its second rule containing only "a". Then "Y"
is called and accepted.
This case does not work with functional parsers since when the rule
"a b" of the non-terminal "X" is accepted, it is
definitive. If the input starts with "a b", there is no way
to apply its second rule.
Backtracking parsers are strictly more powerful than functional
parsers.
Type
A backtracking parser whose stream elements are of type
"t1", and whose semantic actions are of some type
"t2", is of type:
If the backtracking parser fails, its returned value is
"None".
If it succeeds, its returned value is "Some (x, strm, k)"
where "x" is its result, "strm" the remaining
stream, and "k" the continuation.
The continuation is internally used in the backtracking algorithm,
but it can also be used in the main call to compute the next
solution, using the function "Fstream.bcontinue".
It is also possible to directly get the list of all solutions by
calling the function "Fstream.bparse_all".
Syntax errors
Like for functional parsers, in case of
syntax error, the error position can be found by using the function
"Fstream.count_unfrozen", the token in error being the
last unfrozen element of the stream.
A syntax error is not really an error: for the backtracking
parsers, like for functional parsers, it is viewed as a
"non-working" case and another solution is searched.
In the backtracking algorithm, depending on the grammar and the
input, the search of the next solution can be very long. A solution
is proposed for that in the extensible
grammars system when the parsing algorithm is set to
"backtracking".
Example
Here is an example which just shows the backtracking algorithm but
without parsing, an empty stream being given as parameter and never
referred to.
It creates a list of three strings, each of them being chosen
among "A", "B" and "C".
The code is:
#load "pa_fstream.cmo";
value choice = bparser [ [: :] -> "A" | [: :] -> "B" | [: :] -> "C" ];
value combine = bparser [: x = choice; y = choice; z = choice :] -> [x; y; z];
The function "combine" returns the first solution: