pax_global_header 0000666 0000000 0000000 00000000064 14257632064 0014523 g ustar 00root root 0000000 0000000 52 comment=bc6c0d568b90b61143e9863cb6ef7b3989b3313a
omd-1.3.2/ 0000775 0000000 0000000 00000000000 14257632064 0012305 5 ustar 00root root 0000000 0000000 omd-1.3.2/.github/ 0000775 0000000 0000000 00000000000 14257632064 0013645 5 ustar 00root root 0000000 0000000 omd-1.3.2/.github/workflows/ 0000775 0000000 0000000 00000000000 14257632064 0015702 5 ustar 00root root 0000000 0000000 omd-1.3.2/.github/workflows/main.yml 0000664 0000000 0000000 00000001314 14257632064 0017350 0 ustar 00root root 0000000 0000000 name: Main workflow
on:
pull_request:
push:
jobs:
build:
strategy:
fail-fast: false
matrix:
os:
- macos-latest
- ubuntu-latest
- windows-latest
ocaml-compiler:
- 4.04.2
- 4.13.x
runs-on: ${{ matrix.os }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Use OCaml ${{ matrix.ocaml-compiler }}
uses: ocaml/setup-ocaml@v2
with:
ocaml-compiler: ${{ matrix.ocaml-compiler }}
cache-prefix: 1.3.1-${{ matrix.container }}
- run: opam install . --deps-only --with-test
- run: opam exec -- dune build
- run: opam exec -- dune runtest
omd-1.3.2/.gitignore 0000664 0000000 0000000 00000000165 14257632064 0014277 0 ustar 00root root 0000000 0000000 _build/
_opam/
*.data
*.log
*.native
*.opt
*.byte
#*
.#*
*~
*.cmi
*.cma
*.cmo
*.cmx
*.cmxa
*.o
omd
.DS_Store
.depend
omd-1.3.2/ABOUT.md 0000664 0000000 0000000 00000024202 14257632064 0013501 0 ustar 00root root 0000000 0000000
About [OMD](https://github.com/pw374/omd/)
==========================================
The implementation of this library and command-line tool
is based on [DFMSD][].
That description doesn't define a grammar but a sort of guide for
human users who are not trying to implement it. In other words,
it's ambiguous, which is a problem since there are no errors in the
Markdown language, which design is mostly based on some
email-writing experience: the meaning of a phrase is the meaning
a human would give when reading the phrase as some email contents.
For instance, if there are blank lines that have spaces
(lines that read empty but actually contain some characters, from
the computer point of view since spaces are represented by characters),
since they're invisible to the normal human reader, they should be ignored.
Specificities
-------------
There follows a list of specificities of OMD.
This list is probably not exhaustive.
**Please note that OMD's semantics have changed over time, but they are becoming
more and more stable with time and new releases. The goal is to eventually
have a semantics that's as sane as it can possibly be for a Markdown parser.
Please [browse and open issues](https://github.com/pw374/omd/issues/)
if you find something that seems wrong.**
- Email addresses encoding: email addresses are not hex entity-encoded.
- `[foo]` is a short-cut for `[foo][]`, but if `foo` is not a reference
then `[foo]` is printed `[foo]`, not `[foo][]`.
*(Taken from Github Flavour Markdown.)*
- The Markdown to Markdown conversion may performe
some cleaning (some meaningless characters may disappear)
or spoiling (some meaningless characters may appear),
but both inputs and ouputs should have the same semantics (otherwise
please do report the bug).
- A list containing at least one item which has at least one paragraph
is a list for which all items have paragraphs and/or blocks.
In HTML words, in practice, if an `li` of a `ul` or `ol` has a `p`,
then all other `li`s of that list have at least a `p` or a `pre`.
- It's not possible to emphasise a part of a word using underscores.
*(Taken from Github Flavour Markdown.)*
- A code section declared with at least 3 backquotes (`` ` ``) at the
first element on a line is a code block. The backquotes should be
followed by a language name (made of a-z characters) or by a newline.
- A code block starting with several backquotes (e.g., ```` ``` ````)
immediately followed by a word W made of a-z characters is a code block
for which the code language is W. (If you use other characters than
a-z, the semantics is currently undefined although it's deterministic
of course, because it may change in the near future.) Also, if you use
the command line tool `omd`, you can define programs to process code
blocks specifically to the languages that are declared for those code
blocks.
- Each and every tabulation is converted by OMD to 4 spaces at the lexing
step. And the behaviour of the parser is undefined for tabulations.
- Note that it does mean that if you have a document with some code written
using the
[Whitespace](http://en.wikipedia.org/wiki/Whitespace_(programming_language))
language, it will not work very well. This might be fixed in the future
but unless you have a very good reason for OMD to support tabulations,
it will probably not.
- Parentheses and square brackets are generally parsed in a way such that
`[a[b]](http://c/(d))` is the URL `http://c/(d)` with the text `a[b]`.
If you want a parenthesis or bracket not to count in the balanced parsing,
escape it with a backslash, such as in `[a\[b](http://c/\(d)`.
*This is typically something that's not defined in [DFMSD].*
- Note about backslashes in URLs: some web browsers (e.g., Safari)
automatically convert `\` to `/`. It's not the case of CURL.
However I assume it's safe to consider that backslashes are not
to be used in URLs. Still it's always possible to
backslashe-escape them anyways.
- HTML is somewhat a part of Markdown. OMD will partially parse HTML tags
and if you have a tag that isn't a known HTML tag, then it's possible
that OMD will not consider it as HTML. For instance, a document
containing just `
<foo></foo>
`. - It's possible to ask `omd` to relax this constraint. - Some additional features are available on the command line. For more information, used the command `omd -help` [DFMSD]: http://daringfireball.net/projects/markdown/syntax "John Gruber's description of the syntax of Markdown" "DFMSD" is short for "Daring Fireball: Markdown Syntax Documentation", which is the HTML title of the page located atplop
hello
"; loop indent q; Buffer.add_string b ""; loop indent tl end | Ref(rc, name, text, fallback) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> begin match rc#get_ref name with | Some(href, title) -> loop indent (Url(htmlentities ~md:true href, [Text(text)], htmlentities ~md:true title) ::tl) | None -> loop indent (fallback#to_t); loop indent tl end end | Img_ref(rc, name, alt, fallback) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> begin match rc#get_ref name with | Some(src, title) -> loop indent (Img(htmlentities ~md:true alt, htmlentities ~md:true src, htmlentities ~md:true title)::tl) | None -> loop indent (fallback#to_t); loop indent tl end end | Paragraph [] :: tl -> loop indent tl | Paragraph md as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> (let s = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in if empty s then () else begin Buffer.add_string b "
"; Buffer.add_string b (remove_trailing_blanks s); Buffer.add_string b "
\n"; end); loop indent tl end | Img(alt, src, title) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b ""
else if lang = "" then
bprintf b ""
!default_language !default_language
else
bprintf b "" lang lang;
let new_c = code_style ~lang:lang c in
if c = new_c then
Buffer.add_string b (htmlentities ~md:false c)
else
Buffer.add_string b new_c;
Buffer.add_string b "
";
loop indent tl
end
| Code(lang, c) as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
if lang = "" && !default_language = "" then
Buffer.add_string b ""
else if lang = "" then
bprintf b "" !default_language
else
bprintf b "" lang;
let new_c = code_style ~lang:lang c in
if c = new_c then
Buffer.add_string b (htmlentities ~md:false c)
else
Buffer.add_string b new_c;
Buffer.add_string b "
";
loop indent tl
end
| Br as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Buffer.add_string b "
";
loop indent tl
end
| Hr as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Buffer.add_string b "
";
loop indent tl
end
| Raw s as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Buffer.add_string b s;
loop indent tl
end
| Raw_block s as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Buffer.add_string b s;
loop indent tl
end
| Html(tagname, attrs, []) as e :: tl
when StringSet.mem tagname html_void_elements ->
let attrs = filter_text_omd_rev attrs in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Printf.bprintf b " />";
loop indent tl
end
| Html(tagname, attrs, body) as e :: tl ->
let attrs = filter_text_omd_rev attrs in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Buffer.add_string b ">";
loop indent body;
Printf.bprintf b "%s>" tagname;
loop indent tl
end
| Html_block(tagname, attrs, body) as e :: tl ->
let attrs = filter_text_omd_rev attrs in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
if body = [] && StringSet.mem tagname html_void_elements then
(
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Buffer.add_string b " />";
loop indent tl
)
else
(
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Buffer.add_string b ">";
loop indent body;
Printf.bprintf b "%s>" tagname;
loop indent tl
)
end
| Html_comment s as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
Buffer.add_string b s;
loop indent tl
end
| Url (href,s,title) as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let s = html_of_md ~override ~pindent ~nl2br ~cs:code_style s in
Buffer.add_string b " "" then
begin
Buffer.add_string b " title='";
Buffer.add_string b (htmlentities ~md:true title);
Buffer.add_string b "'";
end;
Buffer.add_string b ">";
Buffer.add_string b s;
Buffer.add_string b "";
loop indent tl
end
| (H1 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H1 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| (H2 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H2 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| (H3 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H3 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| (H4 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H4 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| (H5 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H5 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| (H6 md as e) :: tl ->
let e, md =
if not remove_header_links then
e, md
else
let md = remove_links md in
H6 md, md in
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in
let id = id_of_string ids (text_of_md md) in
headers := (e, id, ih) :: !headers;
Buffer.add_string b "";
Buffer.add_string b ih;
Buffer.add_string b "
";
loop indent tl
end
| NL as e :: tl ->
begin match override e with
| Some s ->
Buffer.add_string b s;
loop indent tl
| None ->
if nl2br then
Buffer.add_string b "
"
else
Buffer.add_string b "\n";
loop indent tl
end
| [] ->
()
in
loop 0 md;
Buffer.contents b, List.rev !headers
and string_of_attrs attrs =
let b = Buffer.create 1024 in
List.iter
(function
| (a, Some v) ->
if not(String.contains v '\'') then
Printf.bprintf b " %s='%s'" a v
else if not(String.contains v '"') then
Printf.bprintf b " %s=\"%s\"" a v
else
Printf.bprintf b " %s=\"%s\"" a v
| a, None ->
(* if html4 then *)
(* Printf.bprintf b " %s='%s'" a a *)
(* else *)
Printf.bprintf b " %s=''" a (* HTML5 *)
)
attrs;
Buffer.contents b
and html_of_md
?(override=(fun (e:element) -> (None:string option)))
?(pindent=false)
?(nl2br=false)
?cs
md
=
fst (html_and_headers_of_md ~override ~pindent ~nl2br ?cs md)
and headers_of_md ?remove_header_links md =
snd (html_and_headers_of_md ?remove_header_links md)
let rec sexpr_of_md md =
let b = Buffer.create 64 in
let rec loop = function
| X x :: tl ->
(match x#to_t md with
| Some t ->
Buffer.add_string b "(X";
loop t;
Buffer.add_string b ")"
| None ->
match x#to_sexpr sexpr_of_md md with
| Some s ->
Buffer.add_string b "(X";
Buffer.add_string b s;
Buffer.add_string b ")"
| None ->
match x#to_html ~indent:0 html_of_md md with
| Some s ->
Buffer.add_string b "(X";
Buffer.add_string b s;
Buffer.add_string b ")"
| None -> ());
loop tl
| Blockquote q :: tl ->
Buffer.add_string b "(Blockquote";
loop q;
Buffer.add_string b ")";
loop tl
| Ref(rc, name, text, _) :: tl ->
bprintf b "(Ref %S %S)" name text;
loop tl
| Img_ref(rc, name, alt, _) :: tl ->
bprintf b "(Img_ref %S %S)" name alt;
loop tl
| Paragraph md :: tl ->
Buffer.add_string b "(Paragraph";
loop md;
Buffer.add_string b ")";
loop tl
| Img(alt, src, title) :: tl ->
bprintf b "(Img %S %S %S)" alt src title;
loop tl
| Text t :: tl ->
bprintf b "(Text %S" t;
let rec f = function
| Text t :: tl ->
bprintf b " %S" t;
f tl
| x -> x
in
let tl = f tl in
bprintf b ")";
loop tl
| Emph md :: tl ->
Buffer.add_string b "(Emph";
loop md;
Buffer.add_string b ")";
loop tl
| Bold md :: tl ->
Buffer.add_string b "(Bold";
loop md;
Buffer.add_string b ")";
loop tl
| Ol l :: tl ->
bprintf b "(Ol";
List.iter(fun li -> bprintf b "(Li "; loop li; bprintf b ")") l;
bprintf b ")";
loop tl
| Ul l :: tl ->
bprintf b "(Ul";
List.iter(fun li -> bprintf b "(Li "; loop li;bprintf b ")") l;
bprintf b ")";
loop tl
| Olp l :: tl ->
bprintf b "(Olp";
List.iter(fun li -> bprintf b "(Li "; loop li; bprintf b ")") l;
bprintf b ")";
loop tl
| Ulp l :: tl ->
bprintf b "(Ulp";
List.iter(fun li -> bprintf b "(Li "; loop li;bprintf b ")") l;
bprintf b ")";
loop tl
| Code(lang, c) :: tl ->
bprintf b "(Code %S)" c;
loop tl
| Code_block(lang, c) :: tl ->
bprintf b "(Code_block %s)" c;
loop tl
| Br :: tl ->
Buffer.add_string b "(Br)";
loop tl
| Hr :: tl ->
Buffer.add_string b "(Hr)";
loop tl
| Raw s :: tl ->
bprintf b "(Raw %S)" s;
loop tl
| Raw_block s :: tl ->
bprintf b "(Raw_block %S)" s;
loop tl
| Html(tagname, attrs, body) :: tl ->
bprintf b "(Html %s %s " tagname (string_of_attrs attrs);
loop body;
bprintf b ")";
loop tl
| Html_block(tagname, attrs, body) :: tl ->
bprintf b "(Html_block %s %s " tagname (string_of_attrs attrs);
loop body;
bprintf b ")";
loop tl
| Html_comment s :: tl ->
bprintf b "(Html_comment %S)" s;
loop tl
| Url (href,s,title) :: tl ->
bprintf b "(Url %S %S %S)" href (html_of_md s) title;
loop tl
| H1 md :: tl ->
Buffer.add_string b "(H1";
loop md;
Buffer.add_string b ")";
loop tl
| H2 md :: tl ->
Buffer.add_string b "(H2";
loop md;
Buffer.add_string b ")";
loop tl
| H3 md :: tl ->
Buffer.add_string b "(H3";
loop md;
Buffer.add_string b ")";
loop tl
| H4 md :: tl ->
Buffer.add_string b "(H4";
loop md;
Buffer.add_string b ")";
loop tl
| H5 md :: tl ->
Buffer.add_string b "(H5";
loop md;
Buffer.add_string b ")";
loop tl
| H6 md :: tl ->
Buffer.add_string b "(H6";
loop md;
Buffer.add_string b ")";
loop tl
| NL :: tl ->
Buffer.add_string b "(NL)";
loop tl
| [] -> ()
in
loop md;
Buffer.contents b
let escape_markdown_characters s =
let b = Buffer.create (String.length s * 2) in
for i = 0 to String.length s - 1 do
match s.[i] with
| '.' as c ->
if i > 0 &&
match s.[i-1] with
| '0' .. '9' -> i+1 < String.length s && s.[i+1] = ' '
| _ -> false
then
Buffer.add_char b '\\';
Buffer.add_char b c
| '-' as c ->
if (i = 0 || match s.[i-1] with ' '| '\n' -> true | _ -> false)
&& (i+1 < String.length s && (s.[i+1] = ' '||s.[i+1] = '-'))
then
Buffer.add_char b '\\';
Buffer.add_char b c
| '+' as c ->
if (i = 0 || match s.[i-1] with ' '| '\n' -> true | _ -> false)
&& (i+1 < String.length s && s.[i+1] = ' ')
then
Buffer.add_char b '\\';
Buffer.add_char b c
| '!' as c ->
if i+1 < String.length s && s.[i+1] = '[' then
Buffer.add_char b '\\';
Buffer.add_char b c
| '<' as c ->
if i <> String.length s - 1 &&
(match s.[i+1] with 'a' .. 'z' | 'A' .. 'Z' -> false | _ -> true)
then
Buffer.add_char b '\\';
Buffer.add_char b c
| '>' as c ->
if i = 0 ||
(match s.[i-1] with ' ' | '\n' -> false | _ -> true)
then
Buffer.add_char b '\\';
Buffer.add_char b c
| '#' as c ->
if i = 0 || s.[i-1] = '\n' then
Buffer.add_char b '\\';
Buffer.add_char b c
| '\\' | '[' | ']' | '(' | ')' | '`' | '*' as c ->
Buffer.add_char b '\\';
Buffer.add_char b c
| c ->
Buffer.add_char b c
done;
Buffer.contents b
let rec markdown_of_md md =
if debug then eprintf "(OMD) markdown_of_md(%S)\n%!" (sexpr_of_md md);
let quote ?(indent=0) s =
let b = Buffer.create (String.length s) in
let l = String.length s in
let rec loop nl i =
if i < l then
begin
if nl && i < l - 1 then
(for i = 1 to indent do
Buffer.add_char b ' '
done;
Buffer.add_string b "> ");
match s.[i] with
| '\n' ->
Buffer.add_char b '\n';
loop true (succ i)
| c ->
Buffer.add_char b c;
loop false (succ i)
end
else
Buffer.contents b
in loop true 0
in
let b = Buffer.create 64 in
let add_spaces n = for i = 1 to n do Buffer.add_char b ' ' done in
let references = ref None in
let rec loop ?(fst_p_in_li=true) ?(is_in_list=false) list_indent l =
(* [list_indent: int] is the indentation level in number of spaces. *)
(* [is_in_list: bool] is necessary to know if we are inside a paragraph
which is inside a list item because those need to be indented! *)
let loop ?(fst_p_in_li=fst_p_in_li) ?(is_in_list=is_in_list) list_indent l =
loop ~fst_p_in_li:fst_p_in_li ~is_in_list:is_in_list list_indent l
in
match l with
| X x :: tl ->
(match x#to_t md with
| Some t -> loop list_indent t
| None ->
match x#to_html ~indent:0 html_of_md md with
| Some s -> Buffer.add_string b s
| None -> ());
loop list_indent tl
| Blockquote q :: tl ->
Buffer.add_string b (quote ~indent:list_indent (markdown_of_md q));
if tl <> [] then Buffer.add_string b "\n";
loop list_indent tl
| Ref(rc, name, text, fallback) :: tl ->
if !references = None then references := Some rc;
loop list_indent (Raw(fallback#to_string)::tl)
| Img_ref(rc, name, alt, fallback) :: tl ->
if !references = None then references := Some rc;
loop list_indent (Raw(fallback#to_string)::tl)
| Paragraph [] :: tl -> loop list_indent tl
| Paragraph md :: tl ->
if is_in_list then
if fst_p_in_li then
add_spaces (list_indent-2)
else
add_spaces list_indent;
loop ~fst_p_in_li:false list_indent md;
Printf.bprintf b "\n\n";
loop ~fst_p_in_li:false list_indent tl
| Img(alt, src, title) :: tl ->
Printf.bprintf b "" alt src title;
loop list_indent tl
| Text t :: tl ->
Printf.bprintf b "%s" (escape_markdown_characters t);
loop list_indent tl
| Emph md :: tl ->
Buffer.add_string b "*";
loop list_indent md;
Buffer.add_string b "*";
loop list_indent tl
| Bold md :: tl ->
Buffer.add_string b "**";
loop list_indent md;
Buffer.add_string b "**";
loop list_indent tl
| Ol l :: tl ->
if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then
Buffer.add_char b '\n';
let c = ref 0 in (* don't use List.iteri because it's not in 3.12 *)
List.iter(fun li ->
incr c;
add_spaces list_indent;
Printf.bprintf b "%d. " !c;
loop ~is_in_list:true (list_indent+4) li;
Buffer.add_char b '\n';
) l;
if list_indent = 0 then Buffer.add_char b '\n';
loop list_indent tl
| Ul l :: tl ->
if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then
Buffer.add_char b '\n';
List.iter(fun li ->
add_spaces list_indent;
Printf.bprintf b "- ";
loop ~is_in_list:true (list_indent+4) li;
Buffer.add_char b '\n';
) l;
if list_indent = 0 then Buffer.add_char b '\n';
loop list_indent tl
| Olp l :: tl ->
let c = ref 0 in (* don't use List.iteri because it's not in 3.12 *)
List.iter(fun li ->
if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n'
then Buffer.add_char b '\n';
add_spaces list_indent;
incr c;
bprintf b "%d. " !c;
loop ~is_in_list:true (list_indent+4) li;
(* Paragraphs => No need of '\n' *)
) l;
loop list_indent tl
| Ulp l :: tl ->
List.iter(fun li ->
if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n'
then Buffer.add_char b '\n';
add_spaces list_indent;
bprintf b "+ ";
loop ~is_in_list:true (list_indent+4) li;
(* Paragraphs => No need of '\n' *)
) l;
begin match tl with
| (H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _)::_
| NL::(H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _)::_ ->
Buffer.add_char b '\n'
| _ -> ()
end;
loop list_indent tl
| Code(_lang, c) :: tl -> (* FIXME *)
let n = (* compute how many backquotes we need to use *)
let filter (n:int) (s:int list) =
if n > 0 && n < 10 then
List.filter (fun e -> e <> n) s
else
s
in
let l = String.length c in
let rec loop s x b i =
if i = l then
match filter b s with
| hd::_ -> hd
| [] -> x+1
else
match c.[i] with
| '`' -> loop s x (succ b) (succ i)
| _ -> loop (filter b s) (max b x) 0 (succ i)
in
loop [1;2;3;4;5;6;7;8;9;10] 0 0 0
in
begin
Printf.bprintf b "%s" (String.make n '`');
if c.[0] = '`' then Buffer.add_char b ' ';
Printf.bprintf b "%s" c;
if c.[String.length c - 1] = '`' then Buffer.add_char b ' ';
Printf.bprintf b "%s" (String.make n '`');
end;
loop list_indent tl
| Code_block(lang, c) :: tl ->
let n = (* compute how many backquotes we need to use *)
let filter n s =
if n > 0 && n < 10 then
List.filter (fun e -> e <> n) s
else
s
in
let l = String.length c in
let rec loop s b i =
if i = l then
match filter b s with
| hd::_ -> hd
| [] -> 0
else
match c.[i] with
| '`' -> loop s (succ b) (succ i)
| _ -> loop (filter b s) 0 (succ i)
in
loop [3;4;5;6;7;8;9;10] 0 0
in
let output_indented_block n s =
let rec loop p i =
if i = String.length s then
()
else
match p with
| '\n' ->
Printf.bprintf b "%s" (String.make n ' ');
Buffer.add_char b s.[i];
loop s.[i] (succ i)
| _ ->
Buffer.add_char b s.[i];
loop s.[i] (succ i)
in loop '\n' 0
in
if n = 0 then (* FIXME *)
begin
(* case where we can't use backquotes *)
Buffer.add_char b '\n';
output_indented_block (4+list_indent) c;
if tl <> [] then Buffer.add_string b "\n\n"
end
else
begin
Buffer.add_string b (String.make (list_indent) ' ');
Printf.bprintf b "%s%s\n" (String.make n '`')
(if lang = "" then !default_language else lang);
output_indented_block (list_indent) c;
if Buffer.nth b (Buffer.length b - 1) <> '\n' then
Buffer.add_char b '\n';
Buffer.add_string b (String.make (list_indent) ' ');
Printf.bprintf b "%s\n" (String.make n '`');
end;
loop list_indent tl
| Br :: tl ->
Buffer.add_string b "
";
loop list_indent tl
| Hr :: tl ->
Buffer.add_string b "* * *\n";
loop list_indent tl
| Raw s :: tl ->
Buffer.add_string b s;
loop list_indent tl
| Raw_block s :: tl ->
Buffer.add_char b '\n';
Buffer.add_string b s;
Buffer.add_char b '\n';
loop list_indent tl
| Html(tagname, attrs, []) :: tl
when StringSet.mem tagname html_void_elements ->
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Buffer.add_string b " />";
loop list_indent tl
| Html(tagname, attrs, body) :: tl ->
let a = filter_text_omd_rev attrs in
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs a);
Buffer.add_string b ">";
if a == attrs then
loop list_indent body
else
Buffer.add_string b (html_of_md body);
Printf.bprintf b "%s>" tagname;
loop list_indent tl
| (Html_block(tagname, attrs, body))::tl ->
let needs_newlines =
match tl with
| NL :: Paragraph p :: _
| Paragraph p :: _ -> p <> []
| (H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _
| Ul _ | Ol _ | Ulp _ | Olp _ | Code (_, _) | Code_block (_, _)
| Text _ | Emph _ | Bold _ | Br |Hr | Url (_, _, _)
| Ref (_, _, _, _) | Img_ref (_, _, _, _)
| Html (_, _, _)
| Blockquote _ | Img (_, _, _)) :: _ -> true
| ( Html_block (_, _, _) | Html_comment _
| Raw _|Raw_block _) :: _-> false
| X _ :: _ -> false
| NL :: _ -> false
| [] -> false
in
if body = [] && StringSet.mem tagname html_void_elements then
(
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs attrs);
Buffer.add_string b " />";
if needs_newlines then Buffer.add_string b "\n\n";
loop list_indent tl
)
else
(
let a = filter_text_omd_rev attrs in
Printf.bprintf b "<%s" tagname;
Buffer.add_string b (string_of_attrs a);
Buffer.add_string b ">";
if a == attrs then
loop list_indent body
else
Buffer.add_string b (html_of_md body);
Printf.bprintf b "%s>" tagname;
if needs_newlines then Buffer.add_string b "\n\n";
loop list_indent tl
)
| Html_comment s :: tl ->
Buffer.add_string b s;
loop list_indent tl
| Url (href,s,title) :: tl ->
if title = "" then
bprintf b "[%s](%s)" (markdown_of_md s) href
else
bprintf b "[%s](%s \"%s\")" (markdown_of_md s) href title;
loop list_indent tl
| H1 md :: tl ->
Buffer.add_string b "# ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| H2 md :: tl ->
Buffer.add_string b "## ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| H3 md :: tl ->
Buffer.add_string b "### ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| H4 md :: tl ->
Buffer.add_string b "#### ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| H5 md :: tl ->
Buffer.add_string b "##### ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| H6 md :: tl ->
Buffer.add_string b "###### ";
loop list_indent md;
Buffer.add_string b "\n";
loop list_indent tl
| NL :: tl ->
if Buffer.length b = 1
|| (Buffer.length b > 1 &&
not(Buffer.nth b (Buffer.length b - 1) = '\n'
&& Buffer.nth b (Buffer.length b - 2) = '\n'))
then
Buffer.add_string b "\n";
loop list_indent tl
| [] -> ()
in
loop 0 md;
begin match !references with
| None -> ()
| Some r ->
Buffer.add_char b '\n';
List.iter
(fun (name, (url, title)) ->
if title = "" then
bprintf b "[%s]: %s \n" name url
else
bprintf b "[%s]: %s \"%s\"\n" name url title
)
r#get_all
end;
let res = Buffer.contents b in
if debug then
eprintf "(OMD) markdown_of_md(%S) => %S\n%!"
(sexpr_of_md md) res;
res
omd-1.3.2/src/omd_backend.mli 0000664 0000000 0000000 00000007742 14257632064 0016037 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
type code_stylist = lang:string -> string -> string
(** Function that takes a language name and some code and returns
that code with style. *)
val default_language : string ref
(** default language for code blocks can be set to any name,
by default it is the empty string *)
val html_of_md :
?override:(Omd_representation.element -> string option) ->
?pindent:bool ->
?nl2br:bool ->
?cs:code_stylist ->
Omd_representation.t -> string
(** [html_of_md md] returns a string containing the HTML version of
[md]. Note that [md] uses the internal representation of
Markdown.
The optional parameter [override] allows to override an precise
behaviour for a constructor of Omd_representation.element,
as in the following example:
let customized_to_html =
Omd.html_of_md
~override:(function
| Url (href,s,title) ->
Some(" "" then
" title='" ^ (Omd_utils.htmlentities ~md:true title) ^ "'"
else "")
^ ">"
^ Omd_backend.html_of_md s ^ " target='_blank'")
| _ -> None)
*)
val headers_of_md :
?remove_header_links:bool ->
Omd_representation.t ->
(Omd_representation.element * string * string) list
(** [headers_of_md md] returns a list of 3-tuples; in each of them the
first element is the header (e.g., [H1(foo)]), the second is the
HTML id (as produced by [html_of_md]), and the third element is
the HTML version of [foo]. The third elements of those 3-tuples
exist because if you use [html_and_headers_of_md], then you have
the guarantee that the HTML version of [foo] is the same for
both the headers and the HTML version of [md].
If [remove_header_links], then remove links inside headers (h1, h2, ...).
Default value of [remove_header_links]: cf. [html_and_headers_of_md].
*)
val html_and_headers_of_md :
?remove_header_links:bool ->
?override:(Omd_representation.element -> string option) ->
?pindent:bool ->
?nl2br:bool ->
?cs:code_stylist ->
Omd_representation.t ->
string *
(Omd_representation.element * Omd_utils.StringSet.elt * string) list
(** [html_and_headers_of_md md] is the same as [(html_of_md md,
headers_of_md md)] except that it's two times faster.
If you need both headers and html, don't use [html_of_md]
and [headers_of_md] but this function instead.
If [remove_header_links], then remove links inside headers (h1, h2, ...).
Default value of [remove_header_links]: false.
*)
val escape_markdown_characters : string -> string
(** [escape_markdown_characters s] returns a string where
markdown-significant characters in [s] have been
backslash-escaped. Note that [escape_markdown_characters] takes a
"raw" string, therefore it doesn't have the whole context in which
the string appears, thus the escaping cannot really be
minimal. However the implementation tries to minimalise the extra
escaping. *)
val text_of_md : Omd_representation.t -> string
(** [text_of_md md] is basically the same as [html_of_md md] but without
the HTML tags in the output. *)
val markdown_of_md : Omd_representation.t -> string
(** [markdown_of_md md] is basically the same as [html_of_md md] but
with the output in Markdown syntax rather than HTML. *)
val sexpr_of_md : Omd_representation.t -> string
(** [sexpr_of_md md] is basically the same as [html_of_md md] but with
the output in s-expressions rather than HTML. This is mainly used
for debugging. *)
omd-1.3.2/src/omd_html.ml 0000664 0000000 0000000 00000003235 14257632064 0015234 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* OMD: Markdown tool in OCaml *)
(* (c) 2014 by Philippe Wang *)
(* Licence: ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
type html = html_node list
and html_node =
| Node of nodename * attributes * html
| Data of string
| Rawdata of string
| Comment of string
and nodename = string
and attributes = attribute list
and attribute = string * string option
let to_string html =
let b = Buffer.create 1024 in
let pp f = Printf.bprintf b f in
let rec loop = function
| Node(nodename, attributes, html) ->
pp "<%s" nodename;
ppa attributes;
pp ">";
List.iter loop html;
pp "%s>" nodename
| Data s -> pp "%s" s
| Rawdata s -> pp "%s" s
| Comment c -> pp "" c
and ppa attrs =
List.iter
(function
| (a, Some v) ->
if not (String.contains v '\'') then
pp " %s='%s'" a v
else if not (String.contains v '"') then
pp " %s=\"%s\"" a v
else
(
pp " %s=\"" a;
for i = 0 to String.length v - 1 do
match v.[i] with
| '"' -> pp """
| c -> pp "%c" c
done;
pp "\""
)
| a, None ->
Printf.bprintf b " %s=''" a (* HTML5 *)
)
attrs
in
List.iter loop html;
Buffer.contents b
omd-1.3.2/src/omd_lexer.ml 0000664 0000000 0000000 00000037366 14257632064 0015423 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(* Implementation notes *********************************************
* - This module should depend on OCaml's standard library only and
* should be as 'pure OCaml' (i.e. depend as least as possible on
* external tools) as possible.
* - `while' loops are sometimes preferred to recursion because this
* may be used on systems where tail recursion is not well
* supported. (I tried to write "while" as often as possible, but it
* turned out that it was pretty inconvenient, so I do use
* recursion. When I have time, I'll do some tests and see if I
* need to convert recursive loops into iterative loops. Sorry if it
* makes it harder to read.)
*)
(* class type tag = object method is_me : 'a. 'a -> bool end *)
open Omd_representation
type token = Omd_representation.tok
type t = Omd_representation.tok list
let string_of_token = function
| Tag (name, o) ->
if Omd_utils.debug then "TAG("^name^")" ^ o#to_string else o#to_string
| Ampersand -> "&"
| Ampersands n -> assert (n >= 0); String.make (2+n) '&'
| At -> "@"
| Ats n -> assert (n >= 0); String.make (2+n) '@'
| Backquote -> "`"
| Backquotes n -> assert (n >= 0); String.make (2+n) '`'
| Backslash -> "\\"
| Backslashs n -> assert (n >= 0); String.make (2+n) '\\'
| Bar -> "|"
| Bars n -> assert (n >= 0); String.make (2+n) '|'
| Caret -> "^"
| Carets n -> assert (n >= 0); String.make (2+n) '^'
| Cbrace -> "}"
| Cbraces n -> assert (n >= 0); String.make (2+n) '}'
| Colon -> ":"
| Colons n -> assert (n >= 0); String.make (2+n) ':'
| Comma -> ","
| Commas n -> assert (n >= 0); String.make (2+n) ','
| Cparenthesis -> ")"
| Cparenthesiss n -> assert (n >= 0); String.make (2+n) ')'
| Cbracket -> "]"
| Cbrackets n -> assert (n >= 0); String.make (2+n) ']'
| Dollar -> "$"
| Dollars n -> assert (n >= 0); String.make (2+n) '$'
| Dot -> "."
| Dots n -> assert (n >= 0); String.make (2+n) '.'
| Doublequote -> "\""
| Doublequotes n -> assert (n >= 0); String.make (2+n) '"'
| Exclamation -> "!"
| Exclamations n -> assert (n >= 0); String.make (2+n) '!'
| Equal -> "="
| Equals n -> assert (n >= 0); String.make (2+n) '='
| Greaterthan -> ">"
| Greaterthans n -> assert (n >= 0); String.make (2+n) '>'
| Hash -> "#"
| Hashs n -> assert (n >= 0); String.make (2+n) '#'
| Lessthan -> "<"
| Lessthans n -> assert (n >= 0); String.make (2+n) '<'
| Minus -> "-"
| Minuss n -> assert (n >= 0); String.make (2+n) '-'
| Newline -> "\n"
| Newlines n -> assert (n >= 0); String.make (2+n) '\n'
| Number s -> s
| Obrace -> "{"
| Obraces n -> assert (n >= 0); String.make (2+n) '{'
| Oparenthesis -> "("
| Oparenthesiss n -> assert (n >= 0); String.make (2+n) '('
| Obracket -> "["
| Obrackets n -> assert (n >= 0); String.make (2+n) '['
| Percent -> "%"
| Percents n -> assert (n >= 0); String.make (2+n) '%'
| Plus -> "+"
| Pluss n -> assert (n >= 0); String.make (2+n) '+'
| Question -> "?"
| Questions n -> assert (n >= 0); String.make (2+n) '?'
| Quote -> "'"
| Quotes n -> assert (n >= 0); String.make (2+n) '\''
| Semicolon -> ";"
| Semicolons n -> assert (n >= 0); String.make (2+n) ';'
| Slash -> "/"
| Slashs n -> assert (n >= 0); String.make (2+n) '/'
| Space -> " "
| Spaces n -> assert (n >= 0); String.make (2+n) ' '
| Star -> "*"
| Stars n -> assert (n >= 0); String.make (2+n) '*'
| Tab -> " "
| Tabs n -> assert (n >= 0); String.make ((2+n)*4) ' '
| Tilde -> "~"
| Tildes n -> assert (n >= 0); String.make (2+n) '~'
| Underscore -> "_"
| Underscores n -> assert (n >= 0); String.make (2+n) '_'
| Word s -> s
let size_and_newlines = function
| Tag _ -> (0, 0)
| Ampersand | At | Backquote | Backslash | Bar | Caret | Cbrace
| Colon | Comma | Cparenthesis | Cbracket | Dollar | Dot
| Doublequote | Exclamation | Equal | Greaterthan | Hash | Lessthan
| Minus | Obrace | Oparenthesis | Obracket | Percent | Plus
| Question | Quote | Semicolon | Slash | Space | Star | Tab
| Tilde | Underscore -> (1, 0)
| Ampersands x | Ats x | Backquotes x | Backslashs x | Bars x | Carets x
| Cbraces x | Colons x | Commas x | Cparenthesiss x | Cbrackets x
| Dollars x | Dots x
| Doublequotes x | Exclamations x | Equals x | Greaterthans x | Hashs x
| Lessthans x
| Minuss x | Obraces x | Oparenthesiss x | Obrackets x | Percents x | Pluss x
| Questions x | Quotes x | Semicolons x | Slashs x | Spaces x | Stars x
| Tabs x
| Tildes x | Underscores x -> (2+x, 0)
| Newline -> (0, 1)
| Newlines x -> (0, 2+x)
| Number s | Word s -> (String.length s, 0)
let length t =
let c, nl = size_and_newlines t in
c + nl
let split_first = function
| Ampersands n -> Ampersand, (if n > 0 then Ampersands(n-1) else Ampersand)
| Ats n -> At, (if n > 0 then Ats(n-1) else At)
| Backquotes n -> Backquote, (if n > 0 then Backquotes(n-1) else Backquote)
| Backslashs n -> Backslash, (if n > 0 then Backslashs(n-1) else Backslash)
| Bars n -> Bar, (if n > 0 then Bars(n-1) else Bar)
| Carets n -> Caret, (if n > 0 then Carets(n-1) else Caret)
| Cbraces n -> Cbrace, (if n > 0 then Cbraces(n-1) else Cbrace)
| Colons n -> Colon, (if n > 0 then Colons(n-1) else Colon)
| Commas n -> Comma, (if n > 0 then Commas(n-1) else Comma)
| Cparenthesiss n -> Cparenthesis, (if n > 0 then Cparenthesiss(n-1)
else Cparenthesis)
| Cbrackets n -> Cbracket, (if n > 0 then Cbrackets(n-1) else Cbracket)
| Dollars n -> Dollar, (if n > 0 then Dollars(n-1) else Dollar)
| Dots n -> Dot, (if n > 0 then Dots(n-1) else Dot)
| Doublequotes n -> Doublequote, (if n > 0 then Doublequotes(n-1)
else Doublequote)
| Exclamations n -> Exclamation, (if n > 0 then Exclamations(n-1)
else Exclamation)
| Equals n -> Equal, (if n > 0 then Equals(n-1) else Equal)
| Greaterthans n -> Greaterthan, (if n > 0 then Greaterthans(n-1)
else Greaterthan)
| Hashs n -> Hash, (if n > 0 then Hashs(n-1) else Hash)
| Lessthans n -> Lessthan, (if n > 0 then Lessthans(n-1) else Lessthan)
| Minuss n -> Minus, (if n > 0 then Minuss(n-1) else Minus)
| Newlines n -> Newline, (if n > 0 then Newlines(n-1) else Newline)
| Obraces n -> Obrace, (if n > 0 then Obraces(n-1) else Obrace)
| Oparenthesiss n -> Oparenthesis, (if n > 0 then Oparenthesiss(n-1)
else Oparenthesis)
| Obrackets n -> Obracket, (if n > 0 then Obrackets(n-1) else Obracket)
| Percents n -> Percent, (if n > 0 then Percents(n-1) else Percent)
| Pluss n -> Plus, (if n > 0 then Pluss(n-1) else Plus)
| Questions n -> Question, (if n > 0 then Questions(n-1) else Question)
| Quotes n -> Quote, (if n > 0 then Quotes(n-1) else Quote)
| Semicolons n -> Semicolon, (if n > 0 then Semicolons(n-1) else Semicolon)
| Slashs n -> Slash, (if n > 0 then Slashs(n-1) else Slash)
| Spaces n -> Space, (if n > 0 then Spaces(n-1) else Space)
| Stars n -> Star, (if n > 0 then Stars(n-1) else Star)
| Tabs n -> Tab, (if n > 0 then Tabs(n-1) else Tab)
| Tildes n -> Tilde, (if n > 0 then Tildes(n-1) else Tilde)
| Underscores n -> Underscore, (if n > 0 then Underscores(n-1)
else Underscore)
| Ampersand | At | Backquote | Backslash | Bar | Caret | Cbrace | Colon
| Comma | Cparenthesis | Cbracket | Dollar | Dot | Doublequote
| Exclamation | Equal | Greaterthan | Hash | Lessthan | Minus
| Newline | Number _ | Obrace | Oparenthesis | Obracket | Percent
| Plus | Question | Quote | Semicolon | Slash | Space | Star | Tab
| Tilde | Underscore | Tag _ | Word _ ->
invalid_arg "Omd_lexer.split_first"
module type Input =
sig
type t
val length : t -> int
val get : t -> int -> char
val sub : t -> pos:int -> len:int -> string
end
module Lex(I : Input) :
sig
val lex : I.t -> t
end =
struct
let lex (s : I.t) =
let result = ref [] in
let i = ref 0 in
let l = I.length s in
let rcount c =
(* [rcount c] returns the number of immediate consecutive
occurrences of [c]. By side-effect, it increases the reference
counter [i]. *)
let rec loop r =
if !i = l then r
else if I.get s !i = c then (incr i; loop (r+1))
else r
in
loop 1
in
let word () =
let start = !i in
let rec loop () =
begin
if !i = l then
Word(I.sub s ~pos:start ~len:(!i-start))
else
match I.get s !i with
| ' ' | '\t' | '\n' | '\r' | '#' | '*' | '-' | '+' | '`' | '\''
| '"' | '\\' | '_' | '[' | ']' | '{' | '}' | '(' | ')' | ':'
| ';' | '>' | '~' | '<' | '@' | '&' | '|' | '^' | '.' | '/'
| '$' | '%' | '!' | '?' | '=' ->
Word(I.sub s ~pos:start ~len:(!i-start))
| c -> incr i; loop()
end
in
loop()
in
let maybe_number () =
let start = !i in
while
!i < l &&
match I.get s !i with
| '0' .. '9' -> true
| _ -> false
do
incr i
done;
if !i = l then
Number(I.sub s ~pos:start ~len:(!i-start))
else
begin match I.get s !i with
| ' ' | '\t' | '\n' | '\r' | '#' | '*' | '-' | '+' | '`' | '\'' | '"'
| '\\' | '_' | '[' | ']' | '{' | '}' | '(' | ')' | ':' | ';' | '>'
| '~' | '<' | '@' | '&' | '|' | '^' | '.' | '/' | '$' | '%' | '!'
| '?' | '=' ->
Number(I.sub s ~pos:start ~len:(!i-start))
| _ ->
i := start;
word()
end
in
let n_occ c = incr i; rcount c in
while !i < l do
let c = I.get s !i in
let w = match c with
| ' ' -> let n = n_occ c in if n = 1 then Space else Spaces (n-2)
| '\t' -> let n = n_occ c in if n = 1 then Spaces(2) else Spaces(4*n-2)
| '\n' -> let n = n_occ c in if n = 1 then Newline else Newlines (n-2)
| '\r' -> (* eliminating \r by converting all styles to unix style *)
incr i;
let rec count_rn x =
if !i < l && I.get s (!i) = '\n' then
if !i + 1 < l && I.get s (!i+1) = '\r' then
(i := !i + 2; count_rn (x+1))
else
x
else
x
in
let rn = 1 + count_rn 0 in
if rn = 1 then
match n_occ c with
| 1 -> Newline
| x -> assert(x>=2); Newlines(x-2)
else
(assert(rn>=2);Newlines(rn-2))
| '#' -> let n = n_occ c in if n = 1 then Hash else Hashs (n-2)
| '*' -> let n = n_occ c in if n = 1 then Star else Stars (n-2)
| '-' -> let n = n_occ c in if n = 1 then Minus else Minuss (n-2)
| '+' -> let n = n_occ c in if n = 1 then Plus else Pluss (n-2)
| '`' -> let n = n_occ c in if n = 1 then Backquote else Backquotes (n-2)
| '\'' -> let n = n_occ c in if n = 1 then Quote else Quotes (n-2)
| '"' -> let n = n_occ c in if n = 1 then Doublequote
else Doublequotes (n-2)
| '\\' -> let n = n_occ c in if n = 1 then Backslash
else Backslashs (n-2)
| '_' -> let n = n_occ c in if n = 1 then Underscore
else Underscores (n-2)
| '[' -> let n = n_occ c in if n = 1 then Obracket
else Obrackets (n-2)
| ']' -> let n = n_occ c in if n = 1 then Cbracket else Cbrackets (n-2)
| '{' -> let n = n_occ c in if n = 1 then Obrace else Obraces (n-2)
| '}' -> let n = n_occ c in if n = 1 then Cbrace else Cbraces (n-2)
| '(' -> let n = n_occ c in if n = 1 then Oparenthesis
else Oparenthesiss (n-2)
| ')' -> let n = n_occ c in if n = 1 then Cparenthesis
else Cparenthesiss (n-2)
| ':' -> let n = n_occ c in if n = 1 then Colon else Colons (n-2)
| ';' -> let n = n_occ c in if n = 1 then Semicolon else Semicolons (n-2)
| '>' -> let n = n_occ c in if n = 1 then Greaterthan
else Greaterthans (n-2)
| '~' -> let n = n_occ c in if n = 1 then Tilde else Tildes (n-2)
| '<' -> let n = n_occ c in if n = 1 then Lessthan else Lessthans (n-2)
| '@' -> let n = n_occ c in if n = 1 then At else Ats (n-2)
| '&' -> let n = n_occ c in if n = 1 then Ampersand else Ampersands (n-2)
| '|' -> let n = n_occ c in if n = 1 then Bar else Bars (n-2)
| '^' -> let n = n_occ c in if n = 1 then Caret else Carets (n-2)
| ',' -> let n = n_occ c in if n = 1 then Comma else Commas (n-2)
| '.' -> let n = n_occ c in if n = 1 then Dot else Dots (n-2)
| '/' -> let n = n_occ c in if n = 1 then Slash else Slashs (n-2)
| '$' -> let n = n_occ c in if n = 1 then Dollar else Dollars (n-2)
| '%' -> let n = n_occ c in if n = 1 then Percent else Percents (n-2)
| '=' -> let n = n_occ c in if n = 1 then Equal else Equals (n-2)
| '!' -> let n = n_occ c in if n = 1 then Exclamation
else Exclamations (n-2)
| '?' -> let n = n_occ c in if n = 1 then Question else Questions (n-2)
| '0' .. '9' -> maybe_number()
| c -> word() in
result := w :: !result
done;
List.rev !result
end
module Lex_string = Lex(StringLabels)
let lex = Lex_string.lex
type bigstring = (char,
Bigarray.int8_unsigned_elt,
Bigarray.c_layout) Bigarray.Array1.t
module Bigarray_input : Input with type t = bigstring =
struct
module BA = Bigarray
type t = bigstring
let get = BA.Array1.get
let length = BA.Array1.dim
let sub arr ~pos ~len =
if len < 0 || pos < 0 || pos + len > BA.Array1.dim arr
then invalid_arg "Bigarray_input.sub";
let s = Bytes.create len in
for i = 0 to len - 1 do
Bytes.unsafe_set s i (BA.Array1.unsafe_get arr (i + pos))
done;
Bytes.unsafe_to_string s
end
module Lex_bigarray = Lex(Bigarray_input)
let lex_bigarray = Lex_bigarray.lex
let make_space = function
| 0 -> invalid_arg "Omd_lexer.make_space"
| 1 -> Space
| n -> if n < 0 then invalid_arg "Omd_lexer.make_space" else Spaces (n-2)
(*
(** [string_of_tl l] returns the string representation of l.
[estring_of_tl l] returns the escaped string representation of l
(same semantics as [String.escaped (string_of_tl l)]). *)
let string_of_tl, estring_of_tl =
let g escaped tl =
let b = Buffer.create 42 in
let rec loop : 'a t list -> unit = function
| e::tl ->
Buffer.add_string b (if escaped then String.escaped (string_of_t e)
else string_of_t e);
loop tl
| [] ->
()
in
Buffer.contents (loop tl; b)
in g false, g true
*)
let string_of_tokens tl =
let b = Buffer.create 128 in
List.iter (fun e -> Buffer.add_string b (string_of_token e)) tl;
Buffer.contents b
let destring_of_tokens ?(limit=max_int) tl =
let b = Buffer.create 1024 in
let rec loop (i:int) (tlist:tok list) : unit = match tlist with
| e::tl ->
if limit = i then
loop i []
else
begin
Buffer.add_string b (String.escaped (string_of_token e));
Buffer.add_string b "::";
loop (succ i) tl
end
| [] ->
Buffer.add_string b "[]"
in
Buffer.contents (loop 0 tl; b)
omd-1.3.2/src/omd_lexer.mli 0000664 0000000 0000000 00000003111 14257632064 0015551 0 ustar 00root root 0000000 0000000 type token = Omd_representation.tok
type t = token list
val lex : string -> t
(** Translate a raw string into tokens for the parser. To implement
an extension to the lexer, one may process its result before
giving it to the parser. To implement an extension to the
parser, one may extend it using the constructor [Tag]
from type [tok] and/or using the extensions mechanism
of the parser (cf. the optional argument [extensions]).
The main difference is that [Tag] is processed by the parser
in highest priority whereas functions in [extensions] are applied
with lowest priority. *)
type bigstring = (char,
Bigarray.int8_unsigned_elt,
Bigarray.c_layout) Bigarray.Array1.t
val lex_bigarray : bigstring -> t
(** As {!lex}, but read input from a bigarray rather than from a string. *)
val string_of_tokens : t -> string
(** [string_of_tokens t] return the string corresponding to the token
list [t]. *)
val length : token -> int
(** [length t] number of characters of the string represented as [t]
(i.e. [String.length(string_of_token t)]). *)
val string_of_token : token -> string
(** [string_of_token tk] return the string corresponding to the token
[tk]. *)
val make_space : int -> token
val split_first : token -> token * token
(** [split_first(Xs n)] returns [(X, X(n-1))] where [X] is a token
carrying an int count.
@raise Invalid_argument is passed a single token. *)
val destring_of_tokens : ?limit:int -> t -> string
(** Converts the tokens to a simple string representation useful for
debugging. *)
omd-1.3.2/src/omd_lexer_fs.ml 0000664 0000000 0000000 00000001534 14257632064 0016077 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(** You should either use this module or Omd_lexer, not both.
This module includes Omd_lexer.
*)
include Omd_lexer
let lex_from_inchannel ic =
(* Maintenance-easiness-driven implementation. *)
let ic_content =
let b = Buffer.create 64 in
try while true do
Buffer.add_char b (input_char ic)
done;
assert false
with End_of_file -> Buffer.contents b in
lex ic_content
omd-1.3.2/src/omd_lexer_fs.mli 0000664 0000000 0000000 00000001042 14257632064 0016242 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
include module type of Omd_lexer
val lex_from_inchannel : in_channel -> Omd_representation.tok list
omd-1.3.2/src/omd_main.ml 0000664 0000000 0000000 00000034232 14257632064 0015215 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(** This module implements an end-user interface for OMD.
Treatments that are not specific to Markdown (such as table of
contents generation) are done here. If you want to build an
alternative end-user Markdown tool using OMD, you might want to
fork this file or get inspiration from it.
Happy coding!
*)
open Omd
let remove_comments l =
let open Omd_representation in
let rec loop = function
| true, Exclamations n :: tl when n > 0 ->
loop (true,
Omd_utils.eat (function Newline|Newlines _ -> false|_-> true) tl)
| _, (Newline|Newlines _ as e)::tl ->
e::loop (true, tl)
| _, e::tl ->
e::loop (false, tl)
| _, [] -> []
in loop (true, l)
let remove_endline_comments l =
let open Omd_representation in
let rec loop = function
| Backslash :: (Exclamations n as e) :: tl when n > 0 ->
e :: loop tl
| Backslashs b :: (Exclamations n as e) :: tl when n > 0 && b mod 2 = 1 ->
Backslashs(b-1) :: e :: loop tl
| Exclamations n :: tl when n > 0 ->
loop (Omd_utils.eat (function Newline|Newlines _ -> false|_-> true) tl)
| e::tl ->
e::loop tl
| [] -> []
in loop l
let preprocess_functions = ref []
(** [a += b] is a shortcut for [a := b :: !a] // NON-EXPORTED *)
let (+=) a b = a := b :: !a
let preprocess l =
List.fold_left (fun r e -> e r)
l
!preprocess_functions
let otoc = ref false
let toc = ref false
let omarkdown = ref false
let notags = ref false
let toc_depth = ref 2
let toc_start = ref([]: int list)
let nl2br = ref false
let protect_html_comments = ref false
let code_stylist =
let module M = Map.Make(String) in
object
val mutable stylists =
M.empty
method style ~lang code =
try (M.find lang stylists) code
with Not_found ->
try (M.find "_" stylists) code
with Not_found -> code
method register ~lang stylist =
stylists <- M.add lang stylist stylists
end
let code_stylist_of_program p =
fun code ->
let tmp1 = Filename.temp_file "code" "bef" in
let tmp2 = Filename.temp_file "code" "aft" in
let () = at_exit (fun () -> Sys.remove tmp1; Sys.remove tmp2) in
let otmp1 = open_out_bin tmp1 in
Printf.fprintf otmp1 "%s%!" code;
close_out otmp1;
match Sys.command (Printf.sprintf "( cat %s | %s ) > %s" tmp1 p tmp2) with
| 0 ->
let cat f =
let ic = open_in f in
let b = Buffer.create 64 in
try
while true do
Buffer.add_char b (input_char ic)
done;
assert false
with End_of_file -> Buffer.contents b
in
cat tmp2
| _ -> code
let register_code_stylist_of_program x =
try
let i = String.index x '=' in
code_stylist#register
~lang:(String.sub x 0 i)
(code_stylist_of_program
(String.sub x (i+1) (String.length x - (i+1))))
with Not_found | Invalid_argument _ ->
Printf.eprintf "Error: Something wrong with [-r %s]\n" x;
exit 1
let register_default_language l =
Omd_backend.default_language := l
(* HTML comments might contain some double-dash (--) that are not well
treated by HTML parsers. For instance "" should be
translated to "" when we want to ensure that
the generated HTML is correct! *)
let patch_html_comments l =
let htmlcomments s =
let b = Buffer.create (String.length s) in
for i = 0 to 3 do
Buffer.add_char b s.[i]
done;
for i = 4 to String.length s - 4 do
match s.[i] with
| '-' as c ->
if (i > 4 && s.[i-1] = '-')
|| (i < String.length s - 5 && s.[i+1] = '-')
then
Printf.bprintf b "%d;" (int_of_char c)
else
Buffer.add_char b c
| c -> Buffer.add_char b c
done;
for i = String.length s - 3 to String.length s - 1 do
Buffer.add_char b s.[i]
done;
Buffer.contents b
in
let rec loop accu = function
| Html_comment s :: tl ->
loop (Html_comment(htmlcomments s)::accu) tl
| e :: tl ->
loop (e :: accu) tl
| [] -> List.rev accu
in loop [] l
let tag_toc l =
let open Omd_representation in
let x =
object(self)
(* [shield] is used to prevent endless loops.
If one wants to use system threads at some point,
and calls methods of this object concurrently,
then there is a real problem. *)
val remove = fun e md ->
visit
(function X(v) when v==e-> Some[] | _ -> None)
md
method name = "toc"
method to_html ?indent:_ f md =
let r = f (Omd.toc(remove self md)) in
Some r
method to_sexpr f md =
let r = f (Omd.toc(remove self md)) in
Some r
method to_t md =
let r = (Omd.toc(remove self md)) in
Some r
end
in
let rec loop = function
| Star::
Word "Table"::Space::
Word "of"::Space::
Word "contents"::Star::tl ->
Tag("tag_toc",
object
method parser_extension r p l =
Some(X(x)::r,p,l)
method to_string = ""
end
) :: loop tl
| e::tl -> e::loop tl
| [] -> []
in loop l
let split_comma_int_list s =
if s = "" then []
else (
let l = ref [] in
let i = ref 0 in
try
while true do
let j = String.index_from s !i ',' in
l := int_of_string(String.sub s !i (j - !i)) :: !l;
i := j + 1
done;
assert false
with Not_found ->
l := (int_of_string(String.sub s !i (String.length s - !i))) :: !l;
List.rev !l
)
module E = Omd_parser.Default_env(struct end)
let omd_gh_uemph_or_bold_style =
ref E.gh_uemph_or_bold_style
let omd_blind_html =
ref E.blind_html
let omd_strict_html =
ref E.strict_html
let omd_warning = ref E.warning
let omd_warn_error = ref E.warn_error
let list_html_tags ~inline =
let module Parser = Omd_parser.Make(E)
in
if inline then
Omd_utils.StringSet.iter
(fun e -> print_string e; print_char '\n')
Parser.inline_htmltags_set
else
Omd_utils.StringSet.iter
(fun e -> print_string e; print_char '\n')
Parser.htmltags_set
let verbatim_start = ref ""
let verbatim_end = ref ""
let lex_with_verb_extension s =
if !verbatim_start = "" || !verbatim_end = "" then
Omd_lexer.lex s
else
begin
let module M = struct
type t = Verb of string | To_lex of string
end in
let open M in
let sl = String.length s
and stl = String.length !verbatim_start
and enl = String.length !verbatim_end in
let rec seek_start accu from i =
if i + stl + enl > sl then
To_lex(String.sub s from (sl - from))::accu
else if String.sub s i stl = !verbatim_start then
seek_end
(To_lex(String.sub s from (i - from))::accu)
(i+stl)
(i+stl)
else seek_start accu from (i+1)
and seek_end accu from i =
if i + enl > sl then
To_lex(String.sub s from (sl - from))::accu
else if String.sub s i enl = !verbatim_end then
seek_start
(Verb(String.sub s from (i - from))::accu)
(i+enl)
(i+enl)
else seek_end accu from (i+1)
in
let first_pass () = seek_start [] 0 0 in
let second_pass l =
List.rev_map
(function
| To_lex x ->
Omd_lexer.lex x
| Verb x ->
[Omd_representation.Tag(
"raw",
object
method parser_extension r p l =
match p with
| [] | [Omd_representation.Newlines _] ->
Some(Raw_block x :: r, [Omd_representation.Space], l)
| _ ->
Some(Raw x :: r, [Omd_representation.Space], l)
method to_string = x
end
)]
)
l
in
List.flatten(second_pass(first_pass()))
end
let main () =
let input = ref []
and output = ref ""
in
Arg.(
parse
(align[
"-o", Set_string output,
"file.html Specify the output file (default is stdout).";
"--", Rest(fun s -> input := s :: !input),
" Consider all remaining arguments as input file names.";
"-u", Clear(omd_gh_uemph_or_bold_style),
" Use standard Markdown style for emph/bold when using `_'.";
"-c", Unit(fun () -> preprocess_functions += remove_endline_comments),
" Ignore lines that start with `!!!' (3 or more exclamation points).";
"-C", Unit(fun () -> preprocess_functions += remove_comments),
" Ignore everything on a line after `!!!' \
(3 or more exclamation points).";
"-m", Set(omarkdown), " Output Markdown instead of HTML.";
"-notags", Set(notags), " Output without the HTML tags.";
"-toc", Set(toc),
" Replace `*Table of contents*' by the table of contents.";
"-otoc", Set(otoc), " Output only the table of contents.";
"-ts", String(fun l -> toc_start := split_comma_int_list l),
"f Section for the Table of contents (default: all).";
"-td", Set_int(toc_depth), "f Table of contents depth (default is 2).";
"-H", Set(protect_html_comments), " Protect HTML comments.";
"-r", String(register_code_stylist_of_program),
"l=p Register program p as a code highlighter for language l.";
"-R", String(register_default_language),
"l Registers unknown languages to be l instead of void.";
"-nl2br", Set(nl2br), " Convert new lines to
.";
"-x", String(ignore),
"ext Activate extension ext (not yet implemented).";
"-l", Unit ignore,
" List available extensions ext (not yet implemented).";
"-b", Set(omd_blind_html),
" Don't check validity of HTML tag names.";
"-s", Set(omd_strict_html),
" (might not work as expected yet) Block HTML only in block HTML, \
inline HTML only in inline HTML \
(semantics undefined if use both -b and -s).";
"-LHTML", Unit(fun () -> list_html_tags ~inline:false; exit 0),
" List all known HTML tags";
"-LHTMLi", Unit(fun () -> list_html_tags ~inline:true; exit 0),
" List all known inline HTML tags";
"-version", Unit(fun () -> print_endline "This is version VERSION.";
exit 0), " Print version.";
"-VS", Set_string(verbatim_start),
"start Set the start token to use to declare a verbatim section. \
If you use -VE, you must use -VS, and both must be non-empty.";
"-VE", Set_string(verbatim_end),
"end Set the end token to use to declare a verbatim section. \
If you use -VE, you must use -VS, and both must be non-empty.";
"-w", Set(omd_warning),
" Activate warnings (beta).";
"-W", Set(omd_warn_error),
" Convert warnings to errors, implies -w (beta).";
])
(fun s -> input := s :: !input)
"omd [options] [inputfile1 .. inputfileN] [options]"
);
let input_files =
if !input = [] then
[stdin]
else
List.rev_map (open_in) !input
in
let output =
if !output = "" then
stdout
else
open_out_bin !output
in
List.iter (fun ic ->
let b = Buffer.create 64 in
try while true do
Buffer.add_char b (input_char ic)
done; assert false
with End_of_file ->
let lexed = lex_with_verb_extension(Buffer.contents b) in
let preprocessed = preprocess (if !toc then tag_toc lexed else lexed) in
let module E = Omd_parser.Default_env(struct end) in
let module Parser = Omd_parser.Make(
struct
include E
let warning = !omd_warning || !omd_warn_error
let warn_error = !omd_warn_error
let gh_uemph_or_bold_style = !omd_gh_uemph_or_bold_style
let blind_html = !omd_blind_html
let strict_html = !omd_strict_html
end)
in
let parsed1 = Parser.parse preprocessed in
let parsed2 =
if !protect_html_comments then
patch_html_comments parsed1
else
parsed1
in
let parsed = parsed2 in
let o1 = (* make either TOC or paragraphs, or leave as it is *)
(if !otoc then Omd.toc ~start:!toc_start ~depth:!toc_depth
else Parser.make_paragraphs)
parsed in
let o2 = (* output either Text or HTML, or markdown *)
if !notags then to_text o1
else if !omarkdown then to_markdown o1
else if !toc && not !otoc then
to_html
~pindent:true ~nl2br:false ~cs:code_stylist#style
(* FIXME: this is a quick fix for -toc which doesn't work
if to_html is directly applied to o1, and that seems to have
something to do with Parser.make_paragraphs, which seems to
prevent tag_toc from working properly when using to_html!
*)
(Parser.make_paragraphs(Parser.parse(Omd_lexer.lex(to_markdown o1))))
else
to_html
~pindent:true ~nl2br:false ~cs:code_stylist#style
(* The normal behaviour is to convert directly, like this. *)
o1
in
output_string output o2;
if o2 <> "" && o2.[String.length o2 - 1] <> '\n' then
output_char output '\n';
flush output;
if false && Omd_utils.debug then
print_endline
(Omd_backend.sexpr_of_md
(Omd_parser.default_parse
(preprocess(Omd_lexer.lex (Buffer.contents b)))));
)
input_files
(* call the main function *)
let () =
try
main ()
with
| Omd_utils.Error msg when not Omd_utils.debug ->
Printf.eprintf "(OMD) Error: %s\n" msg;
exit 1
| Sys_error msg ->
Printf.eprintf "Error: %s\n" msg;
exit 1
omd-1.3.2/src/omd_main.mli 0000664 0000000 0000000 00000005652 14257632064 0015372 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
val remove_comments : Omd_representation.tok list -> Omd_representation.tok list
(** [remove_comments l] returns [l] without OMD comments. *)
val remove_endline_comments :
Omd_representation.tok list -> Omd_representation.tok list
(** [remove_endline_comments l] returns [l] without OMD endline-comments. *)
val preprocess_functions :
(Omd_representation.tok list -> Omd_representation.tok list) list ref
(** [preprocess_functions] contains the list of preprocessing functions *)
val preprocess : Omd_representation.tok list -> Omd_representation.tok list
(** [preprocess l] returns [l] to which all preprocessing functions
(in reference [preprocess_functions]) have been applied. *)
val otoc : bool ref
(** flag: output the table of contents only. *)
val toc : bool ref
(** flag: replace "*Table of contents*" by the table of contents. *)
val omarkdown : bool ref
(** flag: output Markdown instead of HTML. *)
val notags : bool ref
(** flag: output HTML but without HTML tags, so it's not really HTML anymore. *)
val toc_depth : int ref
(** flag: depth of table of contents *)
val toc_start : int list ref
(** flag: first header level for table of contents *)
val nl2br : bool ref
(** flag: convert newlines to "
" when output is HTML *)
val omd_gh_uemph_or_bold_style : bool ref
(** flag: set on the command line, used for instanciating the
functor Omd_parser.Make *)
val omd_blind_html : bool ref
(** flag: set on the command line, used for instanciating the
functor Omd_parser.Make *)
val omd_strict_html : bool ref
(** flag: set on the command line, used for instanciating the
functor Omd_parser.Make *)
val protect_html_comments : bool ref
(** flag: for multiple dashes in HTML comments, replace dashes by - *)
val patch_html_comments : Omd.element list -> Omd.element list
(** [patch_html_comments l] returns the list [l] where
all [Html_comments s] have been converted to [Html_comments s'],
where [s'] means [s] with dashes replaced by - except for
single dashes (which are left untouched).
N.B. It seems that it's not valid to have double dashes inside HTML comments
(cf. http://validator.w3.org/check). So one way to make life somewhat easier
is to patch the comments and transform inner dashed to -. *)
val tag_toc : Omd_representation.tok list -> Omd_representation.tok list
(** [tag_toc l] returns [l] where *Table of contents* has been replaced
by a tag that can generate a table of contents. *)
val main : unit -> unit
(** main function *)
omd-1.3.2/src/omd_parser.ml 0000664 0000000 0000000 00000524210 14257632064 0015565 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013-2014 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
let sdebug = true
open Printf
open Omd_representation
open Omd_utils
module L = Omd_lexer
type r = Omd_representation.t
(** accumulator (beware, reversed tokens) *)
and p = Omd_representation.tok list
(** context information: previous elements *)
and l = Omd_representation.tok list
(** tokens to parse *)
and main_loop =
?html:bool ->
r -> (* accumulator (beware, reversed tokens) *)
p -> (* info: previous elements *)
l -> (* tokens to parse *)
Omd_representation.t (* final result *)
(** most important loop *)
(** N.B. Please do not use tabulations in your Markdown file! *)
module type Env = sig
val rc: Omd_representation.ref_container
val extensions : Omd_representation.extensions
val default_lang : string
val gh_uemph_or_bold_style : bool
val blind_html : bool
val strict_html : bool
val warning : bool
val warn_error : bool
end
module Unit = struct end
module Default_env (Unit:sig end) : Env = struct
let rc = new Omd_representation.ref_container
let extensions = []
let default_lang = ""
let gh_uemph_or_bold_style = true
let blind_html = false
let strict_html = false
let warning = false
let warn_error = false
end
module Make (Env:Env) =
struct
include Env
let warn = Omd_utils.warn ~we:warn_error
(** set of known HTML codes *)
let htmlcodes_set = StringSet.of_list (* This list should be checked... *)
(* list extracted from: http://www.w3.org/TR/html4/charset.html *)
[ "AElig"; "Aacute"; "Acirc"; "Agrave"; "Alpha"; "Aring"; "Atilde";
"Auml"; "Beta"; "Ccedil"; "Chi"; "Dagger"; "Delta"; "ETH"; "Eacute";
"Ecirc"; "Egrave"; "Epsilon"; "Eta"; "Euml"; "Gamma"; "Iacute";
"Icirc"; "Igrave"; "Iota"; "Iuml"; "Kappa"; "Lambda"; "Mu"; "Ntilde";
"Nu"; "OElig"; "Oacute"; "Ocirc"; "Ograve"; "Omega"; "Omicron";
"Oslash"; "Otilde"; "Ouml"; "Phi"; "Pi"; "Prime"; "Psi"; "Rho";
"Scaron"; "Sigma"; "THORN"; "Tau"; "Theta"; "Uacute"; "Ucirc";
"Ugrave"; "Upsilon"; "Uuml"; "Xi"; "Yacute"; "Yuml"; "Zeta"; "aacute";
"acirc"; "acute"; "aelig"; "agrave"; "alefsym"; "alpha"; "amp"; "and";
"ang"; "aring"; "asymp"; "atilde"; "auml"; "bdquo"; "beta"; "brvbar";
"bull"; "cap"; "ccedil"; "cedil"; "cent"; "chi"; "circ"; "clubs";
"cong"; "copy"; "crarr"; "cup"; "curren"; "dArr"; "dagger"; "darr";
"deg"; "delta"; "diams"; "divide"; "eacute"; "ecirc"; "egrave";
"empty"; "emsp"; "ensp"; "epsilon"; "equiv"; "eta"; "eth"; "euml";
"euro"; "exist"; "fnof"; "forall"; "frac12"; "frac14"; "frac34";
"frasl"; "gamma"; "ge"; "gt"; "hArr"; "harr"; "hearts"; "hellip";
"iacute"; "icirc"; "iexcl"; "igrave"; "image"; "infin"; "int"; "iota";
"iquest"; "isin"; "iuml"; "kappa"; "lArr"; "lambda"; "lang"; "laquo";
"larr"; "lceil"; "ldquo"; "le"; "lfloor"; "lowast"; "loz"; "lrm";
"lsaquo"; "lsquo"; "lt"; "macr"; "mdash"; "micro"; "middot"; "minus";
"mu"; "nabla"; "nbsp"; "ndash"; "ne"; "ni"; "not"; "notin"; "nsub";
"ntilde"; "nu"; "oacute"; "ocirc"; "oelig"; "ograve"; "oline";
"omega"; "omicron"; "oplus"; "or"; "ordf"; "ordm"; "oslash"; "otilde";
"otimes"; "ouml"; "para"; "part"; "permil"; "perp"; "phi"; "pi";
"piv"; "plusmn"; "pound"; "prime"; "prod"; "prop"; "psi"; "quot";
"rArr"; "radic"; "rang"; "raquo"; "rarr"; "rceil"; "rdquo"; "real";
"reg"; "rfloor"; "rho"; "rlm"; "rsaquo"; "rsquo"; "sbquo"; "scaron";
"sdot"; "sect"; "shy"; "sigma"; "sigmaf"; "sim"; "spades"; "sub";
"sube"; "sum"; "sup"; "sup1"; "sup2"; "sup3"; "supe"; "szlig"; "tau";
"there4"; "theta"; "thetasym"; "thinsp"; "thorn"; "tilde"; "times";
"trade"; "uArr"; "uacute"; "uarr"; "ucirc"; "ugrave"; "uml"; "upsih";
"upsilon"; "uuml"; "weierp"; "xi"; "yacute"; "yen"; "yuml"; "zeta";
"zwj"; "zwnj"; ]
(** set of known inline HTML tags *)
let inline_htmltags_set =
(StringSet.of_list
(* from https://developer.mozilla.org/en-US/docs/HTML/Inline_elements *)
[ "b";"big";"i";"small";"tt";
"abbr";"acronym";"cite";"code";"dfn";"em";"kbd";"strong";"samp";"var";
"a";"bdo";"br";"img";"map";"object";"q";"span";"sub";"sup";
"button";"input";"label";"select";"textarea";])
(** N.B. it seems that there is no clear distinction between inline
tags and block-level tags: in HTML4 it was not clear, in HTML5
it's even more complicated. So, the choice *here* is to specify
a set of tags considered as "inline", cf. [inline_htmltags_set].
So there will be inline tags, non-inline tags, and unknown
tags.*)
(** set of HTML tags that may appear out of a body *)
let notinbodytags = StringSet.of_list
[
"title";
"link";
"meta";
"style";
"html";
"head";
"body";
]
(** All known HTML tags *)
let htmltags_set =
StringSet.union notinbodytags
(StringSet.union inline_htmltags_set
(StringSet.of_list
[
"a";"abbr";"acronym";"address";"applet";"area";"article";"aside"
;"audio";"b";"base";"basefont";"bdi";"bdo";"big";"blockquote"
;"br";"button";"canvas";"caption";"center";"cite";"code";"col"
;"colgroup";"command";"datalist";"dd";"del";"details";"dfn"
;"dialog";"dir";"div";"dl";"dt";"em";"embed";"fieldset"
;"figcaption";"figure";"font";"footer";"form";"frame";"frameset"
;"h2";"h3";"h4";"h5";"h6"
;"h1";"header";"hr";"i";"iframe";"img";"input";"ins";"kbd"
;"keygen";"label";"legend";"li";"map";"mark";"menu";"meter";"nav"
;"noframes";"noscript";"object";"ol";"optgroup";"option";"output"
;"p";"param";"pre";"progress";"q";"rp";"rt";"ruby";"s";"samp"
;"script";"section";"select";"small";"source";"span";"strike"
;"strong";"style";"sub";"summary";"sup";"table";"tbody";"td"
;"textarea";"tfoot";"th";"thead";"time";"tr";"track";"tt";"u"
;"ul";"var";"video";"wbr"
]))
(** This functions fixes bad lexing trees, which may be built when
extraction a portion of another lexing tree. *)
let fix l =
let rec loop accu = function
(* code to generate what follows...
List.iter (fun e ->
Printf.printf "
| %s::%s::tl ->
if trackfix then eprintf \"%s 1\\n%!\";
loop accu (%ss 0::tl)
| %ss n::%s::tl ->
if trackfix then eprintf \"%s 2\\n%!\";
loop accu (%ss(n+1)::tl)
| %s::%ss n::tl ->
if trackfix then eprintf \"%s 3\\n%!\";
loop accu (%ss(n+1)::tl)
| %ss a::%ss b::tl ->
if trackfix then eprintf \"%s 4\\n%!\";
loop accu (%ss(a+b+2)::tl)"
e e e e e e e e e e e e e e e e)
["Ampersand"; "At"; "Backquote"; "Backslash"; "Bar"; "Caret"; "Cbrace"; "Colon"; "Comma"; "Cparenthesis"; "Cbracket"; "Dollar"; "Dot"; "Doublequote"; "Exclamation"; "Equal"; "Greaterthan"; "Hash"; "Lessthan"; "Minus"; "Newline"; "Obrace"; "Oparenthesis"; "Obracket"; "Percent"; "Plus"; "Question"; "Quote"; "Semicolon"; "Slash"; "Space"; "Star"; "Tab"; "Tilde"; "Underscore"];
print_string "| x::tl -> loop (x::accu) tl\n| [] -> List.rev accu\n"; *)
| Ampersand::Ampersand::tl ->
if trackfix then eprintf "(OMD) Ampersand 1\n";
loop accu (Ampersands 0::tl)
| Ampersands n::Ampersand::tl ->
if trackfix then eprintf "(OMD) Ampersand 2\n";
loop accu (Ampersands(n+1)::tl)
| Ampersand::Ampersands n::tl ->
if trackfix then eprintf "(OMD) Ampersand 3\n";
loop accu (Ampersands(n+1)::tl)
| Ampersands a::Ampersands b::tl ->
if trackfix then eprintf "(OMD) Ampersand 4\n";
loop accu (Ampersands(a+b+2)::tl)
| At::At::tl ->
if trackfix then eprintf "(OMD) At 1\n";
loop accu (Ats 0::tl)
| Ats n::At::tl ->
if trackfix then eprintf "(OMD) At 2\n";
loop accu (Ats(n+1)::tl)
| At::Ats n::tl ->
if trackfix then eprintf "(OMD) At 3\n";
loop accu (Ats(n+1)::tl)
| Ats a::Ats b::tl ->
if trackfix then eprintf "(OMD) At 4\n";
loop accu (Ats(a+b+2)::tl)
| Backquote::Backquote::tl ->
if trackfix then eprintf "(OMD) Backquote 1\n";
loop accu (Backquotes 0::tl)
| Backquotes n::Backquote::tl ->
if trackfix then eprintf "(OMD) Backquote 2\n";
loop accu (Backquotes(n+1)::tl)
| Backquote::Backquotes n::tl ->
if trackfix then eprintf "(OMD) Backquote 3\n";
loop accu (Backquotes(n+1)::tl)
| Backquotes a::Backquotes b::tl ->
if trackfix then eprintf "(OMD) Backquote 4\n";
loop accu (Backquotes(a+b+2)::tl)
| Backslash::Backslash::tl ->
if trackfix then eprintf "(OMD) Backslash 1\n";
loop accu (Backslashs 0::tl)
| Backslashs n::Backslash::tl ->
if trackfix then eprintf "(OMD) Backslash 2\n";
loop accu (Backslashs(n+1)::tl)
| Backslash::Backslashs n::tl ->
if trackfix then eprintf "(OMD) Backslash 3\n";
loop accu (Backslashs(n+1)::tl)
| Backslashs a::Backslashs b::tl ->
if trackfix then eprintf "(OMD) Backslash 4\n";
loop accu (Backslashs(a+b+2)::tl)
| Bar::Bar::tl ->
if trackfix then eprintf "(OMD) Bar 1\n";
loop accu (Bars 0::tl)
| Bars n::Bar::tl ->
if trackfix then eprintf "(OMD) Bar 2\n";
loop accu (Bars(n+1)::tl)
| Bar::Bars n::tl ->
if trackfix then eprintf "(OMD) Bar 3\n";
loop accu (Bars(n+1)::tl)
| Bars a::Bars b::tl ->
if trackfix then eprintf "(OMD) Bar 4\n";
loop accu (Bars(a+b+2)::tl)
| Caret::Caret::tl ->
if trackfix then eprintf "(OMD) Caret 1\n";
loop accu (Carets 0::tl)
| Carets n::Caret::tl ->
if trackfix then eprintf "(OMD) Caret 2\n";
loop accu (Carets(n+1)::tl)
| Caret::Carets n::tl ->
if trackfix then eprintf "(OMD) Caret 3\n";
loop accu (Carets(n+1)::tl)
| Carets a::Carets b::tl ->
if trackfix then eprintf "(OMD) Caret 4\n";
loop accu (Carets(a+b+2)::tl)
| Cbrace::Cbrace::tl ->
if trackfix then eprintf "(OMD) Cbrace 1\n";
loop accu (Cbraces 0::tl)
| Cbraces n::Cbrace::tl ->
if trackfix then eprintf "(OMD) Cbrace 2\n";
loop accu (Cbraces(n+1)::tl)
| Cbrace::Cbraces n::tl ->
if trackfix then eprintf "(OMD) Cbrace 3\n";
loop accu (Cbraces(n+1)::tl)
| Cbraces a::Cbraces b::tl ->
if trackfix then eprintf "(OMD) Cbrace 4\n";
loop accu (Cbraces(a+b+2)::tl)
| Colon::Colon::tl ->
if trackfix then eprintf "(OMD) Colon 1\n";
loop accu (Colons 0::tl)
| Colons n::Colon::tl ->
if trackfix then eprintf "(OMD) Colon 2\n";
loop accu (Colons(n+1)::tl)
| Colon::Colons n::tl ->
if trackfix then eprintf "(OMD) Colon 3\n";
loop accu (Colons(n+1)::tl)
| Colons a::Colons b::tl ->
if trackfix then eprintf "(OMD) Colon 4\n";
loop accu (Colons(a+b+2)::tl)
| Comma::Comma::tl ->
if trackfix then eprintf "(OMD) Comma 1\n";
loop accu (Commas 0::tl)
| Commas n::Comma::tl ->
if trackfix then eprintf "(OMD) Comma 2\n";
loop accu (Commas(n+1)::tl)
| Comma::Commas n::tl ->
if trackfix then eprintf "(OMD) Comma 3\n";
loop accu (Commas(n+1)::tl)
| Commas a::Commas b::tl ->
if trackfix then eprintf "(OMD) Comma 4\n";
loop accu (Commas(a+b+2)::tl)
| Cparenthesis::Cparenthesis::tl ->
if trackfix then eprintf "(OMD) Cparenthesis 1\n";
loop accu (Cparenthesiss 0::tl)
| Cparenthesiss n::Cparenthesis::tl ->
if trackfix then eprintf "(OMD) Cparenthesis 2\n";
loop accu (Cparenthesiss(n+1)::tl)
| Cparenthesis::Cparenthesiss n::tl ->
if trackfix then eprintf "(OMD) Cparenthesis 3\n";
loop accu (Cparenthesiss(n+1)::tl)
| Cparenthesiss a::Cparenthesiss b::tl ->
if trackfix then eprintf "(OMD) Cparenthesis 4\n";
loop accu (Cparenthesiss(a+b+2)::tl)
| Cbracket::Cbracket::tl ->
if trackfix then eprintf "(OMD) Cbracket 1\n";
loop accu (Cbrackets 0::tl)
| Cbrackets n::Cbracket::tl ->
if trackfix then eprintf "(OMD) Cbracket 2\n";
loop accu (Cbrackets(n+1)::tl)
| Cbracket::Cbrackets n::tl ->
if trackfix then eprintf "(OMD) Cbracket 3\n";
loop accu (Cbrackets(n+1)::tl)
| Cbrackets a::Cbrackets b::tl ->
if trackfix then eprintf "(OMD) Cbracket 4\n";
loop accu (Cbrackets(a+b+2)::tl)
| Dollar::Dollar::tl ->
if trackfix then eprintf "(OMD) Dollar 1\n";
loop accu (Dollars 0::tl)
| Dollars n::Dollar::tl ->
if trackfix then eprintf "(OMD) Dollar 2\n";
loop accu (Dollars(n+1)::tl)
| Dollar::Dollars n::tl ->
if trackfix then eprintf "(OMD) Dollar 3\n";
loop accu (Dollars(n+1)::tl)
| Dollars a::Dollars b::tl ->
if trackfix then eprintf "(OMD) Dollar 4\n";
loop accu (Dollars(a+b+2)::tl)
| Dot::Dot::tl ->
if trackfix then eprintf "(OMD) Dot 1\n";
loop accu (Dots 0::tl)
| Dots n::Dot::tl ->
if trackfix then eprintf "(OMD) Dot 2\n";
loop accu (Dots(n+1)::tl)
| Dot::Dots n::tl ->
if trackfix then eprintf "(OMD) Dot 3\n";
loop accu (Dots(n+1)::tl)
| Dots a::Dots b::tl ->
if trackfix then eprintf "(OMD) Dot 4\n";
loop accu (Dots(a+b+2)::tl)
| Doublequote::Doublequote::tl ->
if trackfix then eprintf "(OMD) Doublequote 1\n";
loop accu (Doublequotes 0::tl)
| Doublequotes n::Doublequote::tl ->
if trackfix then eprintf "(OMD) Doublequote 2\n";
loop accu (Doublequotes(n+1)::tl)
| Doublequote::Doublequotes n::tl ->
if trackfix then eprintf "(OMD) Doublequote 3\n";
loop accu (Doublequotes(n+1)::tl)
| Doublequotes a::Doublequotes b::tl ->
if trackfix then eprintf "(OMD) Doublequote 4\n";
loop accu (Doublequotes(a+b+2)::tl)
| Exclamation::Exclamation::tl ->
if trackfix then eprintf "(OMD) Exclamation 1\n";
loop accu (Exclamations 0::tl)
| Exclamations n::Exclamation::tl ->
if trackfix then eprintf "(OMD) Exclamation 2\n";
loop accu (Exclamations(n+1)::tl)
| Exclamation::Exclamations n::tl ->
if trackfix then eprintf "(OMD) Exclamation 3\n";
loop accu (Exclamations(n+1)::tl)
| Exclamations a::Exclamations b::tl ->
if trackfix then eprintf "(OMD) Exclamation 4\n";
loop accu (Exclamations(a+b+2)::tl)
| Equal::Equal::tl ->
if trackfix then eprintf "(OMD) Equal 1\n";
loop accu (Equals 0::tl)
| Equals n::Equal::tl ->
if trackfix then eprintf "(OMD) Equal 2\n";
loop accu (Equals(n+1)::tl)
| Equal::Equals n::tl ->
if trackfix then eprintf "(OMD) Equal 3\n";
loop accu (Equals(n+1)::tl)
| Equals a::Equals b::tl ->
if trackfix then eprintf "(OMD) Equal 4\n";
loop accu (Equals(a+b+2)::tl)
| Greaterthan::Greaterthan::tl ->
if trackfix then eprintf "(OMD) Greaterthan 1\n";
loop accu (Greaterthans 0::tl)
| Greaterthans n::Greaterthan::tl ->
if trackfix then eprintf "(OMD) Greaterthan 2\n";
loop accu (Greaterthans(n+1)::tl)
| Greaterthan::Greaterthans n::tl ->
if trackfix then eprintf "(OMD) Greaterthan 3\n";
loop accu (Greaterthans(n+1)::tl)
| Greaterthans a::Greaterthans b::tl ->
if trackfix then eprintf "(OMD) Greaterthan 4\n";
loop accu (Greaterthans(a+b+2)::tl)
| Hash::Hash::tl ->
if trackfix then eprintf "(OMD) Hash 1\n";
loop accu (Hashs 0::tl)
| Hashs n::Hash::tl ->
if trackfix then eprintf "(OMD) Hash 2\n";
loop accu (Hashs(n+1)::tl)
| Hash::Hashs n::tl ->
if trackfix then eprintf "(OMD) Hash 3\n";
loop accu (Hashs(n+1)::tl)
| Hashs a::Hashs b::tl ->
if trackfix then eprintf "(OMD) Hash 4\n";
loop accu (Hashs(a+b+2)::tl)
| Lessthan::Lessthan::tl ->
if trackfix then eprintf "(OMD) Lessthan 1\n";
loop accu (Lessthans 0::tl)
| Lessthans n::Lessthan::tl ->
if trackfix then eprintf "(OMD) Lessthan 2\n";
loop accu (Lessthans(n+1)::tl)
| Lessthan::Lessthans n::tl ->
if trackfix then eprintf "(OMD) Lessthan 3\n";
loop accu (Lessthans(n+1)::tl)
| Lessthans a::Lessthans b::tl ->
if trackfix then eprintf "(OMD) Lessthan 4\n";
loop accu (Lessthans(a+b+2)::tl)
| Minus::Minus::tl ->
if trackfix then eprintf "(OMD) Minus 1\n";
loop accu (Minuss 0::tl)
| Minuss n::Minus::tl ->
if trackfix then eprintf "(OMD) Minus 2\n";
loop accu (Minuss(n+1)::tl)
| Minus::Minuss n::tl ->
if trackfix then eprintf "(OMD) Minus 3\n";
loop accu (Minuss(n+1)::tl)
| Minuss a::Minuss b::tl ->
if trackfix then eprintf "(OMD) Minus 4\n";
loop accu (Minuss(a+b+2)::tl)
| Newline::Newline::tl ->
if trackfix then eprintf "(OMD) Newline 1\n";
loop accu (Newlines 0::tl)
| Newlines n::Newline::tl ->
if trackfix then eprintf "(OMD) Newline 2\n";
loop accu (Newlines(n+1)::tl)
| Newline::Newlines n::tl ->
if trackfix then eprintf "(OMD) Newline 3\n";
loop accu (Newlines(n+1)::tl)
| Newlines a::Newlines b::tl ->
if trackfix then eprintf "(OMD) Newline 4\n";
loop accu (Newlines(a+b+2)::tl)
| Obrace::Obrace::tl ->
if trackfix then eprintf "(OMD) Obrace 1\n";
loop accu (Obraces 0::tl)
| Obraces n::Obrace::tl ->
if trackfix then eprintf "(OMD) Obrace 2\n";
loop accu (Obraces(n+1)::tl)
| Obrace::Obraces n::tl ->
if trackfix then eprintf "(OMD) Obrace 3\n";
loop accu (Obraces(n+1)::tl)
| Obraces a::Obraces b::tl ->
if trackfix then eprintf "(OMD) Obrace 4\n";
loop accu (Obraces(a+b+2)::tl)
| Oparenthesis::Oparenthesis::tl ->
if trackfix then eprintf "(OMD) Oparenthesis 1\n";
loop accu (Oparenthesiss 0::tl)
| Oparenthesiss n::Oparenthesis::tl ->
if trackfix then eprintf "(OMD) Oparenthesis 2\n";
loop accu (Oparenthesiss(n+1)::tl)
| Oparenthesis::Oparenthesiss n::tl ->
if trackfix then eprintf "(OMD) Oparenthesis 3\n";
loop accu (Oparenthesiss(n+1)::tl)
| Oparenthesiss a::Oparenthesiss b::tl ->
if trackfix then eprintf "(OMD) Oparenthesis 4\n";
loop accu (Oparenthesiss(a+b+2)::tl)
| Obracket::Obracket::tl ->
if trackfix then eprintf "(OMD) Obracket 1\n";
loop accu (Obrackets 0::tl)
| Obrackets n::Obracket::tl ->
if trackfix then eprintf "(OMD) Obracket 2\n";
loop accu (Obrackets(n+1)::tl)
| Obracket::Obrackets n::tl ->
if trackfix then eprintf "(OMD) Obracket 3\n";
loop accu (Obrackets(n+1)::tl)
| Obrackets a::Obrackets b::tl ->
if trackfix then eprintf "(OMD) Obracket 4\n";
loop accu (Obrackets(a+b+2)::tl)
| Percent::Percent::tl ->
if trackfix then eprintf "(OMD) Percent 1\n";
loop accu (Percents 0::tl)
| Percents n::Percent::tl ->
if trackfix then eprintf "(OMD) Percent 2\n";
loop accu (Percents(n+1)::tl)
| Percent::Percents n::tl ->
if trackfix then eprintf "(OMD) Percent 3\n";
loop accu (Percents(n+1)::tl)
| Percents a::Percents b::tl ->
if trackfix then eprintf "(OMD) Percent 4\n";
loop accu (Percents(a+b+2)::tl)
| Plus::Plus::tl ->
if trackfix then eprintf "(OMD) Plus 1\n";
loop accu (Pluss 0::tl)
| Pluss n::Plus::tl ->
if trackfix then eprintf "(OMD) Plus 2\n";
loop accu (Pluss(n+1)::tl)
| Plus::Pluss n::tl ->
if trackfix then eprintf "(OMD) Plus 3\n";
loop accu (Pluss(n+1)::tl)
| Pluss a::Pluss b::tl ->
if trackfix then eprintf "(OMD) Plus 4\n";
loop accu (Pluss(a+b+2)::tl)
| Question::Question::tl ->
if trackfix then eprintf "(OMD) Question 1\n";
loop accu (Questions 0::tl)
| Questions n::Question::tl ->
if trackfix then eprintf "(OMD) Question 2\n";
loop accu (Questions(n+1)::tl)
| Question::Questions n::tl ->
if trackfix then eprintf "(OMD) Question 3\n";
loop accu (Questions(n+1)::tl)
| Questions a::Questions b::tl ->
if trackfix then eprintf "(OMD) Question 4\n";
loop accu (Questions(a+b+2)::tl)
| Quote::Quote::tl ->
if trackfix then eprintf "(OMD) Quote 1\n";
loop accu (Quotes 0::tl)
| Quotes n::Quote::tl ->
if trackfix then eprintf "(OMD) Quote 2\n";
loop accu (Quotes(n+1)::tl)
| Quote::Quotes n::tl ->
if trackfix then eprintf "(OMD) Quote 3\n";
loop accu (Quotes(n+1)::tl)
| Quotes a::Quotes b::tl ->
if trackfix then eprintf "(OMD) Quote 4\n";
loop accu (Quotes(a+b+2)::tl)
| Semicolon::Semicolon::tl ->
if trackfix then eprintf "(OMD) Semicolon 1\n";
loop accu (Semicolons 0::tl)
| Semicolons n::Semicolon::tl ->
if trackfix then eprintf "(OMD) Semicolon 2\n";
loop accu (Semicolons(n+1)::tl)
| Semicolon::Semicolons n::tl ->
if trackfix then eprintf "(OMD) Semicolon 3\n";
loop accu (Semicolons(n+1)::tl)
| Semicolons a::Semicolons b::tl ->
if trackfix then eprintf "(OMD) Semicolon 4\n";
loop accu (Semicolons(a+b+2)::tl)
| Slash::Slash::tl ->
if trackfix then eprintf "(OMD) Slash 1\n";
loop accu (Slashs 0::tl)
| Slashs n::Slash::tl ->
if trackfix then eprintf "(OMD) Slash 2\n";
loop accu (Slashs(n+1)::tl)
| Slash::Slashs n::tl ->
if trackfix then eprintf "(OMD) Slash 3\n";
loop accu (Slashs(n+1)::tl)
| Slashs a::Slashs b::tl ->
if trackfix then eprintf "(OMD) Slash 4\n";
loop accu (Slashs(a+b+2)::tl)
| Space::Space::tl ->
if trackfix then eprintf "(OMD) Space 1\n";
loop accu (Spaces 0::tl)
| Spaces n::Space::tl ->
if trackfix then eprintf "(OMD) Space 2\n";
loop accu (Spaces(n+1)::tl)
| Space::Spaces n::tl ->
if trackfix then eprintf "(OMD) Space 3\n";
loop accu (Spaces(n+1)::tl)
| Spaces a::Spaces b::tl ->
if trackfix then eprintf "(OMD) Space 4\n";
loop accu (Spaces(a+b+2)::tl)
| Star::Star::tl ->
if trackfix then eprintf "(OMD) Star 1\n";
loop accu (Stars 0::tl)
| Stars n::Star::tl ->
if trackfix then eprintf "(OMD) Star 2\n";
loop accu (Stars(n+1)::tl)
| Star::Stars n::tl ->
if trackfix then eprintf "(OMD) Star 3\n";
loop accu (Stars(n+1)::tl)
| Stars a::Stars b::tl ->
if trackfix then eprintf "(OMD) Star 4\n";
loop accu (Stars(a+b+2)::tl)
| Tab::Tab::tl ->
if trackfix then eprintf "(OMD) Tab 1\n";
loop accu (Tabs 0::tl)
| Tabs n::Tab::tl ->
if trackfix then eprintf "(OMD) Tab 2\n";
loop accu (Tabs(n+1)::tl)
| Tab::Tabs n::tl ->
if trackfix then eprintf "(OMD) Tab 3\n";
loop accu (Tabs(n+1)::tl)
| Tabs a::Tabs b::tl ->
if trackfix then eprintf "(OMD) Tab 4\n";
loop accu (Tabs(a+b+2)::tl)
| Tilde::Tilde::tl ->
if trackfix then eprintf "(OMD) Tilde 1\n";
loop accu (Tildes 0::tl)
| Tildes n::Tilde::tl ->
if trackfix then eprintf "(OMD) Tilde 2\n";
loop accu (Tildes(n+1)::tl)
| Tilde::Tildes n::tl ->
if trackfix then eprintf "(OMD) Tilde 3\n";
loop accu (Tildes(n+1)::tl)
| Tildes a::Tildes b::tl ->
if trackfix then eprintf "(OMD) Tilde 4\n";
loop accu (Tildes(a+b+2)::tl)
| Underscore::Underscore::tl ->
if trackfix then eprintf "(OMD) Underscore 1\n";
loop accu (Underscores 0::tl)
| Underscores n::Underscore::tl ->
if trackfix then eprintf "(OMD) Underscore 2\n";
loop accu (Underscores(n+1)::tl)
| Underscore::Underscores n::tl ->
if trackfix then eprintf "(OMD) Underscore 3\n";
loop accu (Underscores(n+1)::tl)
| Underscores a::Underscores b::tl ->
if trackfix then eprintf "(OMD) Underscore 4\n";
loop accu (Underscores(a+b+2)::tl)| x::tl -> loop (x::accu) tl
| [] -> List.rev accu
in
loop [] l
(* Remove all [NL] and [Br] at the beginning. *)
let rec remove_initial_newlines = function
| [] -> []
| (NL | Br) :: tl -> remove_initial_newlines tl
| l -> l
(** - recognizes paragraphs
- glues following blockquotes *)
let make_paragraphs md =
let rec loop cp accu = function (* cp means current paragraph *)
| [] ->
let accu =
match cp with
| [] | [NL] | [Br] -> accu
| (NL|Br)::cp -> Paragraph(List.rev cp)::accu
| cp -> Paragraph(List.rev cp)::accu
in
List.rev accu
| Blockquote b1 :: Blockquote b2 :: tl ->
loop cp accu (Blockquote(b1@b2):: tl)
| Blockquote b :: tl ->
let e = Blockquote(loop [] [] b) in
(match cp with
| [] | [NL] | [Br] -> loop cp (e::accu) tl
| _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl)
| (Ulp b) :: tl ->
let e = Ulp(List.map (fun li -> loop [] [] li) b) in
(match cp with
| [] | [NL] | [Br] -> loop cp (e::accu) tl
| _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl)
| (Olp b) :: tl ->
let e = Olp(List.map (fun li -> loop [] [] li) b) in
(match cp with
| [] | [NL] | [Br] -> loop cp (e::accu) tl
| _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl)
| Html_comment _ as e :: tl ->
(match cp with
| [] -> loop [] (e::accu) tl
| [NL] | [Br] -> loop [] (e::NL::accu) tl
| _ -> loop (e::cp) accu tl)
| (Raw_block _ | Html_block _) as e :: tl ->
(match cp with
| [] | [NL] | [Br] -> loop cp (e::cp@accu) tl
| _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl)
| (Code_block _ | H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _
| Ol _ | Ul _) as e :: tl ->
(match cp with
| [] | [NL] | [Br] -> loop cp (e::accu) tl
| _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl)
| Text "\n" :: _ | Paragraph _ :: _ ->
invalid_arg "Omd_parser.make_paragraphs"
| (NL|Br) :: (NL|Br) :: tl ->
let tl = remove_initial_newlines tl in
begin match cp with
| [] | [NL] | [Br] -> loop [] (NL::NL::accu) tl
| _ -> loop [] (Paragraph(List.rev cp)::accu) tl
end
| X(x) as e :: tl ->
(* If the extension returns a block as first element,
then consider the extension as a block. However
don't take its contents as it is yet, the contents
of the extension shall be considered final as late
as possible. *)
begin match x#to_t md with
| None -> loop (e::cp) accu tl
| Some(t) ->
match t with
| ( H1 _
| H2 _
| H3 _
| H4 _
| H5 _
| H6 _
| Paragraph _
| Ul _
| Ol _
| Ulp _
| Olp _
| Code_block _
| Hr
| Html_block _
| Raw_block _
| Blockquote _
) :: _
->
(match cp with
| [] | [NL] | [Br] ->
loop cp (e::accu) tl
| _ ->
loop [] (e::Paragraph(List.rev cp)::accu) tl)
| _ ->
loop (e::cp) accu tl
end
| e::tl ->
loop (e::cp) accu tl
in
let remove_white_crumbs l =
let rec loop = function
| [] -> []
| Text " " :: tl
| NL::tl
| Br::tl
->
loop tl
| l -> l
in
List.rev (loop (List.rev l))
in
let rec clean_paragraphs =
if debug then eprintf "(OMD) clean_paragraphs\n";
function
| [] -> []
| Paragraph[]::tl -> tl
| Paragraph(p) :: tl ->
Paragraph(clean_paragraphs
(remove_initial_newlines
(remove_white_crumbs(normalise_md p))))
:: clean_paragraphs tl
| H1 v :: tl -> H1(clean_paragraphs v)
:: clean_paragraphs tl
| H2 v :: tl -> H2(clean_paragraphs v)
:: clean_paragraphs tl
| H3 v :: tl -> H3(clean_paragraphs v)
:: clean_paragraphs tl
| H4 v :: tl -> H4(clean_paragraphs v)
:: clean_paragraphs tl
| H5 v :: tl -> H5(clean_paragraphs v)
:: clean_paragraphs tl
| H6 v :: tl -> H6(clean_paragraphs v)
:: clean_paragraphs tl
| Emph v :: tl -> Emph(clean_paragraphs v)
:: clean_paragraphs tl
| Bold v :: tl -> Bold(clean_paragraphs v)
:: clean_paragraphs tl
| Ul v :: tl -> Ul(List.map clean_paragraphs v)
:: clean_paragraphs tl
| Ol v :: tl -> Ol(List.map clean_paragraphs v)
:: clean_paragraphs tl
| Ulp v :: tl -> Ulp(List.map clean_paragraphs v)
:: clean_paragraphs tl
| Olp v :: tl -> Olp(List.map clean_paragraphs v)
:: clean_paragraphs tl
| Blockquote v :: tl -> Blockquote(clean_paragraphs v)
:: clean_paragraphs tl
| Url(href,v,title) :: tl -> Url(href,(clean_paragraphs v),title)
:: clean_paragraphs tl
| Text _
| Code _
| Code_block _
| Br
| Hr
| NL
| Ref _
| Img_ref _
| Raw _
| Raw_block _
| Html _
| Html_block _
| Html_comment _
| Img _
| X _ as v :: tl -> v :: clean_paragraphs tl
in
let r = clean_paragraphs(loop [] [] md)
in
if debug then eprintf "(OMD) clean_paragraphs %S --> %S\n%!"
(Omd_backend.sexpr_of_md md)
(Omd_backend.sexpr_of_md r);
r
(** [assert_well_formed] is a developer's function that helps to
track badly constructed token lists. This function has an
effect only if [trackfix] is [true]. *)
let assert_well_formed (l:tok list) : unit =
if trackfix then
let rec equiv l1 l2 = match l1, l2 with
| [], [] -> true
| Tag _::tl1, Tag _::tl2-> equiv tl1 tl2
| e1::tl1, e2::tl2 -> e1 = e2 && equiv tl1 tl2
| _ -> false
in
assert(equiv (fix l) l);
()
(** Generate fallback for references. *)
let extract_fallback main_loop remains l =
if debug then eprintf "(OMD) Omd_parser.extract_fallback\n%!";
let rec loop accu = function
| [] -> List.rev accu
| e::tl as r ->
if r == remains then
List.rev accu
else
match e, remains with
| Cbrackets 0, Cbracket::r when tl = r ->
let accu = Word "]" :: accu in
List.rev accu
| Cbrackets n, Cbrackets m::r when m + 1 = n && tl = r ->
let accu = Word "]" :: accu in
List.rev accu
| _ ->
loop (e::accu) tl
in
let a = loop [] l in
object
method to_string = L.string_of_tokens a
method to_t = [Text(L.string_of_tokens a)]
end
let unindent_rev n lexemes =
if debug then eprintf "(OMD) CALL: Omd_parser.unindent_rev\n%!";
assert_well_formed lexemes;
let rec loop accu cl = function
| Newlines x::(Space|Spaces _)::Newlines y::tl ->
loop accu cl (Newlines(x+y+2)::tl)
| Newline::(Space|Spaces _)::Newlines x::tl ->
loop accu cl (Newlines(1+x)::tl)
| Newlines x::(Space|Spaces _)::Newline::tl ->
loop accu cl (Newlines(1+x)::tl)
| Newline::(Space|Spaces _)::Newline::tl ->
loop accu cl (Newlines(0)::tl)
| (Newline|Newlines 0 as nl)::(Space|Spaces _ as s)::(
(Number _::Dot::(Space|Spaces _)::_)
| ((Star|Plus|Minus)::(Space|Spaces _)::_)
as tl) as l ->
if n = L.length s then
loop (nl::cl@accu) [] tl
else
(cl@accu), l
| (Newline|Newlines 0 as nl)::(Space|Spaces _ as s)::tl ->
let x = L.length s - n in
loop (nl::cl@accu)
(if x > 0 then [L.make_space x] else [])
tl
| Newlines(_)::_ as l ->
(cl@accu), l
| Newline::_ as l ->
(cl@accu), l
| e::tl ->
loop accu (e::cl) tl
| [] as l ->
(cl@accu), l
in
match loop [] [] lexemes with
| [], right -> [], right
| l, right ->
assert_well_formed l;
l, right
let unindent n lexemes =
let fst, snd = unindent_rev n lexemes in
List.rev fst, snd
let rec is_blank = function
| (Space | Spaces _ | Newline | Newlines _) :: tl ->
is_blank tl
| [] -> true
| _ -> false
let semph_or_bold (n:int) (l:l) =
(* FIXME: use rpl call/return convention *)
assert_well_formed l;
assert (n>0 && n<4);
match
fsplit
~excl:(function Newlines _ :: _ -> true | _ -> false)
~f:(function
| Backslash::Star::tl ->
Continue_with([Star;Backslash],tl)
| Backslash::Stars 0::tl ->
Continue_with([Star;Backslash],Star::tl)
| Backslash::Stars n::tl ->
Continue_with([Star;Backslash],Stars(n-1)::tl)
| (Backslashs b as x)::Star::tl ->
if b mod 2 = 0 then
Continue_with([x],Star::tl)
else
Continue_with([Star;x],tl)
| (Backslashs b as x)::(Stars 0 as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Star;x],Star::tl)
| (Backslashs b as x)::(Stars n as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Star;x],Stars(n-1)::tl)
| (Space|Spaces _ as x)::(Star|Stars _ as s)::tl ->
Continue_with([s;x],tl)
| (Star|Stars _ as s)::tl ->
if L.length s = n then
Split([],tl)
else
Continue
| _ -> Continue)
l
with
| None ->
None
| Some(left,right) ->
if is_blank left then None else Some(left,right)
let sm_uemph_or_bold (n:int) (l:l) =
assert_well_formed l;
(* FIXME: use rpl call/return convention *)
assert (n>0 && n<4);
match
fsplit
~excl:(function Newlines _ :: _ -> true | _ -> false)
~f:(function
| Backslash::Underscore::tl ->
Continue_with([Underscore;Backslash],tl)
| Backslash::Underscores 0::tl ->
Continue_with([Underscore;Backslash],Underscore::tl)
| Backslash::Underscores n::tl ->
Continue_with([Underscore;Backslash],Underscores(n-1)::tl)
| (Backslashs b as x)::Underscore::tl ->
if b mod 2 = 0 then
Continue_with([x],Underscore::tl)
else
Continue_with([Underscore;x],tl)
| (Backslashs b as x)::(Underscores 0 as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Underscore;x],Underscore::tl)
| (Backslashs b as x)::(Underscores n as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Underscore;x],Underscores(n-1)::tl)
| (Space|Spaces _ as x)::(Underscore|Underscores _ as s)::tl ->
Continue_with([s;x],tl)
| (Underscore|Underscores _ as s)::tl ->
if L.length s = n then
Split([],tl)
else
Continue
| _ -> Continue)
l
with
| None ->
None
| Some(left,right) ->
if is_blank left then None else Some(left,right)
let gh_uemph_or_bold (n:int) (l:l) =
assert_well_formed l;
(* FIXME: use rpl call/return convention *)
assert (n>0 && n<4);
match
fsplit
~excl:(function Newlines _ :: _ -> true | _ -> false)
~f:(function
| Backslash::Underscore::tl ->
Continue_with([Underscore;Backslash],tl)
| Backslash::Underscores 0::tl ->
Continue_with([Underscore;Backslash],Underscore::tl)
| Backslash::Underscores n::tl ->
Continue_with([Underscore;Backslash],Underscores(n-1)::tl)
| (Backslashs b as x)::Underscore::tl ->
if b mod 2 = 0 then
Continue_with([x],Underscore::tl)
else
Continue_with([Underscore;x],tl)
| (Backslashs b as x)::(Underscores 0 as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Underscore;x],Underscore::tl)
| (Backslashs b as x)::(Underscores n as s)::tl ->
if b mod 2 = 0 then
Continue_with([x],s::tl)
else
Continue_with([Underscore;x],Underscores(n-1)::tl)
| (Space|Spaces _ as x)::(Underscore|Underscores _ as s)::tl ->
Continue_with([s;x],tl)
| (Underscore|Underscores _ as s)::(Word _|Number _ as w):: tl ->
Continue_with([w;s],tl)
| (Underscore|Underscores _ as s)::tl ->
if L.length s = n then
Split([],tl)
else
Continue
| _ -> Continue)
l
with
| None ->
None
| Some(left,right) ->
if is_blank left then None else Some(left,right)
let uemph_or_bold n l =
assert_well_formed l;
(* FIXME: use rpl call/return convention *)
if gh_uemph_or_bold_style then
gh_uemph_or_bold n l
else
sm_uemph_or_bold n l
let eat_blank =
eat (function |Space|Spaces _|Newline|Newlines _ -> true| _ -> false)
(* used by tag__maybe_h1 and tag__maybe_h2 *)
let setext_title main_loop (l:l) : (Omd_representation.tok list * l) option =
assert_well_formed l;
let rec detect_balanced_bqs n r l =
(* If there's a balanced (complete) backquote-started code block
then it should be "ignored", else it means the line that
follows is part of a code block, so it's not defining a
setext-style title. *)
if debug then
eprintf "(OMD) detect_balanced_bqs n=%d r=%S l=%S\n%!"
n (L.string_of_tokens r) (L.string_of_tokens l);
match l with
| [] ->
None
| (Newline|Newlines _)::_ ->
None
| Backslash::Backquote::tl ->
detect_balanced_bqs n (Backquote::Backslash::r) tl
| Backslash::Backquotes 0::tl ->
detect_balanced_bqs n (Backquote::Backslash::r) (Backquote::tl)
| Backslash::Backquotes x::tl ->
detect_balanced_bqs n (Backquote::Backslash::r) (Backquotes(x-1)::tl)
| Backslashs(m) as b::Backquote::tl when m mod 2 = 1 ->
detect_balanced_bqs n (Backquote::b::r) tl
| Backslashs(m) as b::Backquotes 0::tl when m mod 2 = 1 ->
detect_balanced_bqs n (Backquote::b::r) (Backquote::tl)
| Backslashs(m) as b::Backquotes x::tl when m mod 2 = 1 ->
detect_balanced_bqs n (Backquote::b::r) (Backquotes(x-1)::tl)
| (Backquote as b)::tl when n = 1 ->
Some(List.rev (b::r), tl)
| (Backquotes x as b)::tl when n = x+2 ->
Some(List.rev (b::r), tl)
| e::tl ->
detect_balanced_bqs n (e::r) tl
in
let rec loop r = function
| [] ->
if r = [] then
None
else
Some(List.rev r, [])
| Backslash::Backquote::tl ->
loop (Backquote::Backslash::r) tl
| Backslashs(m) as b::Backquote::tl when m mod 2 = 1 ->
loop (Backquote::b::r) tl
| Backslash::Backquotes 0::tl ->
loop (Backquote::Backslash::r) (Backquote::tl)
| Backslash::Backquotes x::tl ->
loop (Backquote::Backslash::r) (Backquotes(x-1)::tl)
| Backslashs(m) as b::Backquotes 0::tl when m mod 2 = 1 ->
loop (Backquote::b::r) (Backquote::tl)
| Backslashs(m) as b::Backquotes x::tl when m mod 2 = 1 ->
loop (Backquote::b::r) (Backquotes(x-1)::tl)
| Backquote::tl ->
begin match detect_balanced_bqs 1 [] tl with
| Some(bl,tl) -> loop (bl@r) tl
| _ -> None
end
| Backquotes(x)::tl ->
begin match detect_balanced_bqs (x+2) [] tl with
| Some(bl,tl) -> loop (bl@r) tl
| _ -> None
end
| Newline::(Equal|Equals _|Minus|Minuss _)::tl ->
if r = [] then
None
else
Some(List.rev r, tl)
| (Newline|Newlines _)::_ ->
if debug then
eprintf "(OMD) Omd_parser.setext_title is wrongly used!\n%!";
None
| e::tl ->
loop (e::r) tl
in
if match l with
| Lessthan::Word _::_ ->
begin match main_loop [] [] l with
| (Html_block _ | Code_block _ | Raw_block _)::_ ->
true
| _ ->
false
end
| _ -> false
then
None
else
let result = loop [] l in
if debug then
eprintf "(OMD) setext_title l=%S result=%S,%S\n%!"
(L.string_of_tokens l)
(match result with
| None -> ""
| Some (x,tl) -> L.string_of_tokens x)
(match result with
| None -> ""
| Some (x,tl) -> L.string_of_tokens tl);
result
let tag__maybe_h1 (main_loop:main_loop) =
Tag("tag__maybe_h1",
object
method parser_extension r p l =
match p with
| ([]|[Newline|Newlines _]) ->
begin match setext_title main_loop l with
| None ->
None
| Some(title, tl) ->
let title = H1(main_loop [] [] title) in
Some((title::r), [Newline], tl)
end
| _ ->
if debug then
eprintf "(OMD) Warning: Omd_parser.tag__maybe_h1 is wrongly \
used (p=%S)!\n"
(L.string_of_tokens p);
None
method to_string = ""
end
)
let tag__maybe_h2 (main_loop:main_loop) =
Tag("tag__maybe_h2",
object
method parser_extension r p l =
match p with
| ([]|[Newline|Newlines _]) ->
begin match setext_title main_loop l with
| None ->
None
| Some(title, tl) ->
let title = H2(main_loop [] [] title) in
Some((title::r), [Newline], tl)
end
| _ ->
if debug then
eprintf "(OMD) Warning: Omd_parser.tag__maybe_h2 is wrongly \
used (p=%S)!\n"
(L.string_of_tokens p);
None
method to_string = ""
end
)
let tag__md md = (* [md] should be in reverse *)
Tag("tag__md",
object
method parser_extension r p l = Some(md@r, [], l)
method to_string = ""
end
)
(* Let's tag the lines that *might* be titles using setext-style.
"might" because if they are, for instance, in a code section,
then they are not titles at all. *)
let tag_setext main_loop lexemes =
assert_well_formed lexemes;
let rec loop pl res = function
| [] | [Newline|Newlines _] ->
pl@res
| (Newline as e1)::(Equal|Equals _ as e2)::tl -> (* might be a H1. *)
begin
match
fsplit_rev
~f:(function
| (Space|Spaces _|Equal|Equals _)::tl -> Continue
| [] -> Split([],[])
| _::_ as l -> Split([], l))
tl
with
| Some(rleft, (([]|(Newline|Newlines _)::_) as right)) ->
loop [] (rleft@(e2::e1::pl@tag__maybe_h1 main_loop::res)) right
| Some(rleft, right) ->
loop [] (rleft@(e2::e1::pl@res)) right
| None ->
loop [] (e2::e1::pl@res) []
end
| (Newline as e1)::(Minus|Minuss _ as e2)::tl -> (* might be a H2. *)
begin
match
fsplit_rev
~f:(function
| (Space|Spaces _|Minus|Minuss _)::tl -> Continue
| [] -> Split([],[])
| _::_ as l -> Split([], l))
tl
with
| Some(rleft, (([]|(Newline|Newlines _)::_) as right)) ->
loop [] (rleft@(e2::e1::pl@tag__maybe_h2 main_loop::res)) right
| Some(rleft, right) ->
loop [] (rleft@(e2::e1::pl@res)) right
| None ->
loop [] (e2::e1::pl@res) []
end
| (Newline | Newlines _ as e1)::tl ->
loop [] (e1::pl@res) tl
| e::tl ->
loop (e::pl) res tl
in
List.rev (loop [] [] lexemes)
let hr_m l =
assert_well_formed l;
let rec loop n = function
| ((Newlines _|Newline)::tl) | ([] as tl) ->
if n >= 3 then Some tl else None
| (Space|Spaces _)::tl ->
loop n tl
| Minus::tl ->
loop (n+1) tl
| Minuss x::tl ->
loop (x+2+n) tl
| _::_ ->
None
in loop 0 l
let hr_s l =
assert_well_formed l;
let rec loop n = function
| ((Newline|Newlines _)::tl) | ([] as tl) ->
if n >= 3 then Some tl else None
| (Space|Spaces _)::tl ->
loop n tl
| Star::tl ->
loop (n+1) tl
| Stars x::tl ->
loop (x+2+n) tl
| _::_ ->
None
in loop 0 l
let hr l =
match hr_m l with
| None -> hr_s l
| Some _ as tl -> tl
(** [bcode] parses code that's delimited by backquote(s) *)
let bcode ?(default_lang=default_lang) r p l =
assert_well_formed l;
let e, tl =
match l with
| (Backquote|Backquotes _ as e)::tl -> e, tl
| _ -> failwith "Omd_parser.bcode is wrongly called"
in
let rec code_block accu = function
| [] ->
None
| Backquote::tl ->
if e = Backquote then
match accu with
| Newline::accu ->
Some(List.rev accu, tl)
| _ ->
Some(List.rev accu, tl)
else
code_block (Backquote::accu) tl
| (Backquotes n as b)::tl ->
if e = b then
match accu with
| Newline::accu ->
Some(List.rev accu, tl)
| _ ->
Some(List.rev accu, tl)
else
code_block (b::accu) tl
| Tag(_, _)::tl ->
code_block accu tl
| e::tl ->
code_block (e::accu) tl
in
match code_block [] tl with
| None -> None
| Some(cb, l) ->
if List.exists (function (Newline|Newlines _) -> true | _ -> false) cb
&& (match p with []|[Newline|Newlines _] -> true | _ -> false)
&& (match e with Backquotes n when n > 0 -> true | _ -> false)
then
match cb with
| Word lang :: (Space|Spaces _) :: Newline :: tl
| Word lang :: Newline :: tl ->
let code = L.string_of_tokens tl in
Some(Code_block(lang, code) :: r, [Backquote], l)
| Word lang :: (Space|Spaces _) :: Newlines 0 :: tl
| Word lang :: Newlines 0 :: tl ->
let code = L.string_of_tokens(Newline::tl) in
Some(Code_block(lang, code) :: r, [Backquote], l)
| Word lang :: (Space|Spaces _) :: Newlines n :: tl
| Word lang :: Newlines n :: tl ->
let code = L.string_of_tokens (Newlines(n-1)::tl) in
Some(Code_block(lang, code) :: r, [Backquote], l)
| Newline :: tl ->
let code = L.string_of_tokens tl in
Some(Code_block(default_lang, code) :: r, [Backquote], l)
| _ ->
let code = L.string_of_tokens cb in
Some(Code_block(default_lang, code) :: r, [Backquote], l)
else
let clean_bcode s =
let rec loop1 i =
if i = String.length s then 0
else match s.[i] with
| ' ' -> loop1(i+1)
| _ -> i
in
let rec loop2 i =
if i = -1 then String.length s
else match s.[i] with
| ' ' -> loop2(i-1)
| _ -> i+1
in
match loop1 0, loop2 (String.length s - 1) with
| 0, n when n = String.length s - 1 -> s
| i, n -> String.sub s i (n-i)
in
let code = L.string_of_tokens cb in
if debug then
eprintf "(OMD) clean_bcode %S => %S\n%!" code (clean_bcode code);
Some(Code(default_lang, clean_bcode code) :: r, [Backquote], l)
exception NL_exception
exception Premature_ending
(* !!DO NOT DELETE THIS!!
The program that generates the generated part that follows right after.
List.iter (fun (a,b,c) ->
print_endline ("let read_until_"^a^" ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: ("^b^" as b) :: tl ->
loop (b::accu) n tl
| Backslash :: ("^b^"s 0) :: tl ->
loop ("^b^"::accu) n ("^b^"::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl"
^(if c<>"" then "
| Backslash :: ("^c^" as b) :: tl ->
loop (b::accu) n tl
| Backslash :: ("^c^"s 0) :: tl ->
loop ("^c^"::accu) n ("^c^"::tl)
| "^c^" as e :: tl ->
loop (e::accu) (n+1) tl
| "^c^"s x as e :: tl ->
loop (e::accu) (n+x+2) tl
" else "")^
" | "^b^" as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| "^b^"s 0 :: tl ->
if n = 0 then
List.rev accu, "^b^"::tl
else
loop ("^b^"::accu) (n-1) ("^b^"::tl)
| "^b^"s x :: tl ->
if n = 0 then
List.rev accu, "^b^"s(x-1)::tl
else
loop
(match accu with
| "^b^"::accu -> "^b^"s(0)::accu
| "^b^"s x::accu -> "^b^"s(x+1)::accu
| _ -> "^b^"::accu)
(n-1)
("^b^"s(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf \"Omd_parser.read_until_"^a^" %S bq=%b no_nl=%b\\n%!\" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf \"Omd_parser.read_until_"^a^" %S bq=%b no_nl=%b => %S\\n%!\" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
"))
[ "gt", "Greaterthan", "Lessthan";
"lt", "Lessthan", "";
"cparenth", "Cparenthesis", "Oparenthesis";
"oparenth", "Oparenthesis", "";
"dq", "Doublequote", "";
"q", "Quote", "";
"obracket", "Obracket", "";
"cbracket", "Cbracket", "Obracket";
"space", "Space", "";
]
*)
(* begin generated part *)
let read_until_gt ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Greaterthan as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Greaterthans 0) :: tl ->
loop (Greaterthan::accu) n (Greaterthan::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl
| Backslash :: (Lessthan as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Lessthans 0) :: tl ->
loop (Lessthan::accu) n (Lessthan::tl)
| Lessthan as e :: tl ->
loop (e::accu) (n+1) tl
| Lessthans x as e :: tl ->
loop (e::accu) (n+x+2) tl
| Greaterthan as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Greaterthans 0 :: tl ->
if n = 0 then
List.rev accu, Greaterthan::tl
else
loop (Greaterthan::accu) (n-1) (Greaterthan::tl)
| Greaterthans x :: tl ->
if n = 0 then
List.rev accu, Greaterthans(x-1)::tl
else
loop
(match accu with
| Greaterthan::accu -> Greaterthans(0)::accu
| Greaterthans x::accu -> Greaterthans(x+1)::accu
| _ -> Greaterthan::accu)
(n-1)
(Greaterthans(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_gt %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_gt %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_lt ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Lessthan as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Lessthans 0) :: tl ->
loop (Lessthan::accu) n (Lessthan::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Lessthan as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Lessthans 0 :: tl ->
if n = 0 then
List.rev accu, Lessthan::tl
else
loop (Lessthan::accu) (n-1) (Lessthan::tl)
| Lessthans x :: tl ->
if n = 0 then
List.rev accu, Lessthans(x-1)::tl
else
loop
(match accu with
| Lessthan::accu -> Lessthans(0)::accu
| Lessthans x::accu -> Lessthans(x+1)::accu
| _ -> Lessthan::accu)
(n-1)
(Lessthans(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_lt %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_lt %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_cparenth ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Cparenthesis as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Cparenthesiss 0) :: tl ->
loop (Cparenthesis::accu) n (Cparenthesis::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl
| Backslash :: (Oparenthesis as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Oparenthesiss 0) :: tl ->
loop (Oparenthesis::accu) n (Oparenthesis::tl)
| Oparenthesis as e :: tl ->
loop (e::accu) (n+1) tl
| Oparenthesiss x as e :: tl ->
loop (e::accu) (n+x+2) tl
| Cparenthesis as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Cparenthesiss 0 :: tl ->
if n = 0 then
List.rev accu, Cparenthesis::tl
else
loop (Cparenthesis::accu) (n-1) (Cparenthesis::tl)
| Cparenthesiss x :: tl ->
if n = 0 then
List.rev accu, Cparenthesiss(x-1)::tl
else
loop
(match accu with
| Cparenthesis::accu -> Cparenthesiss(0)::accu
| Cparenthesiss x::accu -> Cparenthesiss(x+1)::accu
| _ -> Cparenthesis::accu)
(n-1)
(Cparenthesiss(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_cparenth %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_cparenth %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_oparenth ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Oparenthesis as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Oparenthesiss 0) :: tl ->
loop (Oparenthesis::accu) n (Oparenthesis::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Oparenthesis as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Oparenthesiss 0 :: tl ->
if n = 0 then
List.rev accu, Oparenthesis::tl
else
loop (Oparenthesis::accu) (n-1) (Oparenthesis::tl)
| Oparenthesiss x :: tl ->
if n = 0 then
List.rev accu, Oparenthesiss(x-1)::tl
else
loop
(match accu with
| Oparenthesis::accu -> Oparenthesiss(0)::accu
| Oparenthesiss x::accu -> Oparenthesiss(x+1)::accu
| _ -> Oparenthesis::accu)
(n-1)
(Oparenthesiss(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_oparenth %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_oparenth %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_dq ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Doublequote as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Doublequotes 0) :: tl ->
loop (Doublequote::accu) n (Doublequote::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Doublequote as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Doublequotes 0 :: tl ->
if n = 0 then
List.rev accu, Doublequote::tl
else
loop (Doublequote::accu) (n-1) (Doublequote::tl)
| Doublequotes x :: tl ->
if n = 0 then
List.rev accu, Doublequotes(x-1)::tl
else
loop
(match accu with
| Doublequote::accu -> Doublequotes(0)::accu
| Doublequotes x::accu -> Doublequotes(x+1)::accu
| _ -> Doublequote::accu)
(n-1)
(Doublequotes(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_dq %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_dq %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_q ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Quote as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Quotes 0) :: tl ->
loop (Quote::accu) n (Quote::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Quote as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Quotes 0 :: tl ->
if n = 0 then
List.rev accu, Quote::tl
else
loop (Quote::accu) (n-1) (Quote::tl)
| Quotes x :: tl ->
if n = 0 then
List.rev accu, Quotes(x-1)::tl
else
loop
(match accu with
| Quote::accu -> Quotes(0)::accu
| Quotes x::accu -> Quotes(x+1)::accu
| _ -> Quote::accu)
(n-1)
(Quotes(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_q %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_q %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_obracket ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Obracket as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Obrackets 0) :: tl ->
loop (Obracket::accu) n (Obracket::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Obracket as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Obrackets 0 :: tl ->
if n = 0 then
List.rev accu, Obracket::tl
else
loop (Obracket::accu) (n-1) (Obracket::tl)
| Obrackets x :: tl ->
if n = 0 then
List.rev accu, Obrackets(x-1)::tl
else
loop
(match accu with
| Obracket::accu -> Obrackets(0)::accu
| Obrackets x::accu -> Obrackets(x+1)::accu
| _ -> Obracket::accu)
(n-1)
(Obrackets(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_obracket %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_obracket %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_cbracket ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Cbracket as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Cbrackets 0) :: tl ->
loop (Cbracket::accu) n (Cbracket::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl
| Backslash :: (Obracket as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Obrackets 0) :: tl ->
loop (Obracket::accu) n (Obracket::tl)
| Obracket as e :: tl ->
loop (e::accu) (n+1) tl
| Obrackets x as e :: tl ->
loop (e::accu) (n+x+2) tl
| Cbracket as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Cbrackets 0 :: tl ->
if n = 0 then
List.rev accu, Cbracket::tl
else
loop (Cbracket::accu) (n-1) (Cbracket::tl)
| Cbrackets x :: tl ->
if n = 0 then
List.rev accu, Cbrackets(x-1)::tl
else
loop
(match accu with
| Cbracket::accu -> Cbrackets(0)::accu
| Cbrackets x::accu -> Cbrackets(x+1)::accu
| _ -> Cbracket::accu)
(n-1)
(Cbrackets(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_cbracket %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_cbracket %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
let read_until_space ?(bq=false) ?(no_nl=false) l =
assert_well_formed l;
let rec loop accu n = function
| Backslash :: (Space as b) :: tl ->
loop (b::accu) n tl
| Backslash :: (Spaces 0) :: tl ->
loop (Space::accu) n (Space::tl)
| Backslashs 0 :: tl ->
loop (Backslash::accu) n tl
| Backslashs 1 :: tl ->
loop (Backslash::accu) n (Backslash::tl)
| Backslashs 2 :: tl ->
loop (Backslashs 0::accu) n tl
| (Backslashs x) :: tl ->
if x mod 2 = 0 then
loop (Backslashs(x/2-1)::accu) n tl
else
loop (Backslashs(x/2-1)::accu) n (Backslash::tl)
| (Backquote|Backquotes _ as e)::tl as l ->
if bq then
match bcode [] [] l with
| None -> loop (e::accu) n tl
| Some (r, _, tl) ->
loop (* not very pretty kind of hack *)
(List.rev(L.lex(Omd_backend.markdown_of_md r))@accu)
n
tl
else
loop (e::accu) n tl | Space as e :: tl ->
if n = 0 then
List.rev accu, tl
else
loop (e::accu) (n-1) tl
| Spaces 0 :: tl ->
if n = 0 then
List.rev accu, Space::tl
else
loop (Space::accu) (n-1) (Space::tl)
| Spaces x :: tl ->
if n = 0 then
List.rev accu, Spaces(x-1)::tl
else
loop
(match accu with
| Space::accu -> Spaces(0)::accu
| Spaces x::accu -> Spaces(x+1)::accu
| _ -> Space::accu)
(n-1)
(Spaces(x-1)::tl)
| (Newline|Newlines _ as e)::tl ->
if no_nl then
raise NL_exception
else
loop (e::accu) n tl
| e::tl ->
loop (e::accu) n tl
| [] ->
raise Premature_ending
in
if debug then
eprintf "Omd_parser.read_until_space %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl;
let res = loop [] 0 l in
if debug then
eprintf "Omd_parser.read_until_space %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res));
res
(* /end generated part *)
let read_until_newline l =
assert_well_formed l;
let rec loop accu n =
function
| ((Backslash as a)) :: ((Newline as b)) :: tl ->
loop (b :: a :: accu) n tl
| Backslash :: Newlines 0 :: tl ->
loop (Newline :: Backslash :: accu) n (Newline :: tl)
| ((Backslashs 0 as e)) :: tl -> loop (e :: accu) n tl
| ((Backslashs x as e)) :: tl ->
if (x mod 2) = 0
then loop (e :: accu) n tl
else loop ((Backslashs (x - 1)) :: accu) n (Backslash :: tl)
| ((Newline as e)) :: tl ->
if n = 0 then ((List.rev accu), tl) else loop (e :: accu) (n - 1) tl
| Newlines 0 :: tl ->
if n = 0
then ((List.rev accu), (Newline :: tl))
else loop (Newline :: accu) (n - 1) (Newline :: tl)
| Newlines n :: tl -> ((List.rev accu), ((Newlines (n - 1)) :: tl))
| e :: tl -> loop (e :: accu) n tl
| [] -> raise Premature_ending
in loop [] 0 l
(* H1, H2, H3, ... *)
let read_title (main_loop:main_loop) n r _previous lexemes =
let title, rest =
let rec loop accu = function
| Backslash::Hash::tl ->
loop (Hash::Backslash::accu) tl
| Backslashs(n)::Hash::tl when n mod 2 = 1 ->
loop (Hash::Backslashs(n-1)::accu) tl
| Backslash::Hashs(h)::tl ->
begin match tl with
| []
| (Space|Spaces _)::(Newline|Newlines _)::_
| (Newline|Newlines _)::_ ->
loop (Hash::Backslash::accu)
((if h = 0 then Hash else Hashs(h-1))::tl)
| _ ->
loop (Hashs(h)::Backslash::accu) tl
end
| Backslashs(n)::Hashs(h)::tl when n mod 2 = 1 ->
begin match tl with
| []
| (Space|Spaces _)::(Newline|Newlines _)::_
| (Newline|Newlines _)::_ ->
loop (Hash::Backslashs(n)::accu)
((if h = 0 then Hash else Hashs(h-1))::tl)
| _ ->
loop (Hashs(h)::Backslashs(n)::accu) tl
end
| (Hash|Hashs _) :: ((Newline|Newlines _) :: _ as l)
| (Hash|Hashs _) :: (Space|Spaces _) :: ((Newline|Newlines _)::_ as l)
| ((Newline|Newlines _) :: _ as l)
| ([] as l)
| (Space|Spaces _) :: (Hash|Hashs _) :: ((Newline|Newlines _) :: _ as l)
| (Space|Spaces _) :: (Hash|Hashs _) :: (Space|Spaces _)
:: ((Newline|Newlines _)::_ as l)
| (Space|Spaces _) :: ((Newline|Newlines _) :: _ as l)
| (Space|Spaces _) :: ([] as l) ->
main_loop [] [] (List.rev accu), l
| [Hash|Hashs _]
| [(Space|Spaces _); Hash|Hashs _]
| [(Space|Spaces _); (Hash|Hashs _); (Space|Spaces _)] ->
main_loop [] [] (List.rev accu), []
| x::tl ->
loop (x::accu) tl
in
loop [] lexemes
in
match n with
| 1 -> Some(H1 title :: r, [Newline], rest)
| 2 -> Some(H2 title :: r, [Newline], rest)
| 3 -> Some(H3 title :: r, [Newline], rest)
| 4 -> Some(H4 title :: r, [Newline], rest)
| 5 -> Some(H5 title :: r, [Newline], rest)
| 6 -> Some(H6 title :: r, [Newline], rest)
| _ -> None
let maybe_extension extensions r p l =
match extensions with
| [] -> None
| _ ->
List.fold_left
(function
| None ->
(fun f -> f#parser_extension r p l)
| Some(nr, np, nl) as e ->
(fun f -> match f#parser_extension nr np nl with
| None -> e
| Some _ as k -> k)
)
None
extensions
(* blockquotes *)
let emailstyle_quoting (main_loop:main_loop) r _p lexemes =
assert_well_formed lexemes;
let rec loop block cl =
function
| Newline::Greaterthan::(Newline::_ as tl) ->
loop (Newline::cl@block) [] tl
| Newline::Greaterthan::Space::tl ->
loop (Newline::cl@block) [] tl
| Newline::Greaterthan::Spaces 0::tl ->
loop (Newline::cl@block) [Space] tl
| Newline::Greaterthan::Spaces n::tl ->
assert(n>0);
loop (Newline::cl@block) [Spaces(n-1)] tl
(* multi paragraph blockquotes with empty lines *)
| Newlines 0::Greaterthan::Space::tl ->
loop (Newlines 0::cl@block) [] tl
| Newlines 0::Greaterthan::Spaces 0::tl ->
loop (Newlines 0::cl@block) [Space] tl
| Newlines 0::Greaterthan::Spaces n::tl ->
assert(n>0);
loop (Newlines 0::cl@block) [Spaces(n-1)] tl
| (Newlines _::_ as l) | ([] as l) -> fix(List.rev(cl@block)), l
| e::tl -> loop block (e::cl) tl
in
match loop [] [] lexemes with
| (Newline|Newlines _)::block, tl ->
if debug then
eprintf "(OMD) Omd_parser.emailstyle_quoting %S\n%!"
(L.string_of_tokens block);
Some((Blockquote(main_loop [] [] block)::r), [Newline], tl)
| _ ->
None
(* maybe a reference *)
let maybe_reference (main_loop:main_loop) rc r _p l =
assert_well_formed l;
(* this function is called when we know it's not a link although
it started with a '[' *)
(* So it could be a reference or a link definition. *)
let rec maybe_ref l =
let text, remains = read_until_cbracket ~bq:true l in
(* check that there is no ill-placed open bracket *)
if (try ignore(read_until_obracket ~bq:true text); true
with Premature_ending -> false) then
raise Premature_ending; (* <-- ill-placed open bracket *)
let blank, remains = read_until_obracket ~bq:true remains in
(* check that there are no unwanted characters between CB and OB. *)
if eat (let flag = ref true in
function (* allow only a space, multiple spaces, or a newline *)
| Newline -> !flag && (flag := false; true)
| (Space|Spaces _) -> !flag && (flag := false; true)
| _ -> false) blank <> [] then
raise Premature_ending (* <-- not a regular reference *)
else
match read_until_cbracket ~bq:true remains with
| [], remains ->
let fallback = extract_fallback main_loop remains (Obracket::l) in
let id = L.string_of_tokens text in (* implicit anchor *)
Some(((Ref(rc, id, id, fallback))::r), [Cbracket], remains)
| id, remains ->
let fallback = extract_fallback main_loop remains (Obracket::l) in
Some(((Ref(rc, L.string_of_tokens id,
L.string_of_tokens text, fallback))::r),
[Cbracket], remains)
in
let rec maybe_nonregular_ref l =
let text, remains = read_until_cbracket ~bq:true l in
(* check that there is no ill-placed open bracket *)
if (try ignore(read_until_obracket ~bq:true text); true
with Premature_ending -> false) then
raise Premature_ending; (* <-- ill-placed open bracket *)
let fallback = extract_fallback main_loop remains (Obracket::l) in
let id = L.string_of_tokens text in (* implicit anchor *)
Some(((Ref(rc, id, id, fallback))::r), [Cbracket], remains)
in
let rec maybe_def l =
match read_until_cbracket ~bq:true l with
| _, [] -> raise Premature_ending
| id, (Colon::(Space|Spaces _)::remains)
| id, (Colon::remains) ->
begin
match
fsplit
~f:(function
| (Space|Spaces _|Newline|Newlines _):: _ as l -> Split([], l)
| e::tl -> Continue
| [] -> Split([],[]))
remains
with
| None | Some([], _) -> raise Premature_ending
| Some(url, remains) ->
let title, remains =
match
eat
(function | (Space|Spaces _|Newline|Newlines _) -> true
| _ -> false)
remains
with
| Doublequotes(0)::tl -> [], tl
| Doublequote::tl -> read_until_dq ~bq:true tl
| Quotes(0)::tl -> [], tl
| Quote::tl -> read_until_q ~bq:true tl
| Oparenthesis::tl-> read_until_cparenth ~bq:true tl
| l -> [], l
in
let url =
let url = L.string_of_tokens url in
if String.length url > 2 && url.[0] = '<'
&& url.[String.length url - 1] = '>' then
String.sub url 1 (String.length url - 2)
else
url
in
rc#add_ref (L.string_of_tokens id) (L.string_of_tokens title) url;
Some(r, [Newline], remains)
end
| _ -> raise Premature_ending
in
try
maybe_ref l
with | Premature_ending | NL_exception ->
try
maybe_def l
with
| Premature_ending | NL_exception ->
try
maybe_nonregular_ref l
with
| Premature_ending | NL_exception ->
None
(** maybe a link *)
let maybe_link (main_loop:main_loop) r _p l =
if debug then eprintf "(OMD) # maybe_link\n";
assert_well_formed l;
let read_url name l =
if debug then
eprintf "(OMD) # maybe_link>read_url %S\n" (L.string_of_tokens l);
try
let l_cp, r_cp =
read_until_cparenth ~no_nl:true ~bq:false l
in
if debug then eprintf "(OMD) maybe_link >> l_cp=%S r_cp=%S\n%!"
(L.string_of_tokens l_cp)
(L.string_of_tokens r_cp);
try
let l_dq, r_dq =
read_until_dq ~no_nl:true ~bq:false l
in
if debug then eprintf "(OMD) maybe_link >> l_dq=%S r_dq=%S\n%!"
(L.string_of_tokens l_dq)
(L.string_of_tokens r_dq);
(* maybe title *)
if List.length l_cp > List.length l_dq then (* title *)
begin
if debug then eprintf "(OMD) maybe_link >> title\n%!";
let url =
match List.rev l_dq with
| (Newline|Space|Spaces _)::(Newline|Space|Spaces _)::tl
| (Newline|Space|Spaces _)::tl ->
L.string_of_tokens (List.rev tl)
| _ ->
L.string_of_tokens l_dq
in
let title, rest = read_until_dq ~no_nl:false ~bq:false r_dq in
let rest = snd(read_until_cparenth rest) in
let title = L.string_of_tokens title in
Some(Url(url, name, title) :: r, [Cparenthesis], rest)
end
else (* no title *)
raise Premature_ending
with NL_exception | Premature_ending -> (* no title *)
begin
if debug then eprintf "(OMD) maybe_link >> no title\n%!";
let url = match List.rev l_cp with
| (Newline|Space|Spaces _)::(Newline|Space|Spaces _)::tl
| (Newline|Space|Spaces _)::tl -> List.rev tl
| _ -> l_cp
in
let title, rest = [], r_cp in
let url = L.string_of_tokens url in
let title = L.string_of_tokens title in
Some(Url(url, name, title) :: r, [Cparenthesis], rest)
end
with NL_exception | Premature_ending ->
None
in
let read_name l =
(* it's not really the "name" of a URL but what
corresponds to the inner HTML of an HTML 'A' tag *)
if debug then eprintf "(OMD) # maybe_link> read_name\n";
try
match read_until_cbracket ~bq:true l with
| name, (Oparenthesis::tl) ->
read_url (main_loop [] [Obracket] name) (eat_blank tl)
| name, (Oparenthesiss 0::tl) ->
read_url (main_loop [] [Obracket] name) (Oparenthesis::tl)
| name, (Oparenthesiss n::tl) ->
read_url (main_loop [] [Obracket] name) (Oparenthesiss(n-1)::tl)
| _ ->
None
with Premature_ending | NL_exception -> None
in
read_name l
let has_paragraphs l =
(* Has at least 2 consecutive newlines. *)
List.exists (function Newlines _ -> true | _ -> false) l
let parse_list (main_loop:main_loop) r _p l =
assert_well_formed l;
if debug then begin
eprintf "(OMD) parse_list r=(%s) p=(%s) l=(%s)\n%!"
"" (* (Omd_backend.sexpr_of_md (List.rev r)) *)
"" (* (destring_of_tl p) *)
(L.destring_of_tokens ~limit:40 l);
end;
let module UO = struct type ordered = O | U end in
let open UO in
if debug then
eprintf "(OMD) parse_list: l=(%s)\n%!" (L.destring_of_tokens l);
let end_of_item (indent:int) l : tok split_action = match l with
| [] ->
Split([],[])
| Newlines 0 :: ((Spaces n) :: Greaterthan :: (Space | Spaces _) :: tl
as s) ->
assert(n>=0);
if n+2 = indent+4 then (* blockquote *)
match unindent (n+2) (Newline::s) with
| Newline::block, rest ->
Continue_with(List.rev(Newlines(1)::block), rest)
| Newlines n::block, rest ->
Continue_with(List.rev(Newlines(n+2)::block), rest)
| block, rest ->
Continue_with(Newlines 0::block, rest)
else if n+2 >= indent+8 then (* code inside item *)
match unindent (indent+4) (Newline::s) with
| Newline::block, rest ->
Continue_with(List.rev(Newlines(1)::block), rest)
| Newlines n::block, rest ->
Continue_with(List.rev(Newlines(n+2)::block), rest)
| block, rest ->
Continue_with(Newlines 0::block, rest)
else
Split([], l)
| Newlines 0 :: (Spaces n :: tl as s) ->
assert(n>=0);
if n+2 >= indent+8 then (* code inside item *)
match unindent (indent+4) (Newline::s) with
| Newline::block, rest ->
Continue_with(List.rev(Newlines(0)::block), rest)
| Newlines n::block, rest ->
Continue_with(List.rev(Newlines(n+1)::block), rest)
| block, rest ->
Continue_with(Newline::block, rest)
else if n+2 >= indent+4 then (* new paragraph inside item *)
match unindent (indent+4) (Newline::s) with
| Newline::block, rest ->
Continue_with(List.rev(Newlines(1)::block), rest)
| Newlines n::block, rest ->
Continue_with(List.rev(Newlines(n+2)::block), rest)
| block, rest ->
Continue_with(Newlines 0::block, rest)
else
Split([], l)
| (Newlines _) :: _ -> (* n > 0 *)
(* End of item, stop *)
Split([], l)
| Newline ::
(
((Space|Spaces _) :: (Star|Minus|Plus) :: (Space|Spaces _):: _)
| ((Space|Spaces _) :: Number _ :: Dot :: (Space|Spaces _) :: _)
| ((Star|Minus|Plus) :: (Space|Spaces _):: _)
| (Number _ :: Dot :: (Space|Spaces _) :: _)
as tl) ->
Split([Newline], tl)
| Newline :: (Space | Spaces _) :: Newline :: tl ->
(* A line with spaces shouldn't interfere here,
which is about exactly 2 consecutive newlines,
so we rewrite the head of the lexing stream. *)
Continue_with([], Newlines 0 :: tl)
| Newline :: (Space | Spaces _) :: (Newlines _) :: _ ->
(* A line with spaces shouldn't interfere here,
which is about at least 3 consecutive newlines,
so we stop. *)
Split([], l)
| Newline :: (Spaces _ as s) :: tl ->
Continue_with
([s;
Tag("parse_list/remember spaces",
object
method parser_extension r p =
function Spaces _::tl -> Some(r,p,Space::tl)
| _ -> None
method to_string = ""
end);
Newline],
tl)
| Newline :: (Space as s) :: tl ->
Continue_with
([s;
Tag("parse_list/remember space",
object
method parser_extension r p =
function (Space|Spaces _)::tl -> Some(r,p,Space::tl)
| _ -> None
method to_string = ""
end);
Newline],
tl)
| _::_ ->
Continue
in
let rev_to_t l =
assert_well_formed l;
(* Newlines at the end of items have no meaning (except to end the
item which is expressed by the constructor already). *)
let l = match l with (Newline | Newlines _) :: tl -> tl | _ -> l in
main_loop [] [Newline] (List.rev l)
in
let add (sublist:element) items =
if debug then eprintf "(OMD) add\n%!";
match items with
| [] -> assert false
| (O,indents,item)::tl ->
(O,indents,(item@[sublist]))::tl
| (U,indents,item)::tl ->
(U,indents,(item@[sublist]))::tl
in
let make_up ~p items : Omd_representation.element =
if debug then eprintf "(OMD) make_up p=%b\n%!" p;
let items = List.rev items in
match items with
| (U,_,item)::_ ->
if p then
Ulp(List.map (fun (_,_,i) -> i) items)
else
Ul(List.map (fun (_,_,i) -> i) items)
| (O,_,item)::_ ->
if p then
Olp(List.map (fun (_,_,i) -> i) items)
else
Ol(List.map (fun (_,_,i) -> i) items)
| [] ->
failwith "make_up called with []" (* assert false *)
in
let rec list_items ~p indents items l =
if debug then eprintf "(OMD) list_items: p=%b l=(%s)\n%!"
p (L.destring_of_tokens l);
match l with
(* no more list items *)
| [] ->
make_up p items, l
(* more list items *)
(* new unordered items *)
| (Star|Minus|Plus)::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item 0) tl with
| None ->
make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
if debug then
eprintf "(OMD) (2346) new_item=%S\n%!"
(L.destring_of_tokens new_item);
match indents with
| [] ->
assert(items = []);
list_items ~p [0] ((U,[0], rev_to_t new_item)::items) rest
| 0::_ ->
list_items ~p indents ((U,indents,rev_to_t new_item)::items) rest
| _::_ ->
make_up p items, l
end
| Space::(Star|Minus|Plus)::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item 1) tl with
| None -> make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
match indents with
| [] ->
assert(items = []);
list_items ~p [1] ((U,[1],rev_to_t new_item)::items) rest
| 1::_ ->
list_items ~p indents ((U,indents,rev_to_t new_item)::items) rest
| i::_ ->
if i > 1 then
make_up p items, l
else (* i < 1 : new sub list*)
let sublist, remains =
list_items ~p (1::indents)
[(U,1::indents,rev_to_t new_item)] rest
in
list_items ~p indents (add sublist items) remains
end
| Spaces n::(Star|Minus|Plus)::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item (n+2)) tl with
| None ->
make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
match indents with
| [] ->
if debug then
eprintf "(OMD) spaces[] l=(%S)\n%!" (L.string_of_tokens l);
assert(items = []); (* ae... listes mal formes ?! *)
list_items ~p [n+2] ((U,[n+2],rev_to_t new_item)::items) rest
| i::_ ->
if debug then eprintf "(OMD) spaces(%d::_) n=%d l=(%S)\n%!"
i n (L.string_of_tokens l);
if i = n + 2 then
let items = (U,indents,rev_to_t new_item) :: items in
list_items ~p indents items rest
else if i < n + 2 then
let sublist, remains =
list_items ~p ((n+2)::indents)
[(U,(n+2)::indents,rev_to_t new_item)]
rest
in
list_items ~p indents (add sublist items) remains
else (* i > n + 2 *)
make_up p items, l
end
(* new ordered items *)
| Number _::Dot::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item 0) tl with
| None ->
make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
assert_well_formed new_item;
match indents with
| [] ->
assert(items = []);
list_items ~p [0] ((O,[0],rev_to_t new_item)::items) rest
| 0::_ ->
list_items ~p indents ((O,indents,rev_to_t new_item)::items) rest
| _::_ ->
make_up p items, l
end
| Space::Number _::Dot::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item 1) tl with
| None -> make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
match indents with
| [] ->
assert(items = []);
list_items ~p [1] ((O,[1],rev_to_t new_item)::items) rest
| 1::_ ->
list_items ~p indents ((O,indents,rev_to_t new_item)::items) rest
| i::_ ->
if i > 1 then
make_up p items, l
else (* i < 1 : new sub list*)
let sublist, remains =
list_items ~p (1::indents)
[(O,1::indents,rev_to_t new_item)] rest
in
list_items ~p:p indents (add sublist items) remains
end
| Spaces n::Number _::Dot::(Space|Spaces _)::tl ->
begin
match fsplit_rev ~f:(end_of_item (n+2)) tl with
| None ->
make_up p items, l
| Some(new_item, rest) ->
let p = p || has_paragraphs new_item in
match indents with
| [] ->
if debug then eprintf "(OMD) spaces[] l=(%S)\n%!"
(L.string_of_tokens l);
assert(items = []); (* ae... listes mal formes ?! *)
list_items ~p [n+2] ((O,[n+2],rev_to_t new_item)::items) rest
| i::_ ->
if debug then eprintf "(OMD) spaces(%d::_) n=%d l=(%S)\n%!"
i n (L.string_of_tokens l);
if i = n + 2 then
list_items ~p indents ((O,indents,rev_to_t new_item)::items)
rest
else if i < n + 2 then
let sublist, remains =
list_items ~p
((n+2)::indents)
[(O,(n+2)::indents,rev_to_t new_item)]
rest
in
list_items ~p:p indents (add sublist items) remains
else (* i > n + 2 *)
make_up p items, l
end
(* *)
| Newlines 0::((Star|Minus|Plus)::(Space|Spaces _)::_ as l)
| Newlines 0::(Number _::Dot::(Space|Spaces _)::_ as l)
| Newlines 0::((Space|Spaces _)::Star::(Space|Spaces _)::_ as l)
| Newlines 0::((Space|Spaces _)::Number _::Dot::(Space|Spaces _)::_ as l)
->
list_items ~p:true indents items l
| _ ->
if debug then
begin
let rec string_of_items items =
match items with
| [] -> ""
| (O,indent::_,item)::tl ->
sprintf "(O,i=%d,%S)" (indent) (Omd_backend.html_of_md item)
^ string_of_items tl
| (U,indent::_,item)::tl ->
sprintf "(U,i=%d,%S)" (indent) (Omd_backend.html_of_md item)
^ string_of_items tl
| _ -> "(weird)"
in
eprintf "(OMD) NALI parse_list: l=(%S) items=%s\n%!"
(L.string_of_tokens l) (string_of_items items)
end;
(* not a list item *)
make_up p items, l
in
match list_items ~p:false [] [] l with
| rp, l ->
rp::r, [Newline], l
let icode ?(default_lang=default_lang) r _p l =
assert_well_formed l;
(* indented code: returns (r,p,l) where r is the result, p is the
last thing read, l is the remains *)
let dummy_tag = Tag("dummy_tag",
object
method to_string = ""
method parser_extension = fun r p l -> None
end) in
let accu = Buffer.create 64 in
let rec loop s tl = match s, tl with
| (Newline|Newlines _ as p), (Space|Spaces(0|1))::_ ->
(* 1, 2 or 3 spaces. *)
(* -> Return what's been found as code because what follows isn't. *)
Code_block(default_lang, Buffer.contents accu) :: r, [p], tl
| (Newline|Newlines _ as p), Spaces(n)::tl ->
assert(n>0);
(* At least 4 spaces, it's still code. *)
Buffer.add_string accu (L.string_of_token p);
loop
(if n >= 4 then Spaces(n-4) else if n = 3 then Space else dummy_tag)
tl
| (Newline|Newlines _ as p), (not_spaces::_ as tl) -> (* stop *)
Code_block(default_lang, Buffer.contents accu) :: r, [p], tl
(* -> Return what's been found as code because it's no more code. *)
| p, e::tl ->
Buffer.add_string accu (L.string_of_token p);
(* html entities are to be converted later! *)
loop e tl
| p, [] ->
Buffer.add_string accu (L.string_of_token p);
Code_block(default_lang, Buffer.contents accu)::r, [p], []
in
match l with
| Spaces n::tl ->
if n >= 4 then
Some(loop (Spaces(n-4)) tl)
else if n = 3 then
Some(loop Space tl)
else Some(loop dummy_tag tl)
| _ -> assert false
(* Returns [(r,p,l)] where [r] is the result, [p] is the last thing
read, and [l] is what remains. *)
let spaces_at_beginning_of_line main_loop default_lang n r previous lexemes =
assert_well_formed lexemes;
assert (n > 0);
if n <= 3 then (
match lexemes with
| (Star|Minus|Plus) :: (Space|Spaces _) :: _ ->
(* unordered list *)
parse_list main_loop r [] (L.make_space n::lexemes)
| (Number _)::Dot::(Space|Spaces _)::tl ->
(* ordered list *)
parse_list main_loop r [] (L.make_space n::lexemes)
| []
| (Newline|Newlines _) :: _ -> (* blank line, skip spaces *)
r, previous, lexemes
| _::_ ->
Text (" ")::r, previous, lexemes
)
else ( (* n>=4, blank line or indented code *)
match lexemes with
| [] | (Newline|Newlines _) :: _ -> r, previous, lexemes
| _ ->
match
icode ~default_lang r [Newline] (L.make_space n :: lexemes)
with
| Some(r,p,l) -> r,p,l
| None ->
if debug then
eprintf "(OMD) Omd_parser.icode or \
Omd_parser.main_loop is broken\n%!";
assert false
)
let spaces_not_at_beginning_of_line ?(html=false) n r lexemes =
assert_well_formed lexemes;
assert (n > 0);
if n = 1 then
(Text " "::r), [Space], lexemes
else (
match lexemes with
| Newline :: tl when not html ->
if debug then
eprintf
"(OMD) 2 or more spaces before a newline, eat the newline\n%!";
Br::r, [Spaces(n-2)], tl
| Newlines k :: tl when not html ->
if debug then
eprintf
"(OMD) 2 or more spaces before a newline, eat 1 newline";
let newlines = if k = 0 then Newline else Newlines(k-1) in
Br::r, [Spaces(n-2)], newlines :: tl
| _ ->
assert (n>1);
(Text (String.make n ' ')::r), [Spaces(n-2)], lexemes
)
let maybe_autoemail r p l =
assert_well_formed l;
match l with
| Lessthan::tl ->
begin
match
fsplit ~excl:(function (Newline|Newlines _|Space|Spaces _) :: _-> true
| [] -> true
| _ -> false)
~f:(function At::tl -> Split([],tl) | _ -> Continue)
tl
with
| None -> None
| Some(left, right) ->
match
fsplit
~excl:(function
| (Newline|Newlines _|Space|Spaces _) :: _-> true
| [] -> true
| _ -> false)
~f:(function Greaterthan::tl -> Split([],tl)
| Greaterthans 0::tl -> Split([],Greaterthan::tl)
| Greaterthans n::tl -> Split([],Greaterthans(n-1)::tl)
| _ -> Continue)
right
with
| None -> None
| Some(domain, tl) ->
let email = L.string_of_tokens left
^ "@" ^ L.string_of_tokens domain in
Some(Url("mailto:"^email,[Text email],"")::r,[Greaterthan],tl)
end
| _ -> failwith "Omd_parser.maybe_autoemail: wrong use of the function."
let is_hex s =
String.length s > 1
&& (s.[0] = 'X' || s.[0] = 'x')
&& (let rec loop i =
i = String.length s
||
(match s.[i] with
| '0' .. '9' | 'A' .. 'F' | 'a' .. 'f' ->
loop (succ i)
| _ -> false)
in loop 1)
let mediatypetextomd : string list ref = ref []
let filter_text_omd_rev l =
let rec loop b r = function
| [] -> if b then r else l
| ("media:type", Some "text/omd")::tl ->
loop true r tl
| e::tl ->
loop b (e::r) tl
in
loop false [] l
exception Orphan_closing of string * l * l
let rec main_impl_rev ~html (r:r) (previous:p) (lexemes:l) =
(* if debug then eprintf "(OMD) main_impl_rev html=%b\n%!" html; *)
assert_well_formed lexemes;
if debug then
eprintf "(OMD) main_impl_rev html=%b r=%s p=(%s) l=(%s)\n%!"
html
(Omd_backend.sexpr_of_md (List.rev r))
(L.destring_of_tokens previous)
(L.destring_of_tokens lexemes);
match previous, lexemes with
(* no more to process *)
| _, [] ->
(* return the result (/!\ it has to be reversed as some point) *)
r
(* Tag: tag system $\cup$ high-priority extension mechanism *)
| _, Tag(_name, e) :: tl ->
begin match e#parser_extension r previous tl with
| Some(r, p, l) ->
main_impl_rev ~html r p l
| None ->
main_impl_rev ~html r previous tl
end
(* HTML comments *)
| _, (Lessthan as t)::(Exclamation::Minuss 0::c as tl) ->
begin
let f = function
| (Minuss _ as m)::(Greaterthan|Greaterthans _ as g)::tl ->
Split([g;m], tl)
| _ ->
Continue
in
match fsplit ~f:f lexemes with
| None ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
| Some (comments, new_tl) ->
let r = Html_comment(L.string_of_tokens comments) :: r in
main_impl_rev ~html r [Newline] new_tl
end
(* email-style quoting / blockquote *)
| ([]|[Newline|Newlines _]), Greaterthan::(Space|Spaces _)::_ ->
begin
match
emailstyle_quoting main_loop r previous (Newline::lexemes)
with
| Some(r,p,l) -> main_impl_rev ~html r p l
| None ->
if debug then
eprintf "(OMD) Omd_parser.emailstyle_quoting or \
Omd_parser.main_loop is broken\n%!";
assert false
end
(* email-style quoting, with lines starting with spaces! *)
| ([]|[Newline|Newlines _]), (Space|Spaces(0|1) as s)
:: Greaterthan :: (Space|Spaces _)::_ ->
(* It's 1, 2 or 3 spaces, not more because it wouldn't mean
quoting anymore but code. *)
begin
let new_r, p, rest =
let foo, rest =
match unindent (L.length s) (Newline::lexemes) with
| (Newline|Newlines _)::foo, rest -> foo, rest
| res -> res
in
match
emailstyle_quoting main_loop [] previous (Newline::foo)
with
| Some(new_r, p, []) -> new_r, p, rest
| _ ->
if debug then
eprintf "(OMD) Omd_parser.emailstyle_quoting or \
Omd_parser.main_loop is broken\n%!";
assert false
in
main_impl_rev ~html (new_r@r) [Newline] rest
end
(* minus *)
| ([]|[Newline|Newlines _]),
(Minus|Minuss _ as t) :: ((Space|Spaces _)::_ as tl) ->
(* maybe hr *)
begin match hr_m lexemes with
| None -> (* no hr, so it could be a list *)
begin match t with
| Minus -> (* it's a list *)
let md, new_p, new_l =
parse_list main_loop r [] lexemes
in
main_impl_rev ~html md new_p new_l
| _ -> (* not a list *)
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
end
| Some l -> (* hr *)
main_impl_rev ~html (Hr::r) [Newline] l
end
| ([]|[Newline|Newlines _]), (Minus|Minuss _ as t)::tl ->
begin match hr_m lexemes with
| None -> (* no hr, and it's not a list either
because it's not followed by spaces *)
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
| Some l -> (* hr *)
main_impl_rev ~html (Hr::r) [Newline] l
end
(* hashes *)
| ([]|[(Newline|Newlines _)]),
(Hashs n as t) :: ((Space|Spaces _) :: ttl as tl)
| ([]|[(Newline|Newlines _)]),
(Hashs n as t) :: (ttl as tl) -> (* hash titles *)
if n <= 4 then
match read_title main_loop (n+2) r previous ttl with
| Some(r, p, l) -> main_impl_rev ~html r p l
| None ->
if debug then
eprintf "(OMD) Omd_parser.read_title or \
Omd_parser.main_loop is broken\n%!";
assert false
else
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
| ([]|[(Newline|Newlines _)]), Hash :: (Space|Spaces _) :: tl
| ([]|[(Newline|Newlines _)]), Hash :: tl -> (* hash titles *)
begin match read_title main_loop 1 r previous tl with
| Some(r, p, l) -> main_impl_rev ~html r p l
| None ->
if debug then
eprintf "(OMD) Omd_parser.read_title or \
Omd_parser.main_loop is broken\n%!";
assert false
end
| _, (Hash|Hashs _ as t) :: tl -> (* hash -- no title *)
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
(* spaces after a newline: could lead to hr *)
| ([]|[Newline|Newlines _]), ((Space|Spaces _) as sp) :: tl ->
begin match hr tl with
| None ->
(* No [Hr], but maybe [Ul], [Ol], code,... *)
let n = L.length sp in
let r, p, l =
spaces_at_beginning_of_line main_loop default_lang n r previous tl in
main_impl_rev ~html r p l
| Some tl ->
main_impl_rev ~html (Hr::r) [Newline] tl
end
(* spaces anywhere *)
| _, ((Space|Spaces _) as t) :: tl ->
(* too many cases to be handled here *)
let n = L.length t in
let r, p, l = spaces_not_at_beginning_of_line ~html n r tl in
main_impl_rev ~html r p l
(* underscores *)
| _, (Underscore as t) :: tl -> (* one "orphan" underscore, or emph *)
(match uemph_or_bold 1 tl with
| None ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
| Some(x, new_tl) ->
main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl
)
| _, (Underscores((0|1) as n) as t) :: tl ->
(* 2 or 3 "orphan" underscores, or emph/bold *)
(match uemph_or_bold (n+2) tl with
| None ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
| Some(x, new_tl) ->
if n = 0 then (* 1 underscore *)
main_impl_rev ~html (Bold(main_impl ~html [] [t] x) :: r) [t] new_tl
else (* 2 underscores *)
main_impl_rev ~html (Emph([Bold(main_impl ~html [] [t] x)]) :: r) [t] new_tl
)
(* enumerated lists *)
| ([]|[Newline|Newlines _]), (Number _) :: Dot :: (Space|Spaces _) :: tl ->
let md, new_p, new_l =
parse_list main_loop r [] lexemes
in
main_impl_rev ~html md new_p new_l
(* plus *)
| ([]|[(Newline|Newlines _)]), Plus :: (Space|Spaces _) :: _ ->
let md, new_p, new_l =
parse_list main_loop r [] lexemes
in
main_impl_rev ~html md new_p new_l
(* stars *)
| ([]|[(Newline|Newlines _)]), Star :: (Space|Spaces _) :: _ ->
(* maybe hr or new list *)
begin match hr_s lexemes with
| Some l ->
main_impl_rev ~html (Hr::r) [Newline] l
| None ->
let md, new_p, new_l =
parse_list main_loop r [] lexemes
in
main_impl_rev ~html md new_p new_l
end
| ([]|[(Newline|Newlines _)]), Stars _ :: _ when hr_s lexemes <> None ->
(* hr *)
(match hr_s lexemes with
| Some l -> main_impl_rev ~html (Hr::r) [Newline] l
| None -> assert false
)
| ([]|[(Newline|Newlines _)]), (Star as t) :: tl -> (* maybe hr *)
begin match hr_s lexemes with
| Some l ->
main_impl_rev ~html (Hr::r) [Newline] l
| None ->
(match semph_or_bold 1 tl with
| Some(x, new_tl) ->
main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl
| None ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
)
end
| _, (Star as t) :: tl -> (* one "orphan" star, or emph // can't be hr *)
(match semph_or_bold 1 tl with
| Some(x, new_tl) ->
main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl
| None ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
| _, (Stars((0|1) as n) as t) :: tl ->
(* 2 or 3 "orphan" stars, or emph/bold *)
(match semph_or_bold (n+2) tl with
| Some(x, new_tl) ->
if n = 0 then
main_impl_rev ~html (Bold(main_impl ~html [] [t] x) :: r) [t] new_tl
else
main_impl_rev ~html (Emph([Bold(main_impl ~html [] [t] x)]) :: r) [t] new_tl
| None ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
(* backslashes *)
| _, Backslash :: (Newline as t) :: tl -> (* \\n *)
main_impl_rev ~html (Br :: r) [t] tl
| _, Backslash :: Newlines 0 :: tl -> (* \\n\n\n\n... *)
main_impl_rev ~html (Br :: r) [Backslash; Newline] (Newline :: tl)
| _, Backslash :: Newlines n :: tl -> assert (n >= 0); (* \\n\n\n\n... *)
main_impl_rev ~html (Br :: r) [Backslash; Newline]
(Newlines (n-1) :: tl)
| _, Backslash :: (Backquote as t) :: tl -> (* \` *)
main_impl_rev ~html (Text ("`") :: r) [t] tl
| _, Backslash :: Backquotes 0 :: tl -> (* \````... *)
main_impl_rev ~html (Text ("`") :: r) [Backslash; Backquote] (Backquote :: tl)
| _, Backslash :: Backquotes n :: tl -> assert (n >= 0); (* \````... *)
main_impl_rev ~html (Text ("`") :: r) [Backslash; Backquote]
(Backquotes (n-1) :: tl)
| _, Backslash :: (Star as t) :: tl -> (* \* *)
main_impl_rev ~html (Text ("*") :: r) [t] tl
| _, Backslash :: Stars 0 :: tl -> (* \****... *)
main_impl_rev ~html (Text ("*") :: r) [Backslash; Star] (Star :: tl)
| _, Backslash :: Stars n :: tl -> assert (n >= 0); (* \****... *)
main_impl_rev ~html (Text ("*") :: r) [Backslash; Star] (Stars (n-1) :: tl)
| _, Backslash :: (Underscore as t) :: tl -> (* \_ *)
main_impl_rev ~html (Text ("_") :: r) [t] tl
| _, Backslash :: Underscores 0 :: tl -> (* \___... *)
main_impl_rev ~html (Text ("_") :: r) [Backslash; Underscore] (Underscore :: tl)
| _, Backslash :: Underscores n :: tl -> assert (n >= 0); (* \___... *)
main_impl_rev ~html (Text ("_") :: r) [Backslash; Underscore]
(Underscores (n-1) :: tl)
| _, Backslash :: (Obrace as t) :: tl -> (* \{ *)
main_impl_rev ~html (Text ("{") :: r) [t] tl
| _, Backslash :: Obraces 0 :: tl -> (* \{{{... *)
main_impl_rev ~html (Text ("{") :: r) [Backslash; Obrace] (Obrace :: tl)
| _, Backslash :: Obraces n :: tl -> assert (n >= 0); (* \{{{... *)
main_impl_rev ~html (Text ("{") :: r) [Backslash; Obrace] (Obraces (n-1) :: tl)
| _, Backslash :: (Cbrace as t) :: tl -> (* \} *)
main_impl_rev ~html (Text ("}") :: r) [t] tl
| _, Backslash :: Cbraces 0 :: tl -> (* \}}}... *)
main_impl_rev ~html (Text ("}") :: r) [Backslash; Cbrace] (Cbrace :: tl)
| _, Backslash :: Cbraces n :: tl -> assert (n >= 0); (* \}}}... *)
main_impl_rev ~html (Text ("}") :: r) [Backslash; Cbrace] (Cbraces (n-1) :: tl)
| _, Backslash :: (Obracket as t) :: tl -> (* \[ *)
main_impl_rev ~html (Text ("[") :: r) [t] tl
| _, Backslash :: Obrackets 0 :: tl -> (* \[[[... *)
main_impl_rev ~html (Text ("[") :: r) [Backslash; Obracket] (Obracket :: tl)
| _, Backslash :: Obrackets n :: tl -> assert (n >= 0); (* \[[[... *)
main_impl_rev ~html (Text ("[") :: r) [Backslash; Obracket] (Obrackets (n-1) :: tl)
| _, Backslash :: (Cbracket as t) :: tl -> (* \} *)
main_impl_rev ~html (Text ("]") :: r) [t] tl
| _, Backslash :: Cbrackets 0 :: tl -> (* \}}}... *)
main_impl_rev ~html (Text ("]") :: r) [Backslash; Cbracket] (Cbracket :: tl)
| _, Backslash :: Cbrackets n :: tl -> assert (n >= 0); (* \}}}... *)
main_impl_rev ~html (Text ("]") :: r) [Backslash; Cbracket] (Cbrackets (n-1) :: tl)
| _, Backslash :: (Oparenthesis as t) :: tl -> (* \( *)
main_impl_rev ~html (Text ("(") :: r) [t] tl
| _, Backslash :: Oparenthesiss 0 :: tl -> (* \(((... *)
main_impl_rev ~html (Text ("(") :: r) [Backslash; Oparenthesis] (Oparenthesis :: tl)
| _, Backslash :: Oparenthesiss n :: tl -> assert (n >= 0); (* \(((... *)
main_impl_rev ~html (Text ("(") :: r) [Backslash; Oparenthesis]
(Oparenthesiss (n-1) :: tl)
| _, Backslash :: (Cparenthesis as t) :: tl -> (* \) *)
main_impl_rev ~html (Text (")") :: r) [t] tl
| _, Backslash :: Cparenthesiss 0 :: tl -> (* \)))... *)
main_impl_rev ~html (Text (")") :: r) [Backslash; Cparenthesis]
(Cparenthesis :: tl)
| _, Backslash :: Cparenthesiss n :: tl -> assert (n >= 0); (* \)))... *)
main_impl_rev ~html (Text (")") :: r) [Backslash; Cparenthesis]
(Cparenthesiss (n-1) :: tl)
| _, Backslash :: (Plus as t) :: tl -> (* \+ *)
main_impl_rev ~html (Text ("+") :: r) [t] tl
| _, Backslash :: Pluss 0 :: tl -> (* \+++... *)
main_impl_rev ~html (Text ("+") :: r) [Backslash; Plus] (Plus :: tl)
| _, Backslash :: Pluss n :: tl -> assert (n >= 0); (* \+++... *)
main_impl_rev ~html (Text ("+") :: r) [Backslash; Plus] (Pluss (n-1) :: tl)
| _, Backslash :: (Minus as t) :: tl -> (* \- *)
main_impl_rev ~html (Text ("-") :: r) [t] tl
| _, Backslash :: Minuss 0 :: tl -> (* \---... *)
main_impl_rev ~html (Text ("-") :: r) [Backslash; Minus] (Minus :: tl)
| _, Backslash :: Minuss n :: tl -> assert (n >= 0); (* \---... *)
main_impl_rev ~html (Text ("-") :: r) [Backslash; Minus] (Minuss (n-1) :: tl)
| _, Backslash :: (Dot as t) :: tl -> (* \. *)
main_impl_rev ~html (Text (".") :: r) [t] tl
| _, Backslash :: Dots 0 :: tl -> (* \....... *)
main_impl_rev ~html (Text (".") :: r) [Backslash; Dot] (Dot :: tl)
| _, Backslash :: Dots n :: tl -> assert (n >= 0); (* \....... *)
main_impl_rev ~html (Text (".") :: r) [Backslash; Dot] (Dots (n-1) :: tl)
| _, Backslash :: (Exclamation as t) :: tl -> (* \! *)
main_impl_rev ~html (Text ("!") :: r) [t] tl
| _, Backslash :: Exclamations 0 :: tl -> (* \!!!... *)
main_impl_rev ~html (Text ("!") :: r) [Backslash; Exclamation] (Exclamation :: tl)
| _, Backslash :: Exclamations n :: tl -> assert (n >= 0); (* \!!!... *)
main_impl_rev ~html (Text ("!") :: r) [Backslash; Exclamation]
(Exclamations (n-1) :: tl)
| _, Backslash :: (Hash as t) :: tl -> (* \# *)
main_impl_rev ~html (Text ("#") :: r) [t] tl
| _, Backslash :: Hashs 0 :: tl -> (* \###... *)
main_impl_rev ~html (Text ("#") :: r) [Backslash; Hash] (Hash :: tl)
| _, Backslash :: Hashs n :: tl -> assert (n >= 0); (* \###... *)
main_impl_rev ~html (Text ("#") :: r) [Backslash; Hash] (Hashs (n-1) :: tl)
| _, Backslash :: (Greaterthan as t) :: tl -> (* \> *)
main_impl_rev ~html (Text (">") :: r) [t] tl
| _, Backslash :: Greaterthans 0 :: tl -> (* \>>>... *)
main_impl_rev ~html (Text (">") :: r) [Backslash; Greaterthan] (Greaterthan :: tl)
| _, Backslash :: Greaterthans n :: tl -> assert (n >= 0); (* \>>>... *)
main_impl_rev ~html (Text (">") :: r) [Backslash; Greaterthan]
(Greaterthans (n-1) :: tl)
| _, Backslash :: (Lessthan as t) :: tl -> (* \< *)
main_impl_rev ~html (Text ("<") :: r) [t] tl
| _, Backslash :: Lessthans 0 :: tl -> (* \<<<... *)
main_impl_rev ~html (Text ("<") :: r) [Backslash; Lessthan] (Lessthan :: tl)
| _, Backslash :: Lessthans n :: tl -> assert (n >= 0); (* \<<<... *)
main_impl_rev ~html (Text ("<") :: r) [Backslash; Lessthan]
(Lessthans (n-1) :: tl)
| _, (Backslashs 0 as t) :: tl -> (* \\\\... *)
main_impl_rev ~html (Text ("\\") :: r) [t] tl
| _, (Backslashs n as t) :: tl -> (* \\\\... *)
if n mod 2 = 0 then
main_impl_rev ~html (Text(String.make ((n+2)/2) '\\') :: r) [t] tl
else
main_impl_rev ~html (Text(String.make ((n+2)/2) '\\') :: r) [t] (Backslash :: tl)
| _, Backslash::[] ->
main_impl_rev ~html (Text "\\" :: r) [] []
| _, Backslash::tl ->
main_impl_rev ~html (Text "\\" :: r) [Backslash] tl
(* < *)
| _, (Lessthan|Lessthans _ as t)
:: (Word("http"|"https"|"ftp"|"ftps"|"ssh"|"afp"|"imap") as w)
:: Colon::Slashs(n)::tl ->
(* "semi-automatic" URLs *)
let rec read_url accu = function
| (Newline|Newlines _)::tl ->
None
| Greaterthan::tl ->
let url =
(L.string_of_token w) ^ "://"
^ (if n = 0 then "" else String.make (n-1) '/')
^ L.string_of_tokens (List.rev accu)
in Some(url, tl)
| x::tl ->
read_url (x::accu) tl
| [] ->
None
in
begin match read_url [] tl with
| Some(url, new_tl) ->
let r =
match t with
| Lessthans 0 -> Text "<" :: r
| Lessthans n -> Text(String.make (n+1) '<') :: r
| _ -> r
in
main_impl_rev ~html (Url(url,[Text url],"")::r) [] new_tl
| None ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
end
(* Word(w) *)
| _, Word w::tl ->
main_impl_rev ~html (Text w :: r) [Word w] tl
(* newline at the end *)
| _, [Newline] ->
NL::r
(* named html entity *)
| _, Ampersand::((Word w::((Semicolon|Semicolons _) as s)::tl) as tl2) ->
if StringSet.mem w htmlcodes_set then
begin match s with
| Semicolon ->
main_impl_rev ~html (Raw("&"^w^";")::r) [s] tl
| Semicolons 0 ->
main_impl_rev ~html (Raw("&"^w^";")::r) [s] (Semicolon::tl)
| Semicolons n ->
main_impl_rev ~html (Raw("&"^w^";")::r) [s] (Semicolons(n-1)::tl)
| _ -> assert false
end
else
main_impl_rev ~html (Raw("&")::r) [] tl2
(* digit-coded html entity *)
| _, Ampersand::((Hash::Number w::((Semicolon|Semicolons _) as s)::tl)
as tl2) ->
if String.length w <= 4 then
begin match s with
| Semicolon ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] tl
| Semicolons 0 ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] (Semicolon::tl)
| Semicolons n ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] (Semicolons(n-1)::tl)
| _ -> assert false
end
else
main_impl_rev ~html (Raw("&")::r) [] tl2
(* maybe hex digit-coded html entity *)
| _, Ampersand::((Hash::Word w::((Semicolon|Semicolons _) as s)::tl)
as tl2) when is_hex w ->
if String.length w <= 4 then
begin match s with
| Semicolon ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] tl
| Semicolons 0 ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] (Semicolon::tl)
| Semicolons n ->
main_impl_rev ~html (Raw(""^w^";")::r) [s] (Semicolons(n-1)::tl)
| _ -> assert false
end
else
main_impl_rev ~html (Raw("&")::r) [] tl2
(* Ampersand *)
| _, Ampersand::tl ->
main_impl_rev ~html (Raw("&")::r) [Ampersand] tl
(* 2 Ampersands *)
| _, Ampersands(0)::tl ->
main_impl_rev ~html (Raw("&")::r) [] (Ampersand::tl)
(* Several Ampersands (more than 2) *)
| _, Ampersands(n)::tl ->
main_impl_rev ~html (Raw("&")::r) [] (Ampersands(n-1)::tl)
(* backquotes *)
| _, (Backquote|Backquotes _ as t)::tl ->
begin match bcode ~default_lang r previous lexemes with
| Some(r, p, l) -> main_impl_rev ~html r p l
| None ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
end
(* HTML *)
(*
and
with or without space(s) *)
| _, (Lessthan::Word("br"|"hr" as w)::Slash
::(Greaterthan|Greaterthans _ as g)::tl)
| _, (Lessthan::Word("br"|"hr" as w)::(Space|Spaces _)::Slash
::(Greaterthan|Greaterthans _ as g)::tl) ->
begin match g with
| Greaterthans 0 ->
main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan] (Greaterthan::tl)
| Greaterthans n ->
main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan]
(Greaterthans(n-1)::tl)
| _ ->
main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan] tl
end
(* awaited orphan html closing tag *)
| _, Lessthan::Slash::Word(w)::(Greaterthan|Greaterthans _ as g)::tl
when !mediatypetextomd <> [] ->
raise (Orphan_closing(w,
lexemes,
(match g with
| Greaterthans 0 -> Greaterthan::tl
| Greaterthans n -> Greaterthans(n-1)::tl
| _ -> tl)))
(* block html *)
| ([] | [Newline|Newlines _|Tag("HTMLBLOCK", _)]),
(Lessthan as t)
::((Word(tagnametop) as w)
::((Space|Spaces _|Greaterthan|Greaterthans _)
::_ as html_stuff) as tlx) ->
if StringSet.mem tagnametop inline_htmltags_set then
main_impl_rev ~html r [Word ""] lexemes
else if not (blind_html || StringSet.mem tagnametop htmltags_set) then
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tlx
| Some(r, p, l) -> main_impl_rev ~html r p l
end
else
let read_html() =
let module T = struct
type t =
| Awaiting of string
| Open of string
type interm =
| HTML of string * (string * string option) list * interm list
| FTOKENS of L.t
| RTOKENS of L.t
| MD of Omd_representation.t
let rec md_of_interm_list html l =
let md_of_interm_list ?(html=html) l =
md_of_interm_list html l
in
match l with
| [] -> []
| HTML(t, a, c)::tl ->
(
let f_a = filter_text_omd_rev a in
if f_a != a then
Html_block
(t,
f_a,
make_paragraphs
(md_of_interm_list ~html:false (List.rev c)))
:: md_of_interm_list tl
else
Html_block
(t, f_a, md_of_interm_list ~html:true (List.rev c))
:: md_of_interm_list tl
)
| MD md::tl ->
md@md_of_interm_list tl
| RTOKENS t1::FTOKENS t2::tl ->
md_of_interm_list (FTOKENS(List.rev_append t1 t2)::tl)
| RTOKENS t1::RTOKENS t2::tl ->
md_of_interm_list
(FTOKENS(List.rev_append t1 (List.rev t2))::tl)
| FTOKENS t1::FTOKENS t2::tl ->
md_of_interm_list (FTOKENS(t1@t2)::tl)
| FTOKENS t :: tl ->
if html then
Raw(L.string_of_tokens t) :: md_of_interm_list tl
else
main_loop ~html [] [Word ""] t
@ md_of_interm_list tl
| RTOKENS t :: tl ->
md_of_interm_list (FTOKENS(List.rev t) :: tl)
let md_of_interm_list l = md_of_interm_list true l
let string_of_tagstatus tagstatus =
let b = Buffer.create 64 in
List.iter (function
| Open t -> bprintf b "{B/Open %s}" t
| Awaiting t -> bprintf b "{B/Awaiting %s}" t
) tagstatus;
Buffer.contents b
end in
let add_token_to_body x body =
match body with
| T.RTOKENS r :: body -> T.RTOKENS(x::r)::body
| _ -> T.RTOKENS[x] :: body
in
let rec loop (body:T.interm list) attrs tagstatus tokens =
if debug then
eprintf "(OMD) 3333 BHTML loop body=%S tagstatus=%S %S\n%!"
(Omd_backend.sexpr_of_md(T.md_of_interm_list body))
(T.string_of_tagstatus tagstatus)
(L.destring_of_tokens tokens);
match tokens with
| [] ->
begin
match tagstatus with
| [] -> Some(body, tokens)
| T.Open t :: _ when StringSet.mem t html_void_elements ->
Some(body, tokens)
| _ ->
if debug then
eprintf "(OMD) 3401 BHTML Not enough to read\n%!";
None
end
| Lessthans n::tokens ->
begin match tagstatus with
| T.Awaiting _ :: _ -> None
| _ ->
if debug then eprintf "(OMD) 3408 BHTML loop\n%!";
loop
(add_token_to_body
(if n = 0 then Lessthan else Lessthans(n-1))
body)
attrs tagstatus (Lessthan::tokens)
end
(* self-closing tags *)
| Slash::Greaterthan::tokens ->
begin match tagstatus with
| T.Awaiting(tagname) :: tagstatus
when StringSet.mem tagname html_void_elements ->
loop [T.HTML(tagname, attrs, [])] [] tagstatus tokens
| _ ->
if debug then eprintf "(OMD) 3419 BHTML loop\n%!";
loop
(add_token_to_body
Slash
(add_token_to_body
Greaterthan
body))
attrs tagstatus tokens
end
(* closing the tag opener *)
| Lessthan::Slash::(Word(tagname) as w)
::(Greaterthan|Greaterthans _ as g)::tokens ->
begin match tagstatus with
| T.Open t :: _ when t = tagname ->
if debug then
eprintf "(OMD) 3375 BHTML properly closing %S\n%!" t;
Some(body,
(match g with
| Greaterthans 0 -> Greaterthan :: tokens
| Greaterthans n -> Greaterthans(n-1) :: tokens
| _ -> tokens))
| T.Open t :: _ ->
if debug then
eprintf "(OMD) 3379 BHTML wrongly closing %S with %S 1\n%!"
t tagname;
loop (T.RTOKENS[g;w;Slash;Lessthan]::body)
[] tagstatus tokens
| T.Awaiting t :: _ ->
if debug then
eprintf "(OMD) 3383 BHTML wrongly closing %S with %S 2\n%!"
t tagname;
if !mediatypetextomd <> [] then
raise
(Orphan_closing(t,
lexemes,
(match g with
| Greaterthans 0 ->
Greaterthan::tokens
| Greaterthans n ->
Greaterthans(n-1)::tokens
| _ -> tokens)))
else
None
| [] ->
if debug then
eprintf "(OMD) BHTML wrongly closing %S 3\n%!" tagname;
None
end
(* tag *)
| Lessthan::(Word(tagname) as word)::tokens
when
blind_html
|| StringSet.mem tagname htmltags_set
->
if debug then
eprintf "(OMD) 3489 BHTML tagname && StringSet.mem t html_void_elements ->
None
| T.Awaiting _ :: _ -> None
| _ ->
if attrs <> [] then
begin
if debug then
eprintf "(OMD) 3496 BHTML tag %S but attrs <> []\n%!"
tagname;
None
end
else
begin
if debug then
eprintf "(OMD) 3421 BHTML tag %S, tagstatus=%S, \
attrs=[], tokens=%S\n%!"
tagname (T.string_of_tagstatus tagstatus)
(L.destring_of_tokens tokens);
match
loop [] [] (T.Awaiting tagname::tagstatus) tokens
with
| None ->
if debug then eprintf "(OMD) 3489 BHTML loop\n%!";
loop
(add_token_to_body
word
(add_token_to_body
Lessthan
body))
attrs tagstatus tokens
| Some(b, tokens) ->
if debug then begin
eprintf "(OMD) 3433 BHTML tagstatus=%S tokens=%S\n%!"
(T.string_of_tagstatus tagstatus)
(L.string_of_tokens tokens)
end;
Some(b@body, tokens)
end
end
(* end of opening tag *)
| Greaterthan::tokens ->
begin match tagstatus with
| T.Awaiting t :: tagstatus ->
if List.mem ("media:type", Some "text/omd") attrs then
(
mediatypetextomd := t :: !mediatypetextomd;
try
ignore(main_impl_rev ~html [] [] tokens);
if debug then
eprintf "(OMD) 3524 BHTML closing tag not found \
in %S\n%!" (L.destring_of_tokens tokens);
warn
(sprintf
"Closing tag `%s' not found for text/omd zone."
t);
mediatypetextomd := List.tl !mediatypetextomd;
None
with Orphan_closing(tagname, delimiter, after) ->
let before =
let rec f r = function
| Lessthans n as e :: tl ->
begin match delimiter with
| Lessthan::_ ->
if Lessthan::tl = delimiter then
List.rev
(if n = 0 then
Lessthan::r
else
Lessthans(n-1)::r)
else
f (e::r) tl
| _ ->
if tl == delimiter || tl = delimiter then
List.rev r
else
f (e::r) tl
end
| e::tl as l ->
if l == delimiter || l = delimiter then
List.rev r
else if tl == delimiter || tl = delimiter then
List.rev (e::r)
else
f (e::r) tl
| [] -> List.rev r
in
f [] tokens
in
if debug then
eprintf "(OMD) 3552 BHTML tokens=%s delimiter=%s \
after=%s before=%s (tagname=t)=%b\n%!"
(L.destring_of_tokens tokens)
(L.destring_of_tokens delimiter)
(L.destring_of_tokens after)
(L.destring_of_tokens before)
(tagname = t);
(match !mediatypetextomd with
| _ :: tl -> mediatypetextomd := tl
| [] -> assert false);
if tagname = t then
loop
[T.HTML
(t,
attrs,
[T.MD
(main_impl ~html [] []
(tag_setext main_loop before))])]
[]
tagstatus
after
else
None
)
else
begin
if debug then eprintf "(OMD) 3571 BHTML loop\n%!";
match loop body [] (T.Open t::tagstatus) tokens with
| None ->
if debug then
eprintf "(OMD) 3519 BHTML \
Couldn't find an closing tag for %S\n%!"
t;
None
| Some(body, l) ->
if debug then
eprintf "(OMD) 3498 BHTML Found a closing tag %s\n%!" t;
match tagstatus with
| _ :: _ ->
loop [T.HTML(t, attrs, body)] [] tagstatus l
| [] ->
Some([T.HTML(t, attrs, body)], l)
end
| T.Open t :: _ ->
if debug then
eprintf
"(OMD) 3591 BHTML Some `>` isn't for an opening tag\n%!";
loop (add_token_to_body Greaterthan body)
attrs tagstatus tokens
| [] ->
if debug then
eprintf "(OMD) 3542 BHTML tagstatus=[]\n%!";
None
end
(* maybe attribute *)
| (Colon|Colons _|Underscore|Underscores _|Word _ as t)::tokens
| (Space|Spaces _)
::(Colon|Colons _|Underscore|Underscores _|Word _ as t)
::tokens
when (match tagstatus with
| T.Awaiting _ :: _ -> true
| _ -> false) ->
begin
let module Attribute_value = struct
type t = Empty of name | Named of name | Void
and name = string
end in
let open Attribute_value in
let rec extract_attribute accu = function
| (Space | Spaces _ | Newline) :: tokens->
Empty(L.string_of_tokens(List.rev accu)), tokens
| (Greaterthan|Greaterthans _) :: _ as tokens->
Empty(L.string_of_tokens(List.rev accu)), tokens
| Equal :: tokens ->
Named(L.string_of_tokens(List.rev accu)), tokens
| Colon | Colons _ | Underscore | Underscores _ | Word _
| Number _ | Minus | Minuss _ | Dot | Dots _ as t :: tokens ->
extract_attribute (t::accu) tokens
| tokens -> Void, tokens
in
match extract_attribute [t] tokens with
| Empty attributename, tokens ->
(* attribute with no explicit value *)
if debug then eprintf "(OMD) 3628 BHTML loop\n%!";
loop body ((attributename, None)::attrs) tagstatus tokens
| Named attributename, tokens ->
begin match tokens with
| Quotes 0 :: tokens ->
if debug then
eprintf "(OMD) 3661 BHTML empty attribute 1 %S\n%!"
(L.string_of_tokens tokens);
loop body ((attributename, Some "")::attrs)
tagstatus tokens
| Quote :: tokens ->
begin
if debug then
eprintf "(OMD) 3668 BHTML non empty attribute 1 %S\n%!"
(L.string_of_tokens tokens);
match
fsplit
~excl:(function
| Quotes _ :: _ -> true
| _ -> false)
~f:(function
| Quote::tl -> Split([], tl)
| _ -> Continue)
tokens
with
| None -> None
| Some(at_val, tokens) ->
if debug then eprintf "(OMD) 3654 BHTML loop\n%!";
loop body ((attributename,
Some(L.string_of_tokens at_val))
::attrs) tagstatus tokens
end
| Doublequotes 0 :: tokens ->
begin
if debug then
eprintf "(OMD) 3690 BHTML empty attribute 2 %S\n%!"
(L.string_of_tokens tokens);
loop body ((attributename, Some "")::attrs)
tagstatus tokens
end
| Doublequote :: tokens ->
begin
if debug then
eprintf "(OMD) 3698 BHTML non empty attribute 2 %S\n%!"
(L.string_of_tokens tokens);
match fsplit
~excl:(function
| Doublequotes _ :: _ -> true
| _ -> false)
~f:(function
| Doublequote::tl -> Split([], tl)
| _ -> Continue)
tokens
with
| None -> None
| Some(at_val, tokens) ->
if debug then
eprintf "(OMD) 3622 BHTML %s=%S %s\n%!"
attributename
(L.string_of_tokens at_val)
(L.destring_of_tokens tokens);
loop body ((attributename,
Some(L.string_of_tokens at_val))
::attrs) tagstatus tokens
end
| _ -> None
end
| Void, _ -> None
end
| x::tokens as dgts
when (match tagstatus with T.Open _ :: _ -> true | _ -> false) ->
begin
if debug then
eprintf "(OMD) 3620 BHTML general %S\n%!"
(L.string_of_tokens dgts);
loop (add_token_to_body x body) attrs tagstatus tokens
end
| (Newline | Space | Spaces _) :: tokens
when
(match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) ->
begin
if debug then eprintf "(OMD) 3737 BHTML spaces\n%!";
loop body attrs tagstatus tokens
end
| (Newlines _ as x) :: tokens
when
(match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) ->
begin
if debug then eprintf "(OMD) 3827 BHTML newlines\n%!";
warn "there are empty lines in what may be an HTML block";
loop (add_token_to_body x body) attrs tagstatus tokens
end
| _ ->
if debug then
eprintf "(OMD) 3742 BHTML fallback with \
tokens=%s and tagstatus=%s\n%!"
(L.destring_of_tokens tokens)
(match tagstatus with
| [] -> "None"
| T.Awaiting _ :: _ -> "Awaiting"
| T.Open _ :: _ -> "Open (can't be)");
(match tagstatus with
| [] -> Some(body, tokens)
| T.Awaiting tag :: _ ->
warn (sprintf "expected to read an open HTML tag (%s), \
but found nothing" tag);
None
| T.Open tag :: _ ->
warn (sprintf "expected to find the closing HTML tag for %s, \
but found nothing" tag);
None)
in
if debug then eprintf "(OMD) 3408 BHTML loop\n%!";
match loop [] [] [] lexemes with
| Some(h, rest) ->
Some(T.md_of_interm_list h, rest)
| None -> None
in
begin match read_html() with
| Some(h, rest) ->
main_impl_rev ~html (h@r) [Tag("HTMLBLOCK", empty_extension)] rest
| None ->
let text = L.string_of_token t in
main_impl_rev ~html (Text(text ^ tagnametop)::r) [w] html_stuff
end
(* / end of block HTML. *)
(* inline HTML *)
| _,
(Lessthan as t)
::((Word(tagnametop) as w)
::((Space|Spaces _|Greaterthan|Greaterthans _)
::_ as html_stuff) as tlx) ->
if (strict_html && not(StringSet.mem tagnametop inline_htmltags_set))
|| not(blind_html || StringSet.mem tagnametop htmltags_set)
then
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tlx
| Some(r, p, l) -> main_impl_rev ~html r p l
end
else
let read_html() =
let module T = struct
type t =
| Awaiting of string
| Open of string
type interm =
| HTML of string * (string * string option) list * interm list
| TOKENS of L.t
| MD of Omd_representation.t
let rec md_of_interm_list = function
| [] -> []
| HTML(t, a, c)::tl ->
Html(t, a, md_of_interm_list(List.rev c))::md_of_interm_list tl
| MD md::tl -> md @ md_of_interm_list tl
| TOKENS t1::TOKENS t2::tl ->
md_of_interm_list (TOKENS(t1@t2)::tl)
| TOKENS t :: tl ->
main_impl ~html [] [Word ""] (t)
@ md_of_interm_list tl
let string_of_tagstatus tagstatus =
let b = Buffer.create 64 in
List.iter (function
| Open t -> bprintf b "{I/Open %s}" t
| Awaiting t -> bprintf b "{I/Awaiting %s}" t
) tagstatus;
Buffer.contents b
end in
let add_token_to_body x body =
T.TOKENS[x]::body
in
let rec loop (body:T.interm list) attrs tagstatus tokens =
if debug then
eprintf "(OMD) 3718 loop tagstatus=(%s) %s\n%!"
(* eprintf "(OMD) 3718 loop tagstatus=(%s) body=(%s) %s\n%!" *)
(T.string_of_tagstatus tagstatus)
(* (Omd_backend.sexpr_of_md(T.md_of_interm_list body)) *)
(L.destring_of_tokens tokens);
match tokens with
| [] ->
begin
match tagstatus with
| [] -> Some(body, tokens)
| T.Open(t)::_ when StringSet.mem t html_void_elements ->
Some(body, tokens)
| _ ->
if debug then
eprintf "(OMD) Not enough to read for inline HTML\n%!";
None
end
| Lessthans n::tokens ->
begin match tagstatus with
| T.Awaiting _ :: _ -> None
| _ ->
loop
(add_token_to_body
(if n = 0 then Lessthan else Lessthans(n-1))
body)
attrs tagstatus (Lessthan::tokens)
end
(* self-closing tags *)
| Slash::Greaterthan::tokens ->
begin match tagstatus with
| T.Awaiting(tagname)::tagstatus
when StringSet.mem tagname html_void_elements ->
loop [T.HTML(tagname, attrs, [])] [] tagstatus tokens
| _ ->
loop (T.TOKENS[Greaterthan;Slash]::body)
attrs tagstatus tokens
end
(* multiple newlines are not to be seen in inline HTML *)
| Newlines _ :: _ ->
if debug then eprintf "(OMD) Multiple lines in inline HTML\n%!";
(match tagstatus with
| [] -> Some(body, tokens)
| _ -> warn "multiple newlines in inline HTML"; None)
(* maybe code *)
| (Backquote | Backquotes _ as b)::tl ->
begin match tagstatus with
| T.Awaiting _ :: _ ->
if debug then
eprintf "(OMD) maybe code in inline HTML: no code\n%!";
None
| [] ->
if debug then
eprintf "(OMD) maybe code in inline HTML: none\n%!";
None
| T.Open _ :: _ ->
if debug then
eprintf "(OMD) maybe code in inline HTML: let's try\n%!";
begin match bcode [] [Space] tokens with
| Some (((Code _::_) as c), p, l) ->
if debug then
eprintf "(OMD) maybe code in inline HTML: \
confirmed\n%!";
loop (T.MD c::body) [] tagstatus l
| _ ->
if debug then
eprintf "(OMD) maybe code in inline HTML: failed\n%!";
loop (T.TOKENS[b]::body) [] tagstatus tl
end
end
(* closing the tag *)
| Lessthan::Slash::(Word(tagname) as w)
::(Greaterthan|Greaterthans _ as g)::tokens ->
begin match tagstatus with
| T.Open t :: _ when t = tagname ->
if debug then
eprintf "(OMD) 4136 properly closing %S tokens=%s\n%!"
t (L.string_of_tokens tokens);
Some(body,
(match g with
| Greaterthans 0 -> Greaterthan :: tokens
| Greaterthans n -> Greaterthans(n-1) :: tokens
| _ -> tokens))
| T.Open t :: _ ->
if debug then
eprintf "(OMD) 4144 \
wrongly closing %S with %S 1\n%!" t tagname;
loop (T.TOKENS[g;w;Slash;Lessthan]::body) [] tagstatus tokens
| T.Awaiting t :: _ ->
if debug then
eprintf "(OMD) 4149 \
wrongly closing %S with %S 2\n%!" t tagname;
None
| [] ->
if debug then
eprintf "(OMD) 4154 \
wrongly closing nothing with %S 3\n%!"
tagname;
None
end
(* tag *)
| Lessthan::(Word(tagname) as word)::tokens
when
blind_html
|| (strict_html && StringSet.mem tagname inline_htmltags_set)
|| (not strict_html && StringSet.mem tagname htmltags_set)
->
if debug then eprintf "(OMD) <%s...\n%!" tagname;
begin match tagstatus with
| T.Open(t) :: _
when t <> tagname && StringSet.mem t html_void_elements ->
None
| T.Awaiting _ :: _ -> None
| _ ->
begin
if debug then
eprintf "(OMD) 3796 tag %s, attrs=[]\n%!" tagname;
match loop [] [] (T.Awaiting tagname::tagstatus) tokens
with
| None ->
loop (T.TOKENS[word;Lessthan]::body)
attrs tagstatus tokens
| Some(b,tokens) ->
Some(b@body, tokens)
end
end
(* end of opening tag *)
| Greaterthan::tokens ->
if debug then
eprintf "(OMD) 4185 end of opening tag tokens=%s \
tagstatus=%s\n%!"
(L.string_of_tokens tokens)
(T.string_of_tagstatus tagstatus);
begin match tagstatus with
| T.Awaiting t :: tagstatus as ts ->
begin match loop body [] (T.Open t::tagstatus) tokens with
| None ->
if debug then
eprintf "(OMD) 4186 \
Couldn't find an closing tag for %S\n%!"
t;
None
| Some(b, tokens) ->
if debug then
eprintf
"(OMD) 4192 Found a closing tag %s ts=%s \
tokens=%s\n%!"
t
(T.string_of_tagstatus ts)
(L.string_of_tokens tokens);
match tagstatus with
| [] ->
Some(T.HTML(t, attrs, b)::body, tokens)
| _ ->
(* Note: we don't care about the value of
[attrs] here because in we have a
[tagstatus] matches [T.Open _ :: _] and
there's a corresponding filter that will
take care of attrs that will take care of
it. *)
loop (T.HTML(t, attrs, b)::body) [] tagstatus tokens
end
| T.Open t :: _ ->
if debug then
eprintf
"(OMD) Turns out an `>` isn't for an opening tag\n%!";
loop (T.TOKENS[Greaterthan]::body) attrs tagstatus tokens
| [] ->
if debug then
eprintf "(OMD) 4202 tagstatus=[]\n%!";
None
end
(* maybe attribute *)
| (Colon|Colons _|Underscore|Underscores _|Word _ as t)::tokens
| (Space|Spaces _)
::(Colon|Colons _|Underscore|Underscores _|Word _ as t)
::tokens
when (match tagstatus with
| T.Awaiting _ :: _ -> true
| _ -> false) ->
begin
let module Attribute_value = struct
type t = Empty of name | Named of name | Void
and name = string
end in
let open Attribute_value in
let rec extract_attribute accu = function
| (Space | Spaces _ | Newline) :: tokens->
Empty(L.string_of_tokens(List.rev accu)), tokens
| (Greaterthan|Greaterthans _) :: _ as tokens->
Empty(L.string_of_tokens(List.rev accu)), tokens
| Equal :: tokens ->
Named(L.string_of_tokens(List.rev accu)), tokens
| Colon | Colons _ | Underscore | Underscores _ | Word _
| Number _ | Minus | Minuss _ | Dot | Dots _ as t :: tokens ->
extract_attribute (t::accu) tokens
| tokens -> Void, tokens
in
match extract_attribute [t] tokens with
| Empty attributename, tokens ->
(* attribute with no explicit value *)
loop body ((attributename, None)::attrs) tagstatus tokens
| Named attributename, tokens ->
begin match tokens with
| Quotes 0 :: tokens ->
if debug then
eprintf "(OMD) (IHTML) empty attribute 1 %S\n%!"
(L.string_of_tokens tokens);
loop body ((attributename, Some "")::attrs) tagstatus tokens
| Quote :: tokens ->
begin
if debug then
eprintf "(OMD) (IHTML) non empty attribute 1 %S\n%!"
(L.string_of_tokens tokens);
match
fsplit
~excl:(function
| Quotes _ :: _ -> true
| _ -> false)
~f:(function
| Quote::tl -> Split([], tl)
| _ -> Continue)
tokens
with
| None -> None
| Some(at_val, tokens) ->
loop body ((attributename,
Some(L.string_of_tokens at_val))
::attrs) tagstatus tokens
end
| Doublequotes 0 :: tokens ->
begin
if debug then
eprintf "(OMD) (IHTML) empty attribute 2 %S\n%!"
(L.string_of_tokens tokens);
loop body ((attributename, Some "")::attrs) tagstatus tokens
end
| Doublequote :: tokens ->
begin
if debug then
eprintf "(OMD) (IHTML) non empty attribute 2 %S\n%!"
(L.string_of_tokens tokens);
match fsplit
~excl:(function
| Doublequotes _ :: _ -> true
| _ -> false)
~f:(function
| Doublequote::tl -> Split([], tl)
| _ -> Continue)
tokens
with
| None -> None
| Some(at_val, tokens) ->
if debug then
eprintf "(OMD) (3957) %s=%S %s\n%!" attributename
(L.string_of_tokens at_val)
(L.destring_of_tokens tokens);
loop body ((attributename,
Some(L.string_of_tokens at_val))
::attrs) tagstatus tokens
end
| _ -> None
end
| Void, _ -> None
end
| Backslash::x::tokens
when (match tagstatus with T.Open _ :: _ -> true | _ -> false) ->
loop (T.TOKENS[Backslash;x]::body) attrs tagstatus tokens
| Backslashs(n)::x::tokens
when (match tagstatus with T.Open _ :: _ -> true | _ -> false)
&& n mod 2 = 1 ->
loop (T.TOKENS[Backslashs(n);x]::body) attrs tagstatus tokens
| x::tokens
when (match tagstatus with T.Open _ :: _ -> true | _ -> false) ->
begin
if debug then
eprintf "(OMD) (4161) general %S\n%!"
(L.string_of_tokens (x::tokens));
loop (T.TOKENS[x]::body) attrs tagstatus tokens
end
| (Newline | Space | Spaces _) :: tokens
when
(match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) ->
begin
if debug then eprintf "(OMD) (4289) spaces\n%!";
loop body attrs tagstatus tokens
end
| _ ->
if debug then
eprintf "(OMD) (4294) \
fallback with tokens=%s and tagstatus=%s\n%!"
(L.destring_of_tokens tokens)
(T.string_of_tagstatus tagstatus);
(match tagstatus with
| [] -> Some(body, tokens)
| T.Awaiting tag :: _ ->
warn (sprintf "expected to read an open HTML tag (%s), \
but found nothing" tag);
None
| T.Open tag :: _ ->
warn (sprintf "expected to find the closing HTML tag for %s, \
but found nothing" tag);
None)
in match loop [] [] [] lexemes with
| Some(html, rest) ->
Some(T.md_of_interm_list html, rest)
| None -> None
in
begin match read_html() with
| Some(h, rest) ->
main_impl_rev ~html (h@r) [Greaterthan] rest
| None ->
let text = L.string_of_token t in
main_impl_rev ~html (Text(text ^ tagnametop)::r) [w] html_stuff
end
(* / end of inline HTML. *)
(* < : emails *)
| _, (Lessthan as t)::tl ->
begin match maybe_autoemail r previous lexemes with
| Some(r,p,l) -> main_impl_rev ~html r p l
| None ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
end
(* line breaks *)
| _, Newline::tl ->
main_impl_rev ~html (NL::r) [Newline] tl
| _, Newlines _::tl ->
main_impl_rev ~html (NL::NL::r) [Newline] tl
(* [ *)
| _, (Obracket as t)::tl ->
begin match maybe_link main_loop r previous tl with
| Some(r, p, l) -> main_impl_rev ~html r p l
| None ->
match maybe_reference main_loop rc r previous tl with
| Some(r, p, l) -> main_impl_rev ~html r p l
| None ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
end
(* img *)
| _, (Exclamation|Exclamations _ as t)
::Obracket::Cbracket::Oparenthesis::tl ->
(* image insertion with no "alt" *)
(*  *)
(try
begin
let b, tl = read_until_cparenth ~bq:true ~no_nl:false tl in
(* new lines there are allowed *)
let r (* updated result *) = match t with
| Exclamations 0 -> Text "!" :: r
| Exclamations n -> Text(String.make (n+1) '!') :: r
| _ -> r in
match
try Some(read_until_space ~bq:false ~no_nl:true b)
with Premature_ending -> None
with
| Some(url, tls) ->
let title, should_be_empty_list =
read_until_dq ~bq:true (snd (read_until_dq ~bq:true tls)) in
let url = L.string_of_tokens url in
let title = L.string_of_tokens title in
main_impl_rev ~html (Img("", url, title) :: r) [Cparenthesis] tl
| None ->
let url = L.string_of_tokens b in
main_impl_rev ~html (Img("", url, "") :: r) [Cparenthesis] tl
end
with
| NL_exception ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
(* img ref *)
| _, (Exclamation as t)
::Obracket::Cbracket::Obracket::tl ->
(* ref image insertion with no "alt" *)
(* ![][ref] *)
(try
let id, tl = read_until_cbracket ~bq:true ~no_nl:true tl in
let fallback = extract_fallback main_loop tl lexemes in
let id = L.string_of_tokens id in
main_impl_rev ~html (Img_ref(rc, id, "", fallback) :: r) [Cbracket] tl
with NL_exception ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
(* img *)
| _, (Exclamation|Exclamations _ as t)::Obracket::tl ->
(* image insertion with "alt" *)
(*  *)
(try
match read_until_cbracket ~bq:true tl with
| alt, Oparenthesis::ntl ->
(try
let alt = L.string_of_tokens alt in
let path_title, rest =
read_until_cparenth ~bq:true ~no_nl:false ntl in
let path, title =
try
read_until_space ~bq:true ~no_nl:true path_title
with Premature_ending -> path_title, [] in
let title, nothing =
if title <> [] then
read_until_dq ~bq:true (snd(read_until_dq ~bq:true title))
else [], [] in
if nothing <> [] then
raise NL_exception; (* caught right below *)
let r =
match t with
| Exclamations 0 -> Text "!" :: r
| Exclamations n -> Text(String.make (n+1) '!') :: r
| _ -> r in
let path = L.string_of_tokens path in
let title = L.string_of_tokens title in
main_impl_rev ~html (Img(alt, path, title) :: r) [Cparenthesis] rest
with
| NL_exception
(* if NL_exception was raised, then fall back to "text" *)
| Premature_ending ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
)
| alt, Obracket::Word(id)::Cbracket::ntl
| alt, Obracket::(Space|Spaces _)::Word(id)::Cbracket::ntl
| alt, Obracket::(Space|Spaces _)::Word(id)::(Space|Spaces _)
::Cbracket::ntl
| alt, Obracket::Word(id)::(Space|Spaces _)::Cbracket::ntl ->
let fallback = extract_fallback main_loop ntl lexemes in
let alt = L.string_of_tokens alt in
main_impl_rev ~html (Img_ref(rc, id, alt, fallback)::r) [Cbracket] ntl
| alt, Obracket::((Newline|Space|Spaces _|Word _|Number _)::_
as ntl) ->
(try
match read_until_cbracket ~bq:true ~no_nl:false ntl with
| [], rest -> raise Premature_ending
| id, rest ->
let fallback = extract_fallback main_loop rest lexemes in
let id = L.string_of_tokens id in
let alt = L.string_of_tokens alt in
main_impl_rev ~html (Img_ref(rc, id, alt, fallback)::r)
[Cbracket]
rest
with
| Premature_ending
| NL_exception ->
begin match maybe_extension extensions r previous lexemes with
| None ->
main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
| _ ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
with
| Premature_ending ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
)
| _,
(At|Bar|Caret|Cbrace|Colon|Comma|Cparenthesis|Cbracket|Dollar
|Dot|Doublequote|Exclamation|Equal|Minus|Obrace|Oparenthesis
|Percent|Plus|Question|Quote|Semicolon|Slash|Tab|Tilde
|Greaterthan as t)::tl
->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
| _, (Number _ as t):: tl ->
begin match maybe_extension extensions r previous lexemes with
| None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl
| Some(r, p, l) -> main_impl_rev ~html r p l
end
| _, (Ats _ | Bars _ | Carets _ | Cbraces _ | Cbrackets _ | Colons _
| Commas _ | Cparenthesiss _ | Dollars _ | Dots _ | Doublequotes _
| Equals _ | Exclamations _ | Greaterthans _ | Lessthans _
| Minuss _ | Obraces _ | Obrackets _ | Oparenthesiss _
| Percents _ | Pluss _ | Questions _ | Quotes _ | Semicolons _
| Slashs _ | Stars _ | Tabs _ | Tildes _ | Underscores _ as tk)
:: tl ->
begin match maybe_extension extensions r previous lexemes with
| None ->
let tk0, tks = L.split_first tk in
let text = L.string_of_token tk0 in
main_impl_rev ~html (Text text :: r) [tk0] (tks :: tl)
| Some(r, p, l) ->
main_impl_rev ~html r p l
end
and main_impl ~html (r:r) (previous:p) (lexemes:l) =
(* if debug then eprintf "(OMD) main_impl html=%b\n%!" html; *)
assert_well_formed lexemes;
List.rev (main_loop_rev ~html r previous lexemes)
and main_loop ?(html=false) (r:r) (previous:p) (lexemes:l) =
main_impl ~html r previous lexemes
and main_loop_rev ?(html=false) (r:r) (previous:p) (lexemes:l) =
main_impl_rev ~html r previous lexemes
let main_parse lexemes =
main_loop [] [] (tag_setext main_loop lexemes)
let parse lexemes =
main_parse lexemes
end
let default_parse ?(extensions=[]) ?(default_lang="") lexemes =
let e = extensions and d = default_lang in
let module E = Default_env(Unit) in
let module M =
Make(struct
include E
let extensions = e
let default_lang = d
end)
in
M.main_parse lexemes
omd-1.3.2/src/omd_parser.mli 0000664 0000000 0000000 00000035064 14257632064 0015742 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(** Beware: the functions in this module may raise exceptions! If you
use them, you should be careful. *)
type r = Omd_representation.t
(** accumulator (beware, reversed tokens) *)
and p = Omd_representation.tok list
(** context information: previous elements *)
and l = Omd_representation.tok list
(** tokens to parse *)
and main_loop =
?html:bool ->
r -> (* accumulator (beware, reversed tokens) *)
p -> (* info: previous elements *)
l -> (* tokens to parse *)
Omd_representation.t (* final result *)
(** most important loop, which has to be given as an argument *)
val default_parse :
?extensions:Omd_representation.extensions -> ?default_lang:string -> l
-> Omd_representation.t
(** Translate tokens to Markdown representation.
@param lang language for blocks of code where it was not specified.
Default: [""].
*)
module type Env =
sig
val rc: Omd_representation.ref_container
(** reference container *)
val extensions : Omd_representation.extensions
(** list of parser extensions *)
val default_lang : string
(** default language for code blocks *)
val gh_uemph_or_bold_style : bool
(** flag: bold/emph using using underscores is by default
github-style, which means that underscores inside words are
left as underscore, rather than special characters, because
it's more convenient. However it is also less expressive
because then you can't bold/emph a part of a word. You might
want to set this flag to false. *)
val blind_html : bool
(** flag: if true, will not check whether a used HTML tag actually
exists in HTML. *)
val strict_html : bool
(** flag: if true, will only accept known inline HTML tags in inline HTML. *)
val warning : bool
(** flag: if true, will output warnings *)
val warn_error : bool
(** flag: if true, will convert warnings to errors *)
end
module Default_env : functor (Unit: sig end) -> Env
module Make : functor (Env : Env) ->
sig
val rc: Omd_representation.ref_container
(** reference container *)
val extensions : Omd_representation.extensions
(** list of parser extensions *)
val default_lang : string
(** default language for code blocks *)
val gh_uemph_or_bold_style : bool
(** flag: bold/emph using using underscores is by default
github-style, which means that underscores inside words are
left as underscore, rather than special characters, because
it's more convenient. However it is also less expressive
because then you can't bold/emph a part of a word. You might
want to set this flag to false. *)
val blind_html : bool
(** flag: if true, will not check whether a used HTML tag actually
exists in HTML. *)
val strict_html : bool
(** flag: if true, will only accept known inline HTML tags in inline HTML. *)
val htmlcodes_set : Omd_utils.StringSet.t
(** set of known HTML codes *)
val inline_htmltags_set : Omd_utils.StringSet.t
(** set of known inline HTML tags *)
val htmltags_set : Omd_utils.StringSet.t
(** All known HTML tags *)
val unindent_rev :
int ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
(** [unindent_rev n l] returns the same couple as [unindent n l]
except that the first element (which is a list) is reversed.
This function is used for lists. *)
val unindent :
int ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
(** [unindent n l] returns [(unindented, rest)] where [unindented] is
the consecutive lines of [l] that are indented with at least [n]
spaces, and de-indented by [n] spaces. If [l] starts with a line
that is indented by less than [n] spaces, then it returns [([], l)].
(* This function is used for lists, so it does not require [n] *)
(* spaces on every single line, but only on some specific ones of them. *)
This function is used for lists and blockquotes.
*)
(* val unindent_strict_rev : *)
(* int -> *)
(* Omd_representation.tok list -> *)
(* Omd_representation.tok list * Omd_representation.tok list *)
(* (\** [unindent_strict_rev n l] returns the same couple as [unindent n l] *)
(* except that the first element (which is a list) is reversed. *)
(* This function is used for blockquotes. *\) *)
(* val unindent_strict : *)
(* int -> *)
(* Omd_representation.tok list -> *)
(* Omd_representation.tok list * Omd_representation.tok list *)
(* (\** [unindent_strict n l] returns [(unindented, rest)] where [unindented] is *)
(* the consecutive lines of [l] that are indented with at least [n] *)
(* spaces, and de-indented by [n] spaces. If [l] starts with a line *)
(* that is indented by less than [n] spaces, then it returns [([], l)]. *)
(* This function is used for blockquotes. *)
(* *\) *)
val is_blank : Omd_representation.tok list -> bool
(** [is_blank l] returns [true] if [l] only contains blanks, which are
spaces and newlines. *)
val semph_or_bold :
int ->
Omd_representation.tok list ->
(Omd_representation.tok list * Omd_representation.tok list) option
(** [semph_or_bold n l] returns [None] if [l] doesn't start with
a bold/emph phrase (marked using stars), else it returns [Some(x,y)]
where [x] is the emph and/or bold phrase at the beginning of [l]
and [y] is the rest of [l]. *)
val sm_uemph_or_bold :
int ->
Omd_representation.tok list ->
(Omd_representation.tok list * Omd_representation.tok list) option
(** [sm_uemph_or_bold n l] returns [None] if [l] doesn't start with
a bold/emph phrase (marked using underscores), else it returns [Some(x,y)]
where [x] is the emph and/or bold phrase at the beginning of [l]
and [y] is the rest of [l]. *)
val gh_uemph_or_bold :
int ->
Omd_representation.tok list ->
(Omd_representation.tok list * Omd_representation.tok list) option
(** [gh_uemph_or_bold n l] returns [None] if [l] doesn't start with
a bold/emph phrase (marked using underscores), else it returns [Some(x,y)]
where [x] is the emph and/or bold phrase at the beginning of [l]
and [y] is the rest of [l]. *)
val uemph_or_bold :
int ->
Omd_representation.tok list ->
(Omd_representation.tok list * Omd_representation.tok list) option
(** [uemph_or_bold n l] returns [None] if [l] doesn't start with a
bold/emph phrase (marked using underscores), else it returns
[Some(x,y)] where [x] is the emph and/or bold phrase at the
beginning of [l] and [y] is the rest of [l]. N.B. if
[!gh_uemph_or_bold_style] then in Github style (i.e., underscores
inside words are considered as underscores). *)
val eat_blank : Omd_representation.tok list -> Omd_representation.tok list
(** [eat_blank l] returns [l] where all blanks at the beginning of the
list have been removed (it stops removing as soon as it meets an element
that is not a blank). Blanks are spaces and newlines only. *)
val tag__maybe_h1 : main_loop -> Omd_representation.tok
(** [tag__maybe_h1 main_loop] is a tag that is injected everywhere that
might preceed a H1 title. It needs [main_loop] as argument because
it is used to parse the contents of the titles. *)
val tag__maybe_h2 : main_loop -> Omd_representation.tok
(** [tag__maybe_h2 main_loop] is the same as [tag__maybe_h1 main_loop]
but for H2. *)
val tag__md : Omd_representation.t -> Omd_representation.tok
(** [tag__md md] encapsulates [md] to make it a value of type [tok].
Its purpose is to inject some pre-parsed markdown (i.e., [md] of type [t])
in a yet-to-parse token stream of type [tok]. *)
val tag_setext :
main_loop -> Omd_representation.tok list -> Omd_representation.tok list
(** Tag used for the lines that *might* be titles using setext-style. *)
val hr_m : l -> l option
(** [hr_m l] returns [Some nl] where [nl] is the remaining of [l] if [l]
contains a horizontal rule "drawn" with dashes. If there's no HR, then
returns [None].*)
val hr_s : l -> l option
(** [hr_s l] is the same as [hr_m l] but for horizontal rules
"drawn" with stars instead. *)
exception NL_exception
exception Premature_ending
val read_until_gt :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_lt :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_cparenth :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_oparenth :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_dq :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_q :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_obracket :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_cbracket :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_space :
?bq:bool ->
?no_nl:bool ->
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
val read_until_newline :
Omd_representation.tok list ->
Omd_representation.tok list * Omd_representation.tok list
(** [read_until_...] are functions that read from a token list
and return two token lists: the first one is the tokens read
until a specific token is met, and the second one is the remainder.
The particularity of these functions is that they do consider
backslash-escaped characters and closing characters.
For instance, [read_until_gt "1 < 2 > 3 > 4"] returns
["1 < 2 > 3 ", " 4"]: note that the ">" before " 4" has disappeared
and that [read_until_gt] takes a [tok list] (not a string) and
returns a couple of [tok list] (not a couple of strings), the
string notation is used here for concision.
Until otherwise noted, those functions do *not* consider
backquote-trapped sections.
For instance, [read_until_gt "1 < 2 > 3 `>` 4"]
returns ["1 < 2 > 3 `", "` 4"].
If you use these functions, you should make sure that they
do what you think they do (i.e., do look at the code).
If the expected characters are not found, the exception
[Premature_ending] is raised. For instance,
[read_until_gt "1 < > 3"] raises [Premature_ending].
If [no_nl] is [true] (default value for [no_nl] is [false])
and ['\n'] occurs before the splitting character,
then [NL_exception] is raised.
*)
val read_title : main_loop -> int -> r -> p -> l -> (r * p * l) option
(** [read_title main_loop n r p l] returns [Some(r,p,l)]
if it succeeds, [None] otherwise.
[read_title main_loop n r p l] expects to read a [n]-level
hash-declared title from [l], where the hashes have *already*
been *removed*. If [n] is not between 1 and 6 (included), then
it returns [None].
[main_loop] is used to parse the contents of the title.
[r] and [p] are the classical "result" and "previous" parameters.
*)
val maybe_extension :
Omd_representation.extensions ->
r -> p -> l -> (r * p * l) option
(** [maybe_extension e r p l] returns [None] if there is no extension or
if extensions haven't had any effect, returns [Some(nr, np, nl)] if
at least one extension has applied successfully. *)
val emailstyle_quoting : main_loop -> r -> p -> l -> (r * p * l) option
(** [emailstyle_quoting main_loop r p l] returns [Some(r,p,l)] with
[r] being the updated result, [p] being the last parsed token
and [l] being the remaining tokens to parse. If [emailstyle_quoting]
fails, then it returns [None], in which case its user is advise
to investigate why it returns [None] because there's possibly a
real problem. *)
val maybe_reference :
main_loop ->
Omd_representation.ref_container -> r -> p -> l -> (r * p * l) option
(** [maybe_reference] tries to parse a reference, a reference definition or
a github-style short reference (e.g., [foo] as a shortcut for [foo][]),
and returns [Some(r,p,l)] if it succeeds, [None] otherwise. *)
val maybe_link : main_loop -> r -> p -> l -> (r * p * l) option
(** [maybe_link] tries to parse a link,
and returns [Some(r,p,l)] if it succeeds, [None] otherwise. *)
val parse_list : main_loop -> r -> p -> l -> r * p * l
(** [parse_list main_loop r p l] parses a list from [l].
***Important property***
It is considered in Omd that a sub-list is always more indented than
the item that contains it (so, 2 items with different indentations cannot
have the direct same parent).
*)
val make_paragraphs : Omd_representation.t -> Omd_representation.t
(** Since [Omd_parser.parse] doesn't build paragraphs, if you want
Markdown-style paragraphs, you need to apply this function to
the result of [Omd_parser.parse]. *)
val bcode :
?default_lang:Omd_representation.name ->
r -> p -> l -> (r * p * l) option
(** [bcode default_lang r p l]
tries to parse some code that's delimited by backquotes,
and returns [Some(r,p,l)] if it succeeds, [None] otherwise.
*)
val icode :
?default_lang:Omd_representation.name ->
r -> p -> l -> (r * p * l) option
(** [icode default_lang r p l]
tries to parse some code that's delimited by space indentation.
It should always return [Some(r,p,l)], if it returns [None]
it means that it's been misused or there's a bug.
*)
val main_loop_rev : ?html:bool -> r -> p -> l -> r
val main_loop : ?html:bool -> r -> p -> l -> Omd_representation.t
val main_parse : Omd_representation.tok list -> Omd_representation.t
val parse : Omd_representation.tok list -> Omd_representation.t
end
omd-1.3.2/src/omd_representation.ml 0000664 0000000 0000000 00000030230 14257632064 0017325 0 ustar 00root root 0000000 0000000 open Omd_utils
open Printf
(** references, instances created in [Omd_parser.main_parse] and
accessed in the [Omd_backend] module. *)
module R = Map.Make(String)
class ref_container : object
val mutable c : (string * string) R.t
method add_ref : R.key -> string -> string -> unit
method get_ref : R.key -> (string * string) option
method get_all : (string * (string * string)) list
end = object
val mutable c = R.empty
val mutable c2 = R.empty
method get_all = R.bindings c
method add_ref name title url =
c <- R.add name (url, title) c;
let ln = String.lowercase_ascii name in
if ln <> name then c2 <- R.add ln (url, title) c2
method get_ref name =
try
let (url, title) as r =
try R.find name c
with Not_found ->
let ln = String.lowercase_ascii name in
try R.find ln c
with Not_found ->
R.find ln c2
in Some r
with Not_found ->
None
end
type element =
| H1 of t
| H2 of t
| H3 of t
| H4 of t
| H5 of t
| H6 of t
| Paragraph of t
| Text of string
| Emph of t
| Bold of t
| Ul of t list
| Ol of t list
| Ulp of t list
| Olp of t list
| Code of name * string
| Code_block of name * string
| Br
| Hr
| NL
| Url of href * t * title
| Ref of ref_container * name * string * fallback
| Img_ref of ref_container * name * alt * fallback
| Html of name * (string * string option) list * t
| Html_block of name * (string * string option) list * t
| Html_comment of string
| Raw of string
| Raw_block of string
| Blockquote of t
| Img of alt * src * title
| X of
< name : string;
to_html : ?indent:int -> (t -> string) -> t -> string option;
to_sexpr : (t -> string) -> t -> string option;
to_t : t -> t option >
and fallback = < to_string : string ; to_t : t >
and name = string
and alt = string
and src = string
and href = string
and title = string
and t = element list
let rec loose_compare t1 t2 = match t1,t2 with
| H1 e1::tl1, H1 e2::tl2
| H2 e1::tl1, H2 e2::tl2
| H3 e1::tl1, H3 e2::tl2
| H4 e1::tl1, H4 e2::tl2
| H5 e1::tl1, H5 e2::tl2
| H6 e1::tl1, H6 e2::tl2
| Emph e1::tl1, Emph e2::tl2
| Bold e1::tl1, Bold e2::tl2
| Blockquote e1::tl1, Blockquote e2::tl2
| Paragraph e1::tl1, Paragraph e2::tl2
->
(match loose_compare e1 e2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| Ul e1::tl1, Ul e2::tl2
| Ol e1::tl1, Ol e2::tl2
| Ulp e1::tl1, Ulp e2::tl2
| Olp e1::tl1, Olp e2::tl2
->
(match loose_compare_lists e1 e2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| (Code _ as e1)::tl1, (Code _ as e2)::tl2
| (Br as e1)::tl1, (Br as e2)::tl2
| (Hr as e1)::tl1, (Hr as e2)::tl2
| (NL as e1)::tl1, (NL as e2)::tl2
| (Html _ as e1)::tl1, (Html _ as e2)::tl2
| (Html_block _ as e1)::tl1, (Html_block _ as e2)::tl2
| (Raw _ as e1)::tl1, (Raw _ as e2)::tl2
| (Raw_block _ as e1)::tl1, (Raw_block _ as e2)::tl2
| (Html_comment _ as e1)::tl1, (Html_comment _ as e2)::tl2
| (Img _ as e1)::tl1, (Img _ as e2)::tl2
| (Text _ as e1)::tl1, (Text _ as e2)::tl2
->
(match compare e1 e2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| Code_block(l1,c1)::tl1, Code_block(l2,c2)::tl2
->
(match compare l1 l2, String.length c1 - String.length c2 with
| 0, 0 ->
(match compare c1 c2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| 0, 1 ->
(match compare c1 (c2^"\n") with
| 0 -> loose_compare tl1 tl2
| i -> i)
| 0, -1 ->
(match compare (c1^"\n") c2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| i, _ -> i
)
| Url (href1, t1, title1)::tl1, Url (href2, t2, title2)::tl2
->
(match compare href1 href2 with
| 0 -> (match loose_compare t1 t2 with
| 0 -> (match compare title1 title2 with
| 0 -> loose_compare tl1 tl2
| i -> i)
| i -> i)
| i -> i)
| Ref (ref_container1, name1, x1, fallback1)::tl1,
Ref (ref_container2, name2, x2, fallback2)::tl2
| Img_ref (ref_container1, name1, x1, fallback1)::tl1,
Img_ref (ref_container2, name2, x2, fallback2)::tl2
->
(match compare (name1, x1) (name2, x2) with
| 0 ->
let cff =
if fallback1#to_string = fallback2#to_string then
0
else
loose_compare (fallback1#to_t) (fallback2#to_t)
in
if cff = 0 then
match
compare (ref_container1#get_all) (ref_container2#get_all)
with
| 0 -> loose_compare tl1 tl2
| i -> i
else
cff
| i -> i)
| X e1::tl1, X e2::tl2 ->
(match compare (e1#name) (e2#name) with
| 0 -> (match compare (e1#to_t) (e2#to_t) with
| 0 -> loose_compare tl1 tl2
| i -> i)
| i -> i)
| X _::_, _ -> 1
| _, X _::_ -> -1
| _ -> compare t1 t2
and loose_compare_lists l1 l2 =
match l1, l2 with
| [], [] -> 0
| e1::tl1, e2::tl2 ->
(match loose_compare e1 e2 with
| 0 -> loose_compare_lists tl1 tl2
| i -> i)
| _, [] -> 1
| _ -> -1
type tok = (* Cs(n) means (n+2) times C *)
| Ampersand
| Ampersands of int
| At
| Ats of int
| Backquote
| Backquotes of int
| Backslash
| Backslashs of int
| Bar
| Bars of int
| Caret
| Carets of int
| Cbrace
| Cbraces of int
| Colon
| Colons of int
| Comma
| Commas of int
| Cparenthesis
| Cparenthesiss of int
| Cbracket
| Cbrackets of int
| Dollar
| Dollars of int
| Dot
| Dots of int
| Doublequote
| Doublequotes of int
| Exclamation
| Exclamations of int
| Equal
| Equals of int
| Greaterthan
| Greaterthans of int
| Hash
| Hashs of int
| Lessthan
| Lessthans of int
| Minus
| Minuss of int
| Newline
| Newlines of int
| Number of string
| Obrace
| Obraces of int
| Oparenthesis
| Oparenthesiss of int
| Obracket
| Obrackets of int
| Percent
| Percents of int
| Plus
| Pluss of int
| Question
| Questions of int
| Quote
| Quotes of int
| Semicolon
| Semicolons of int
| Slash
| Slashs of int
| Space
| Spaces of int
| Star
| Stars of int
| Tab
| Tabs of int
| Tilde
| Tildes of int
| Underscore
| Underscores of int
| Word of string
| Tag of name * extension
and extension = <
parser_extension :
t -> tok list -> tok list -> ((t * tok list * tok list) option);
to_string : string
>
type extensions = extension list
let empty_extension = object
method parser_extension r p l = None
method to_string = ""
end
let rec normalise_md l =
if debug then
eprintf "(OMD) normalise_md\n%!";
let rec loop = function
| [NL;NL;NL;NL;NL;NL;NL;]
| [NL;NL;NL;NL;NL;NL;]
| [NL;NL;NL;NL;NL;]
| [NL;NL;NL;NL;]
| [NL;NL;NL;]
| [NL;NL]
| [NL] -> []
| [] -> []
| NL::NL::NL::tl -> loop (NL::NL::tl)
| Text t1::Text t2::tl -> loop (Text(t1^t2)::tl)
| NL::(((Paragraph _|H1 _|H2 _|H3 _|H4 _|H5 _|H6 _
|Code_block _|Ol _|Ul _|Olp _|Ulp _)::_) as tl) -> loop tl
| Paragraph[Text " "]::tl -> loop tl
| Paragraph[]::tl -> loop tl
| Paragraph(p)::tl -> Paragraph(loop p)::loop tl
| H1 v::tl -> H1(loop v)::loop tl
| H2 v::tl -> H2(loop v)::loop tl
| H3 v::tl -> H3(loop v)::loop tl
| H4 v::tl -> H4(loop v)::loop tl
| H5 v::tl -> H5(loop v)::loop tl
| H6 v::tl -> H6(loop v)::loop tl
| Emph v::tl -> Emph(loop v)::loop tl
| Bold v::tl -> Bold(loop v)::loop tl
| Ul v::tl -> Ul(List.map loop v)::loop tl
| Ol v::tl -> Ol(List.map loop v)::loop tl
| Ulp v::tl -> Ulp(List.map loop v)::loop tl
| Olp v::tl -> Olp(List.map loop v)::loop tl
| Blockquote v::tl -> Blockquote(loop v)::loop tl
| Url(href,v,title)::tl -> Url(href,(loop v),title)::loop tl
| Text _
| Code _
| Code_block _
| Br
| Hr
| NL
| Ref _
| Img_ref _
| Html _
| Html_block _
| Html_comment _
| Raw _
| Raw_block _
| Img _
| X _ as v::tl -> v::loop tl
in
let a = loop l in
let b = loop a in
if a = b then
a
else
normalise_md b
let dummy_X =
X (object
method name = "dummy"
method to_html ?(indent=0) _ _ = None
method to_sexpr _ _ = None
method to_t _ = None
end)
let rec visit f = function
| [] -> []
| Paragraph v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Paragraph(visit f v)::visit f tl
end
| H1 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H1(visit f v)::visit f tl
end
| H2 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H2(visit f v)::visit f tl
end
| H3 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H3(visit f v)::visit f tl
end
| H4 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H4(visit f v)::visit f tl
end
| H5 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H5(visit f v)::visit f tl
end
| H6 v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> H6(visit f v)::visit f tl
end
| Emph v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Emph(visit f v)::visit f tl
end
| Bold v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Bold(visit f v)::visit f tl
end
| Ul v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Ul(List.map (visit f) v)::visit f tl
end
| Ol v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Ol(List.map (visit f) v)::visit f tl
end
| Ulp v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Ulp(List.map (visit f) v)::visit f tl
end
| Olp v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Olp(List.map (visit f) v)::visit f tl
end
| Blockquote v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Blockquote(visit f v)::visit f tl
end
| Url(href,v,title) as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Url(href,visit f v,title)::visit f tl
end
| Text v as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Code _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Code_block _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Ref _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Img_ref _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Html _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Html_block _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Html_comment _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Raw _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Raw_block _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Img _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| X _ as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> e::visit f tl
end
| Br as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Br::visit f tl
end
| Hr as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> Hr::visit f tl
end
| NL as e::tl ->
begin match f e with
| Some(l) -> l@visit f tl
| None -> NL::visit f tl
end
omd-1.3.2/src/omd_representation.mli 0000664 0000000 0000000 00000013562 14257632064 0017507 0 ustar 00root root 0000000 0000000
module R : Map.S with type key = string
class ref_container :
object
val mutable c : (string * string) R.t
method add_ref : R.key -> string -> string -> unit
method get_ref : R.key -> (string * string) option
method get_all : (string * (string * string)) list
end
type element =
| H1 of t
| H2 of t
| H3 of t
| H4 of t
| H5 of t
| H6 of t
| Paragraph of t
| Text of string
| Emph of t
| Bold of t
| Ul of t list
| Ol of t list
| Ulp of t list
| Olp of t list
| Code of name * string
| Code_block of name * string
| Br
| Hr
| NL
| Url of href * t * title
| Ref of ref_container * name * string * fallback
| Img_ref of ref_container * name * alt * fallback
| Html of name * (string * string option) list * t
| Html_block of name * (string * string option) list * t
| Html_comment of string
| Raw of string
| Raw_block of string
| Blockquote of t
| Img of alt * src * title
| X of
< name : string;
to_html : ?indent:int -> (t -> string) -> t -> string option;
to_sexpr : (t -> string) -> t -> string option;
to_t : t -> t option >
and fallback = < to_string : string ; to_t : t >
and name = string
and alt = string
and src = string
and href = string
and title = string
and t = element list
type tok =
Ampersand (* one & *)
| Ampersands of int (* [Ampersands(n)] is (n+2) consecutive occurrences of & *)
| At (* @ *)
| Ats of int (* @@.. *)
| Backquote (* ` *)
| Backquotes of int (* ``.. *)
| Backslash (* \\ *)
| Backslashs of int (* \\\\.. *)
| Bar (* | *)
| Bars of int (* ||.. *)
| Caret (* ^ *)
| Carets of int (* ^^.. *)
| Cbrace (* } *)
| Cbraces of int (* }}.. *)
| Colon (* : *)
| Colons of int (* ::.. *)
| Comma (* , *)
| Commas of int (* ,,.. *)
| Cparenthesis (* ) *)
| Cparenthesiss of int (* )).. *)
| Cbracket (* ] *)
| Cbrackets of int (* ]].. *)
| Dollar (* $ *)
| Dollars of int (* $$.. *)
| Dot (* . *)
| Dots of int (* .... *)
| Doublequote (* \034 *)
| Doublequotes of int (* \034\034.. *)
| Exclamation (* ! *)
| Exclamations of int (* !!.. *)
| Equal (* = *)
| Equals of int (* ==.. *)
| Greaterthan (* > *)
| Greaterthans of int (* >>.. *)
| Hash (* # *)
| Hashs of int (* ##.. *)
| Lessthan (* < *)
| Lessthans of int (* <<.. *)
| Minus (* - *)
| Minuss of int (* --.. *)
| Newline (* \n *)
| Newlines of int (* \n\n.. *)
| Number of string
| Obrace (* { *)
| Obraces of int (* {{.. *)
| Oparenthesis (* ( *)
| Oparenthesiss of int (* ((.. *)
| Obracket (* [ *)
| Obrackets of int (* [[.. *)
| Percent (* % *)
| Percents of int (* %%.. *)
| Plus (* + *)
| Pluss of int (* ++.. *)
| Question (* ? *)
| Questions of int (* ??.. *)
| Quote (* ' *)
| Quotes of int (* ''.. *)
| Semicolon (* ; *)
| Semicolons of int (* ;;.. *)
| Slash (* / *)
| Slashs of int (* //.. *)
| Space (* *)
| Spaces of int (* .. *)
| Star (* * *)
| Stars of int (* **.. *)
| Tab (* \t *)
| Tabs of int (* \t\t.. *)
| Tilde (* ~ *)
| Tildes of int (* ~~.. *)
| Underscore (* _ *)
| Underscores of int (* __.. *)
| Word of string
| Tag of name * extension
(** Lexer's tokens. If you want to use the parser with an extended
lexer, you may use the constructor [Tag] to implement
the parser's extension. In the parser, [Tag] is used (at least)
3 times in order to represent metadata or to store data.
The integers carried by constructors means that the represented
character appears (n+2) times. So, [Ampersand(0)] is "&&".
Notably, this allows to use the property that in the match
case [Ampersand _ ->], we know there are at least 2 ampersands.
This is particularly useful for some characters, such as newlines
and spaces. It's not useful for all of them indeed but it has
been designed this way for the sake of uniformity (one doesn't
want to know by heart which constructor have that "at least 2"
property and which haven't).
*)
and extension = <
parser_extension : t -> tok list -> tok list -> ((t * tok list * tok list) option);
to_string : string
>
(** - [parser_extension] is a method that takes the current state of the
parser's data and returns None if nothing has been changed,
otherwise it returns the new state. The current state of the
parser's data is [(r, p, l)] where [r] is the result so far, [p]
is the list of the previous tokens (it's typically empty or
contains information on how many newlines we've just seen), and
[l] is the remaining tokens to parse.
- and [to_string] is a method that returns directly a string
representation of the object (it's normal if it returns the
empty string). *)
type extensions = extension list
(** One must use this type to extend the parser. It's a list of
functions of type [extension]. They are processed in order (the
head is applied first), so be careful about it. If you use it
wrong, it will behave wrong. *)
val empty_extension : extension
(** An empty extension *)
val loose_compare : t -> t -> int
(** [loose_compare t1 t2] returns [0] if [t1] and [t2]
are equivalent, otherwise it returns another number. *)
val normalise_md : t -> t
(** [normalise_md md] returns a copy of [md] where some elements
have been factorized. *)
val visit : (element -> t option) -> t -> t
(** visitor for structures of type t: [visit f md] will return a new
potentially altered copy of [md] that has been created by the
visit of [md] by [f].
The function [f] takes each [element] (from [md]) and returns
[Some t] if it has effectively been applied to [element], and
[None] otherwise. When it returns [Some t], [t] replaces [element]
in the copy of [md], and when it returns [None], either [element]
is copied as it is in the copy of [md] or a visited version is
copied instead (well, that depends on if [element] has elements
inside of it or not).
*)
omd-1.3.2/src/omd_utils.ml 0000664 0000000 0000000 00000020515 14257632064 0015430 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013/2014 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
open Printf
let debug =
let _DEBUG =
try
Some(Sys.getenv "DEBUG")
with _ -> None
and _OMD_DEBUG =
try
Some(Sys.getenv "OMD_DEBUG")
with _ -> None
in
match _DEBUG, _OMD_DEBUG with
| _, Some "false" ->
false
| Some _, None ->
eprintf "omd: debug mode activated because DEBUG is set, \
you can deactivate the mode by unsetting DEBUG \
or by setting OMD_DEBUG to the string \"false\".\n%!";
true
| None, None ->
false
| _, Some _ ->
eprintf "omd: debug mode activated because OMD_DEBUG is set
to a value that isn't the string \"false\".\n%!";
true
exception Error of string
let warn ?(we=false) msg =
if we then
raise (Error msg)
else
eprintf "(OMD) Warning: %s\n%!" msg
let trackfix =
try
ignore(Sys.getenv "OMD_FIX");
eprintf "omd: tracking mode activated: token list are very often checked, \
it might take a *very* long time if your input is large.\n%!";
true
with Not_found ->
false
let _ = if debug then Printexc.record_backtrace true
let raise =
if debug then
(fun e ->
eprintf "(OMD) Exception raised: %s\n%!" (Printexc.to_string e);
raise e)
else
raise
module StringSet : sig
include Set.S with type elt = string
val of_list : elt list -> t
end = struct
include Set.Make(String)
let of_list l = List.fold_left (fun r e -> add e r) empty l
end
type 'a split = 'a list -> 'a split_action
and 'a split_action =
| Continue
| Continue_with of 'a list * 'a list
| Split of 'a list * 'a list
let fsplit_rev ?(excl=(fun _ -> false)) ~(f:'a split) l
: ('a list * 'a list) option =
let rec loop accu = function
| [] ->
begin
match f [] with
| Split(left, right) -> Some(left@accu, right)
| Continue_with(left, tl) -> loop (left@accu) tl
| Continue -> None
end
| e::tl as l ->
if excl l then
None
else match f l with
| Split(left, right) -> Some(left@accu, right)
| Continue_with(left, tl) -> loop (left@accu) tl
| Continue -> loop (e::accu) tl
in loop [] l
let fsplit ?(excl=(fun _ -> false)) ~f l =
match fsplit_rev ~excl:excl ~f:f l with
| None -> None
| Some(rev, l) -> Some(List.rev rev, l)
let id_of_string ids s =
let n = String.length s in
let out = Buffer.create 0 in
(* Put [s] into [b], replacing non-alphanumeric characters with dashes. *)
let rec loop started i =
if i = n then ()
else
match s.[i] with
| 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' as c ->
Buffer.add_char out c ;
loop true (i + 1)
(* Don't want to start with dashes. *)
| _ when not started ->
loop false (i + 1)
| _ ->
Buffer.add_char out '-' ;
loop false (i + 1)
in
loop false 0 ;
let s' = Buffer.contents out in
if s' = "" then ""
else
(* Find out the index of the last character in [s'] that isn't a dash. *)
let last_trailing =
let rec loop i =
if i < 0 || s'.[i] <> '-' then i
else loop (i - 1)
in
loop (String.length s' - 1)
in
(* Trim trailing dashes. *)
ids#mangle @@ String.sub s' 0 (last_trailing + 1)
(* only convert when "necessary" *)
let htmlentities ?(md=false) s =
let module Break = struct exception Break end in
let b = Buffer.create 64 in
let rec loop i =
if i = String.length s then
()
else
let () =
match s.[i] with
| ( '0' .. '9' | 'a' .. 'z' | 'A' .. 'Z' ) as c -> Buffer.add_char b c
| '"' -> Buffer.add_string b """
| '\'' -> Buffer.add_string b "'"
| '&' ->
if md then
begin
try
let () = match s.[i+1] with
| '#' ->
let rec ff j =
match s.[j] with
| '0' .. '9' -> ff (succ j)
| ';' -> ()
| _ -> raise Break.Break
in
ff (i+2)
| 'A' .. 'Z' | 'a' .. 'z' ->
let rec ff j =
match s.[j] with
| 'A' .. 'Z' | 'a' .. 'z' -> ff (succ j)
| ';' -> ()
| _ -> raise Break.Break
in
ff (i+2)
| _ -> raise Break.Break
in
Buffer.add_string b "&"
with _ -> Buffer.add_string b "&"
end
else
Buffer.add_string b "&"
| '<' -> Buffer.add_string b "<"
| '>' -> Buffer.add_string b ">"
| c -> Buffer.add_char b c
in loop (succ i)
in
loop 0;
Buffer.contents b
let minimalize_blanks s =
let l = String.length s in
let b = Buffer.create l in
let rec loop f i =
if i = l then
Buffer.contents b
else
match s.[i] with
| ' ' | '\t' | '\n' ->
loop true (succ i)
| c ->
if Buffer.length b > 0 && f then
Buffer.add_char b ' ';
loop false (succ i)
in loop false 0
let rec eat f = function
| [] -> []
| e::tl as l -> if f e then eat f tl else l
let rec extract_html_attributes (html:string) =
let rec cut_on_char_from s i c =
match String.index_from s i c with
| 0 -> "", String.sub s 1 (String.length s - 1)
| j -> String.sub s i (j-i), String.sub s (j+1) (String.length s - (j+1))
in
let remove_prefix_spaces s =
if s = "" then
s
else if s.[0] <> ' ' then
s
else
let rec loop i =
if i = String.length s then
String.sub s i (String.length s - i)
else
match s.[i] with
| ' ' -> loop (i+1)
| _ -> String.sub s i (String.length s - i)
in loop 1
in
let remove_suffix_spaces s =
if s = "" then
s
else if s.[String.length s - 1] <> ' ' then
s
else
let rec loop i =
match s.[i] with
| ' ' -> loop (i-1)
| _ -> String.sub s 0 (i+1)
in loop (String.length s - 1)
in
let rec loop s res i =
if i = String.length s then
res
else
match
try
Some (take_attribute s i)
with Not_found -> None
with
| Some (((_,_) as a), new_s) ->
loop new_s (a::res) 0
| None -> res
and take_attribute s i =
let name, after_eq = cut_on_char_from s i '=' in
let name = remove_suffix_spaces name in
let after_eq = remove_prefix_spaces after_eq in
let value, rest = cut_on_char_from after_eq 1 after_eq.[0] in
(name,value), remove_prefix_spaces rest
in
if (* Has it at least one attribute? *)
try String.index html '>' < String.index html ' '
with Not_found -> true
then
[]
else
match html.[1] with
| '<' | ' ' ->
extract_html_attributes
(remove_prefix_spaces (String.sub html 1 (String.length html - 1)))
| _ ->
try
let html = snd (cut_on_char_from html 0 ' ') in
loop (String.sub html 0 (String.index html '>')) [] 0
with Not_found -> []
let rec extract_inner_html (html:string) =
let rec cut_on_char_from s i c =
match String.index_from s i c with
| 0 -> "", String.sub s 1 (String.length s - 1)
| j -> String.sub s i (j-i), String.sub s (j+1) (String.length s - (j+1))
in
let rec rcut_on_char_from s i c =
match String.rindex_from s i c with
| 0 -> "", String.sub s 1 (String.length s - 1)
| j -> String.sub s 0 j, String.sub s (j+1) (String.length s - (j+1))
in
let _, p = cut_on_char_from html 0 '>' in
let r, _ = rcut_on_char_from p (String.length p - 1) '<' in
r
let html_void_elements = StringSet.of_list [
"img";
"input";
"link";
"meta";
"br";
"hr";
"source";
"wbr";
"param";
"embed";
"base";
"area";
"col";
"track";
"keygen";
]
let ( @ ) l1 l2 =
List.rev_append (List.rev l1) l2
omd-1.3.2/src/omd_utils.mli 0000664 0000000 0000000 00000010736 14257632064 0015605 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013/2014 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
val debug : bool
(** Equals [true] if the environment variable DEBUG is set,
or if the environment variable OMD_DEBUG is set to a string
that is not ["false"]. *)
val trackfix : bool
exception Error of string
val raise : exn -> 'a
(** Same as [Pervasives.raise] except if [debug] equals true,
in which case it prints a trace on stderr before raising the exception. *)
val warn : ?we:bool -> string -> unit
(** [warn we x] prints a warning with the message [x] if [we] is true,
else raises [Omd_utils.Error x]. *)
module StringSet :
sig
include Set.S with type elt = string
val of_list : elt list -> t
end
(** Set of [string]. Cf. documentation of {!Set.S} *)
type 'a split = 'a list -> 'a split_action
(** Type of a split function *)
and 'a split_action =
(** Don't split yet *)
| Continue
(** Don't split yet but continue with those two lists instead of default *)
| Continue_with of 'a list * 'a list
(** Do split with this split scheme *)
| Split of 'a list * 'a list
(** Type of a split action *)
val fsplit_rev :
?excl:('a list -> bool) ->
f:'a split -> 'a list -> ('a list * 'a list) option
(** [fsplit_rev ?excl ~f l] returns [Some(x,y)] where [x] is the
**reversed** list of the consecutive elements of [l] that obey the
split function [f].
Note that [f] is applied to a list of elements and not just an
element, so that [f] can look farther in the list when applied.
[f l] returns [Continue] if there're more elements to consume,
[Continue_with(left,right)] if there's more elements to consume
but we want to choose what goes to the left part and what remains
to process (right part), and returns [Split(left,right)] if
the splitting is decided.
When [f] is applied to an empty list, if it returns [Continue]
then the result will be [None].
If [excl] is given, then [excl] is applied before [f] is, to check
if the splitting should be stopped right away. When the split
fails, it returns [None]. *)
val fsplit :
?excl:('a list -> bool) ->
f:'a split -> 'a list -> ('a list * 'a list) option
(** [fsplit ?excl ~f l] returns [Some(List.rev x, y)]
if [fsplit ?excl ~f l] returns [Some(x,y)], else it returns [None]. *)
val id_of_string : < mangle : string -> string; .. > -> string -> string
(** [id_of_string ids id] returns a mangled version of [id], using the
method [ids#mangle]. If you don't need mangling, you may use
[object method mangle x = x end] for [ids]. However, the name
[ids] also means that your object should have knowledge of all IDs
it has issued, in order to avoid collision. This is why
[id_of_string] asks for an object rather than "just a
function". *)
val htmlentities : ?md:bool -> string -> string
(** [htmlentities s] returns a new string in which html-significant
characters have been converted to html entities. For instance,
"" is converted to "<Foo&Bar>". *)
val minimalize_blanks : string -> string
(** [minimalize_blanks s] returns a copy of [s] in which the first and last
characters are never blank, and two consecutive blanks never happen. *)
val eat : ('a -> bool) -> 'a list -> 'a list
(** [eat f l] returns [l] where elements satisfying [f] have been removed,
but it stops removing as soon as one element doesn't satisfy [f]. *)
val extract_html_attributes : string -> (string * string) list
(** Takes some HTML and returns the list of attributes of the first
HTML tag.
Notes:
* Doesn't check the validity of HTML tags or attributes.
* Doesn't support backslash escaping.
* Attribute names are delimited by the space and equal characters.
* Attribute values are either delimited by the double quote
or the simple quote character.
*)
val extract_inner_html : string -> string
(** Takes an HTML node and returns the contents of the node.
If it's not given a node, it returns something rubbish.
*)
val html_void_elements : StringSet.t
(** HTML void elements *)
val ( @ ) : 'a list -> 'a list -> 'a list
(** Tail-recursive version of [Pervasives.(@)]. *)
omd-1.3.2/src/omd_xtxt.ml 0000664 0000000 0000000 00000001410 14257632064 0015270 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(* xtxt = eXTernal eXTension *)
(* let extensions = ref [] *)
(* let get () = *)
(* !extensions *)
(* let register e = *)
(* extensions := e :: !extensions *)
(* let set es = extensions := es *)
(* let activate ... *)
(* (\* let deactivate ... *\) *)
(* priority (integer?) *)
(* pre-extension *)
(* post-extension *)
omd-1.3.2/src/omd_xtxt.mli 0000664 0000000 0000000 00000000737 14257632064 0015454 0 ustar 00root root 0000000 0000000 (***********************************************************************)
(* omd: Markdown frontend in OCaml *)
(* (c) 2013 by Philippe Wang *)
(* Licence : ISC *)
(* http://www.isc.org/downloads/software-support-policy/isc-license/ *)
(***********************************************************************)
(** xtxt = eXTernal eXTension *)
omd-1.3.2/tests/ 0000775 0000000 0000000 00000000000 14257632064 0013447 5 ustar 00root root 0000000 0000000 John_MacFarlane_said_peg-markdown_takes_forever_to_process_this--jgm_peg-markdown_issues_28.md 0000664 0000000 0000000 00000000265 14257632064 0035754 0 ustar 00root root 0000000 0000000 omd-1.3.2/tests ***************************************[[[[[[[[[[[[[[[[[[[[-----------------]]]]]]]]]]]]]]]]]***************************
[[[[[[[[[[[[[[[[[[[[-----------------]]]]]]]]]]]]]]]]]: ::
omd-1.3.2/tests/cow/ 0000775 0000000 0000000 00000000000 14257632064 0014237 5 ustar 00root root 0000000 0000000 omd-1.3.2/tests/cow/anchors-by-reference.html 0000664 0000000 0000000 00000000634 14257632064 0021131 0 ustar 00root root 0000000 0000000
This is an example reference-style link.
This is another reference-style link.
This is a third reference-style link.
This is a fourth reference-style link.
omd-1.3.2/tests/cow/anchors-by-reference.md 0000664 0000000 0000000 00000000611 14257632064 0020560 0 ustar 00root root 0000000 0000000
This is [an example][id] reference-style link.
This is [another] [foo] reference-style link.
This is [a third][bar] reference-style link.
This is [a fourth][4] reference-style link.
[id]: http://example.com/ "Optional Title Here"
[foo]: http://example.com/ (Optional Title Here)
[bar]: http://example.com/ (Optional Title Here)
[4]:
"Optional Title Here" omd-1.3.2/tests/cow/automatic-anchors.html 0000664 0000000 0000000 00000000075 14257632064 0020550 0 ustar 00root root 0000000 0000000
omd-1.3.2/tests/cow/automatic-anchors.md 0000664 0000000 0000000 00000000026 14257632064 0020200 0 ustar 00root root 0000000 0000000