pax_global_header00006660000000000000000000000064142576320640014523gustar00rootroot0000000000000052 comment=bc6c0d568b90b61143e9863cb6ef7b3989b3313a omd-1.3.2/000077500000000000000000000000001425763206400123055ustar00rootroot00000000000000omd-1.3.2/.github/000077500000000000000000000000001425763206400136455ustar00rootroot00000000000000omd-1.3.2/.github/workflows/000077500000000000000000000000001425763206400157025ustar00rootroot00000000000000omd-1.3.2/.github/workflows/main.yml000066400000000000000000000013141425763206400173500ustar00rootroot00000000000000name: Main workflow on: pull_request: push: jobs: build: strategy: fail-fast: false matrix: os: - macos-latest - ubuntu-latest - windows-latest ocaml-compiler: - 4.04.2 - 4.13.x runs-on: ${{ matrix.os }} steps: - name: Checkout code uses: actions/checkout@v3 - name: Use OCaml ${{ matrix.ocaml-compiler }} uses: ocaml/setup-ocaml@v2 with: ocaml-compiler: ${{ matrix.ocaml-compiler }} cache-prefix: 1.3.1-${{ matrix.container }} - run: opam install . --deps-only --with-test - run: opam exec -- dune build - run: opam exec -- dune runtest omd-1.3.2/.gitignore000066400000000000000000000001651425763206400142770ustar00rootroot00000000000000_build/ _opam/ *.data *.log *.native *.opt *.byte #* .#* *~ *.cmi *.cma *.cmo *.cmx *.cmxa *.o omd .DS_Store .depend omd-1.3.2/ABOUT.md000066400000000000000000000242021425763206400135010ustar00rootroot00000000000000 About [OMD](https://github.com/pw374/omd/) ========================================== The implementation of this library and command-line tool is based on [DFMSD][]. That description doesn't define a grammar but a sort of guide for human users who are not trying to implement it. In other words, it's ambiguous, which is a problem since there are no errors in the Markdown language, which design is mostly based on some email-writing experience: the meaning of a phrase is the meaning a human would give when reading the phrase as some email contents. For instance, if there are blank lines that have spaces (lines that read empty but actually contain some characters, from the computer point of view since spaces are represented by characters), since they're invisible to the normal human reader, they should be ignored. Specificities ------------- There follows a list of specificities of OMD. This list is probably not exhaustive. **Please note that OMD's semantics have changed over time, but they are becoming more and more stable with time and new releases. The goal is to eventually have a semantics that's as sane as it can possibly be for a Markdown parser. Please [browse and open issues](https://github.com/pw374/omd/issues/) if you find something that seems wrong.** - Email addresses encoding: email addresses are not hex entity-encoded. - `[foo]` is a short-cut for `[foo][]`, but if `foo` is not a reference then `[foo]` is printed `[foo]`, not `[foo][]`. *(Taken from Github Flavour Markdown.)* - The Markdown to Markdown conversion may performe some cleaning (some meaningless characters may disappear) or spoiling (some meaningless characters may appear), but both inputs and ouputs should have the same semantics (otherwise please do report the bug). - A list containing at least one item which has at least one paragraph is a list for which all items have paragraphs and/or blocks. In HTML words, in practice, if an `li` of a `ul` or `ol` has a `p`, then all other `li`s of that list have at least a `p` or a `pre`. - It's not possible to emphasise a part of a word using underscores. *(Taken from Github Flavour Markdown.)* - A code section declared with at least 3 backquotes (`` ` ``) at the first element on a line is a code block. The backquotes should be followed by a language name (made of a-z characters) or by a newline. - A code block starting with several backquotes (e.g., ```` ``` ````) immediately followed by a word W made of a-z characters is a code block for which the code language is W. (If you use other characters than a-z, the semantics is currently undefined although it's deterministic of course, because it may change in the near future.) Also, if you use the command line tool `omd`, you can define programs to process code blocks specifically to the languages that are declared for those code blocks. - Each and every tabulation is converted by OMD to 4 spaces at the lexing step. And the behaviour of the parser is undefined for tabulations. - Note that it does mean that if you have a document with some code written using the [Whitespace](http://en.wikipedia.org/wiki/Whitespace_(programming_language)) language, it will not work very well. This might be fixed in the future but unless you have a very good reason for OMD to support tabulations, it will probably not. - Parentheses and square brackets are generally parsed in a way such that `[a[b]](http://c/(d))` is the URL `http://c/(d)` with the text `a[b]`. If you want a parenthesis or bracket not to count in the balanced parsing, escape it with a backslash, such as in `[a\[b](http://c/\(d)`. *This is typically something that's not defined in [DFMSD].* - Note about backslashes in URLs: some web browsers (e.g., Safari) automatically convert `\` to `/`. It's not the case of CURL. However I assume it's safe to consider that backslashes are not to be used in URLs. Still it's always possible to backslashe-escape them anyways. - HTML is somewhat a part of Markdown. OMD will partially parse HTML tags and if you have a tag that isn't a known HTML tag, then it's possible that OMD will not consider it as HTML. For instance, a document containing just `` will be converted to `

<foo></foo>

`. - It's possible to ask `omd` to relax this constraint. - Some additional features are available on the command line. For more information, used the command `omd -help` [DFMSD]: http://daringfireball.net/projects/markdown/syntax "John Gruber's description of the syntax of Markdown" "DFMSD" is short for "Daring Fireball: Markdown Syntax Documentation", which is the HTML title of the page located at . Extension mechanisms -------------------- The parser is implemented using a big (very big) recursive function (`Omd_parser.Make(Env).main_loop_rev`), with a set of some auxiliary functions. Some parts are easy to understand, some parts are not. However, overall, it should be easy enough. The parser has a double extension mechanism. 1. To use the first mechanism, you may define a set of functions in the module `Env` given to instanciate the functor `Omd_parser.Make`. * The value `Env.extensions` is a list of elements of type `Omd_representation.extension` which is equal to `r -> p -> l -> (r * p * l) option` where * `r = Omd_representation.t` and represents the result of the parsing process, * `p = Omd_representation.tok list` and represents the tokens preceding `l`, * and `l = tok list` and is the list of tokens to parse. * The result, of type `(r * p * l) option`, is `None` if the extension has no effect (and the parser will continue doing its job with its state it had before using the extension), and is `Some(r,p,l)` when it gives a new set of data to the parser. * Each element of the list `Env.extensions` is applied in a fold left manner. (The first element of that list is applied first.) * And they are applied when a standard parsing rule fails. 2. The second extension stands in the representation of the lexemes (`Tag of string * extension`). It allows to insert extensions directly into the lexeme list. The Markdown representation also provides an extension mechanism, which is useful if you want to insert “smart objects” (which are as “smart” as smartphones). Those objects have four methods, 2 of them are particularly useful: `to_html` and `to_t`, and implementing one of them is necessary. They both return a `string option`, and a default dummy such smart object can be defined as follows: ```ocaml let dummy = X (object method name = "dummy" method to_html ?(indent=0) _ _ = None method to_sexpr _ _ = None method to_t _ = None end) ``` History ------- OMD has been developed by [Philippe Wang](https://github.com/pw374/) at [OCaml Labs](http://ocaml.io/) in [Cambridge](http://www.cl.cam.ac.uk), with precious feedbacks and [pull requests](https://github.com/pw374/omd/pulls) (cf. next section). Its development was motivated by at least these facts: - We wanted an OCaml implementation of Markdown; some OCaml parsers of Markdown existed before but they were incomplete. It's easier for an OCaml project to depend on an pure-OCaml implementation of Markdown than to depend some interface to a library implemented using another language, and this is ever more important since [Opam](https://opam.ocaml.org) exists. - We wanted to provide a way to make the contents of the [OCaml.org](http://ocaml.org/) website be essentially in Markdown instead of HTML. And we wanted to this website to be implemented in OCaml. - Having an OCaml implementation of Markdown is virtually mandatory for those who want to use a Markdown parser in a [Mirage](http://www.openmirage.org) application. Note that OMD has replaced the previous Markdown parser of [COW](https://github.com/mirage/ocaml-cow), which has been developed as part of the Mirage project. Thanks ------ Thank you to [Christophe Troestler](https://github.com/Chris00), [Ashish Argawal](https://github.com/agarwal), [Sebastien Mondet](https://github.com/smondet), [Thomas Gazagnaire](https://github.com/samoht), [Daniel Bünzli](https://github.com/dbuenzli), [Amir Chaudry](https://github.com/amirmc), [Anil Madhavapeddy](https://github.com/avsm/), [David Sheets](https://github.com/dsheets/), [Jeremy Yallop](https://github.com/yallop/), and \ for their feedbacks and contributions to this project. Miscellaneous notes ------------------- - There's been absolutely no effort in making OMD fast, but it should be amongst the fastest parsers of Markdown, just thanks to the fact that it is implemented in OCaml. That being said, there's quite some room for performance improvements. One way would be to make a several-pass parser with different intermediate representations (there're currently only 2 representations: one for the lexing tokens and one for the parse tree). - The hardest part of implementing a parser of Markdown is the process of understanding and unravelling the grammar of Markdown to turn it into a program. - OMD 1.0.0 will probably use some external libraries, e.g., [UUNF](http://erratique.ch/software/uunf) and perhaps [Xmlm](http://erratique.ch/software/xmlm/doc/Xmlm) - "OMD" is the name of this library and command-line tool. - It might be written "Omd" or "omd" sometimes, but it should be written using capital letters because it should be read `əʊ ɛm diː` rather than `ə'md` or `ˌɒmd`. - "`Omd`" is a module. - It's written using monospace font and it's capitalized. - "`omd`" is a command-line tool. - It's written using monospace font and it's always lowercase letters only because unless you have a non-sensitive file system, calling `Omd` on the command line is not just another way of calling `omd`. - OMD has been added on the quite long list of Markdown parsers on the 29th of January. omd-1.3.2/CHANGES.md000066400000000000000000000041271425763206400137030ustar00rootroot00000000000000# Document Title 1.3.2 ------ - port from oasis to dune (#273, @tmattio) 1.3.x ----- - might stop checking validity of HTML tag *names* and accept any XML-parsable tag name. 1.2.5 ----- - only fixes a single bug (an ordered list could be transformed into an unordered list) 1.2.4 ----- - only fixes a single bug (some spaces were wrongly handled in the HTML parsing) 1.2.2/3 ------- - fix a few issues with HTML parsing. 1.2.1 ----- - mainly fixes issues with HTML parsing. 1.2.0 ----- - introduces options `-w` and `-W`. Fixes mostly concern subtle uses of `\n`s in HTML and Markdown outputs. 1.1.2 ----- - fix: some URL-related parsing issues. 1.1.0/1.1.1 ----------- - fix: some HTML-related issues. 1.0.1 ----- - fixes some parsing issues, improves output. (2014-10-02) 1.0.0 ----- - warning: this release is only partially compatible with previous versions. - accept HTML blocks which directly follow each other - fix: accept all XML-compatible attribute names for HTML attributes - fix backslash-escaping for hash-ending ATX-titles + fix Markdown output for Html_block - fix (HTML parsing) bugs introduced in 1.0.0.b and 1.0.0.c - rewrite parser of block HTML to use the updated Omd.t - rewrite parser of inline HTML to use the updated Omd.t - upgrade Omd.t for HTML representation There will not be any newer 0.9.x release although new bugs have been discovered. Thus it's recommended to upgrade to the latest 1.x.y. 0.9.7 ----- - introduction of media:end + bug fixes. If you need to have a version that still has `Tag of extension` instead of `Tag of name * extension` and don't want to upgrade, you may use 0.9.3 0.9.6 ----- - fix a bug (concerning extensions) introduced by 0.9.4. 0.9.5 ----- - bug fix + `Tag of extension` changed to `Tag of name * extension` 0.9.4 ----- - fixes a bug for the new feature 0.9.3 ----- - new feature `media:type="text/omd"`. This version is recommended if you do not use that new feature and want to use 0.9.x 0.9.2 ----- - not released... older versions -------------- - cf. [commit log](https://github.com/ocaml/omd/commits/master) omd-1.3.2/Makefile000066400000000000000000000003231425763206400137430ustar00rootroot00000000000000## # Omd # # @file .PHONY: test build fmt deps build: deps dune build deps: opam install . --deps-only --yes test: dune build @gen --auto-promote dune runtest fmt: dune build @fmt --auto-promote # end omd-1.3.2/README.md000066400000000000000000000120641425763206400135670ustar00rootroot00000000000000OMD: extensible Markdown library and tool in OCaml ================================================== OMD provides two things: 1. the command-line tool `omd`, which takes some Markdown and converts it to HTML or Markdown. Use `omd -help` for more information on how to use it. 2. the library for OCaml contains several modules: - the module `Omd` contains most functions a user will need for basic Markdown manipulation. - the modules `Omd_parser`, `Omd_lexer`, `Omd_backend`, `Omd_representation` and `Omd_utils` basically implement what their names say: * `Omd_parser` implements the parser (the most complex part). * `Omd_lexer` implements a (basic) lexer. * `Omd_backend` implements 3 backends: 1. HTML: default backend. 2. Markdown: sometimes it's useful to show that the fix-point is easily reachable. 3. S-expression: it's mainly used for debugging. * `Omd_representation` declares the datatypes used in `Omd`. It also provides some functions to work on those datatypes. * `Omd_utils` provides some useful tools that are not very specific to the OMD-specific datatypes. OMD aims at implementing the ["original Markdown specs"](http://daringfireball.net/projects/markdown/syntax) with a few Github Flavour Markdown characteristics. OMD is also meant to be more "sane" than other Markdown parsers from the semantics point of view: if something bothers you from the semantics point of view, please [open an issue on Github](https://github.com/ocaml/omd/issues). Encoding -------- **OMD assumes its input is US-ASCII or UTF-8 encoded.** Dependencies ------------ OMD is implemented in OCaml, therefore it needs it to be compiled. OCaml 4.00.1 and then 4.01.0 have been used. OMD should be compatible with 3.12.0 as well, if it's not then please [open an issue](https://github.com/ocaml/omd/issues). The opam package for OMD depends on ocamlfind, which is only used to compile and install OMD. The root Makefile uses oasis, ocamlbuild and oasis2opam. The Makefile in src/ only use the compilers from the standard distribution of OCaml. OMD, compiled as a library and/or a tool, doesn't depend on anything other than the OCaml standard library and runtime. ---------------- Usage ----- - to install `omd` using opam (recommended) `opam install omd` - to get the development version of omd `git clone git://github.com/ocaml/omd.git` - to compile `omd` - without `oasis` nor `ocamlbuild` `cd omd/src && make` - using `oasis` and `ocamlbuild` `cd omd && make` ---------------- Log --- The recommended version numbers are typefaced in **bold**. As new releases come out and bugs are discovered, a version can stop being recommended. Version numbers are trying to follow this scheme: `x.y.z`, `z` is is for minor changes, `y` may include algorithm, interface or editorial policy changes, and `x` is for deeper changes. - 1.3.x might stop checking validity of HTML tag *names* and accept any XML-parsable tag name. - **1.2.5** only fixes a single bug (an ordered list could be transformed into an unordered list) - 1.2.4 only fixes a single bug (some spaces were wrongly handled in the HTML parsing) - 1.2.2 and 1.2.3 fix a few issues with HTML parsing. - 1.2.1 mainly fixes issues with HTML parsing. - 1.2.0 introduces options `-w` and `-W`. Fixes mostly concern subtle uses of `\n`s in HTML and Markdown outputs. - 1.1.2: fix: some URL-related parsing issues. - 1.1.0 and 1.1.1: fix: some HTML-related issues. - 1.0.1: fixes some parsing issues, improves output. (2014-10-02) - 1.0.0: warning: this release is only partially compatible with previous versions. - tags 1.0.0.x precede 1.0.0. Also, tags 1.0.0.x will not be released in OPAM, next release will be 1.0.0. And 1.0.0.x may not be compatible with each other. - tag 1.0.0.g: accept HTML blocks which directly follow each other - tag 1.0.0.f: fix: accept all XML-compatible attribute names for HTML attributes - tag 1.0.0.e: fix backslash-escaping for hash-ending ATX-titles + fix Markdown output for Html_block - tag 1.0.0.d: fix (HTML parsing) bugs introduced in 1.0.0.b and 1.0.0.c - tag 1.0.0.c: rewrite parser of block HTML to use the updated Omd.t - tag 1.0.0.b: rewrite parser of inline HTML to use the updated Omd.t - tag 1.0.0.a: upgrade Omd.t for HTML representation There will not be any newer 0.9.x release although new bugs have been discovered. Thus it's recommended to upgrade to the latest 1.x.y. - **0.9.7**: introduction of media:end + bug fixes If you need to have a version that still has `Tag of extension` instead of `Tag of name * extension` and don't want to upgrade, you may use 0.9.3 - 0.9.6: fix a bug (concerning extensions) introduced by 0.9.4. - 0.9.5: bug fix + `Tag of extension` changed to `Tag of name * extension` - 0.9.4: fixes a bug for the new feature - 0.9.3: new feature `media:type="text/omd"`. This version is recommended if you do not use that new feature and want to use 0.9.x - 0.9.2: not released... - older versions: cf. [commit log](https://github.com/ocaml/omd/commits/master) omd-1.3.2/bmd/000077500000000000000000000000001425763206400130475ustar00rootroot00000000000000omd-1.3.2/bmd/README.md000066400000000000000000000013571425763206400143340ustar00rootroot00000000000000This directory contains expressions that **"break"** Markdown. For instance: ``` * Starting a list with 3 spaces and a star. * Continuing the list. * 2 spaces and a star: are we still in the same list? ``` On Github it renders like this: * Starting a list with 3 spaces and a star. * Continuing the list. * 2 spaces and a star: are we still in the same list? So you can see that on Github, the star with preceded by less spaces starts an **inner** list, which is kind of very wrong... Pandoc considers that the 3rd bullet starts the 3rd element of the unique list, which is not so right either. I'm not blaming those tools, but rather the language. (On second thoughts, I might endup blaming the tools rather than the language.) omd-1.3.2/dune-project000066400000000000000000000016141425763206400146310ustar00rootroot00000000000000(lang dune 2.7) (name omd) (version 1.3.1) (generate_opam_files) (license ISC) (authors "Philippe Wang " "Nicolás Ojeda Bär ") (maintainers "Shon Feder " "Raphael Sousa Santos <@sonologico>") (source (github ocaml/omd)) (package (name omd) (synopsis "A Markdown frontend in pure OCaml") (description "This Markdown library is implemented using only pure OCaml (including I/O operations provided by the standard OCaml compiler distribution). OMD is meant to be as faithful as possible to the original Markdown. Additionally, OMD implements a few Github markdown features, an extension mechanism, and some other features. Note that the opam package installs both the OMD library and the command line tool `omd`.") (tags (org:ocamllabs org:mirage)) (depends (ocaml (>= 4.04)) base-bigarray base-bytes)) omd-1.3.2/omd.opam000066400000000000000000000023221425763206400137410ustar00rootroot00000000000000# This file is generated by dune, edit dune-project instead opam-version: "2.0" version: "1.3.1" synopsis: "A Markdown frontend in pure OCaml" description: """ This Markdown library is implemented using only pure OCaml (including I/O operations provided by the standard OCaml compiler distribution). OMD is meant to be as faithful as possible to the original Markdown. Additionally, OMD implements a few Github markdown features, an extension mechanism, and some other features. Note that the opam package installs both the OMD library and the command line tool `omd`.""" maintainer: [ "Shon Feder " "Raphael Sousa Santos <@sonologico>" ] authors: [ "Philippe Wang " "Nicolás Ojeda Bär " ] license: "ISC" tags: ["org:ocamllabs" "org:mirage"] homepage: "https://github.com/ocaml/omd" bug-reports: "https://github.com/ocaml/omd/issues" depends: [ "dune" {>= "2.7"} "ocaml" {>= "4.04"} "base-bigarray" "base-bytes" "odoc" {with-doc} ] build: [ ["dune" "subst"] {dev} [ "dune" "build" "-p" name "-j" jobs "@install" "@runtest" {with-test} "@doc" {with-doc} ] ] dev-repo: "git+https://github.com/ocaml/omd.git" omd-1.3.2/setup.ml000066400000000000000000000041661425763206400140060ustar00rootroot00000000000000let () = try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH") with Not_found -> ();; (* OASIS_START *) (* DO NOT EDIT (digest: a426e2d026defb34183b787d31fbdcff) *) (******************************************************************************) (* OASIS: architecture for building OCaml libraries and applications *) (* *) (* Copyright (C) 2011-2016, Sylvain Le Gall *) (* Copyright (C) 2008-2011, OCamlCore SARL *) (* *) (* This library is free software; you can redistribute it and/or modify it *) (* under the terms of the GNU Lesser General Public License as published by *) (* the Free Software Foundation; either version 2.1 of the License, or (at *) (* your option) any later version, with the OCaml static compilation *) (* exception. *) (* *) (* This library is distributed in the hope that it will be useful, but *) (* WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY *) (* or FITNESS FOR A PARTICULAR PURPOSE. See the file COPYING for more *) (* details. *) (* *) (* You should have received a copy of the GNU Lesser General Public License *) (* along with this library; if not, write to the Free Software Foundation, *) (* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA *) (******************************************************************************) let () = try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH") with Not_found -> () ;; #use "topfind";; #require "oasis.dynrun";; open OASISDynRun;; let setup_t = BaseCompat.Compat_0_4.adapt_setup_t setup_t open BaseCompat.Compat_0_4 (* OASIS_STOP *) let () = setup ();; omd-1.3.2/src/000077500000000000000000000000001425763206400130745ustar00rootroot00000000000000omd-1.3.2/src/dune000066400000000000000000000004021425763206400137460ustar00rootroot00000000000000(library (name omd) (wrapped false) (modules :standard \ omd_main) (flags (:standard -w -50-6-27-32-39-33-35)) (public_name omd) (libraries bigarray bytes)) (executable (name omd_main) (public_name omd) (modules omd_main) (libraries omd)) omd-1.3.2/src/html_characters.ml000066400000000000000000000473041425763206400166010ustar00rootroot00000000000000(* UTF-8 and HTML entities *) let characters_htmlentities_descriptions = (* data extracted from http://www.w3schools.com/ on December, 18th, 2013 *) [ " ", " ", "space"; "!", "!", "exclamation mark"; "\"", """, "quotation mark"; "#", "#", "number sign"; "$", "$", "dollar sign"; "%", "%", "percent sign"; "&", "&", "ampersand"; "'", "'", "apostrophe"; "(", "(", "left parenthesis"; ")", ")", "right parenthesis"; "*", "*", "asterisk"; "+", "+", "plus sign"; ",", ",", "comma"; "-", "-", "hyphen"; ".", ".", "period"; "/", "/", "slash"; "0", "0", "digit 0"; "1", "1", "digit 1"; "2", "2", "digit 2"; "3", "3", "digit 3"; "4", "4", "digit 4"; "5", "5", "digit 5"; "6", "6", "digit 6"; "7", "7", "digit 7"; "8", "8", "digit 8"; "9", "9", "digit 9"; ":", ":", "colon"; ";", ";", "semicolon"; "<", "<", "less-than"; "=", "=", "equals-to"; ">", ">", "greater-than"; "?", "?", "question mark"; "@", "@", "at sign"; "A", "A", "uppercase A"; "B", "B", "uppercase B"; "C", "C", "uppercase C"; "D", "D", "uppercase D"; "E", "E", "uppercase E"; "F", "F", "uppercase F"; "G", "G", "uppercase G"; "H", "H", "uppercase H"; "I", "I", "uppercase I"; "J", "J", "uppercase J"; "K", "K", "uppercase K"; "L", "L", "uppercase L"; "M", "M", "uppercase M"; "N", "N", "uppercase N"; "O", "O", "uppercase O"; "P", "P", "uppercase P"; "Q", "Q", "uppercase Q"; "R", "R", "uppercase R"; "S", "S", "uppercase S"; "T", "T", "uppercase T"; "U", "U", "uppercase U"; "V", "V", "uppercase V"; "W", "W", "uppercase W"; "X", "X", "uppercase X"; "Y", "Y", "uppercase Y"; "Z", "Z", "uppercase Z"; "[", "[", "left square bracket"; "\\", "\", "backslash"; "]", "]", "right square bracket"; "^", "^", "caret"; "_", "_", "underscore"; "`", "`", "grave accent"; "a", "a", "lowercase a"; "b", "b", "lowercase b"; "c", "c", "lowercase c"; "d", "d", "lowercase d"; "e", "e", "lowercase e"; "f", "f", "lowercase f"; "g", "g", "lowercase g"; "h", "h", "lowercase h"; "i", "i", "lowercase i"; "j", "j", "lowercase j"; "k", "k", "lowercase k"; "l", "l", "lowercase l"; "m", "m", "lowercase m"; "n", "n", "lowercase n"; "o", "o", "lowercase o"; "p", "p", "lowercase p"; "q", "q", "lowercase q"; "r", "r", "lowercase r"; "s", "s", "lowercase s"; "t", "t", "lowercase t"; "u", "u", "lowercase u"; "v", "v", "lowercase v"; "w", "w", "lowercase w"; "x", "x", "lowercase x"; "y", "y", "lowercase y"; "z", "z", "lowercase z"; "{", "{", "left curly brace"; "|", "|", "vertical bar"; "}", "}", "right curly brace"; "~", "~", "tilde"; "\000", "�", "null character"; "\001", "", "start of header"; "\002", "", "start of text"; "\003", "", "end of text"; "\004", "", "end of transmission"; "\005", "", "enquiry"; "\006", "", "acknowledge"; "\007", "", "bell (ring)"; "\008", "", "backspace"; "\009", " ", "horizontal tab"; "\010", " ", "line feed"; "\011", " ", "vertical tab"; "\012", " ", "form feed"; "\013", " ", "carriage return"; "\014", "", "shift out"; "\015", "", "shift in"; "\016", "", "data link escape"; "\017", "", "device control 1"; "\018", "", "device control 2"; "\019", "", "device control 3"; "\020", "", "device control 4"; "\021", "", "negative acknowledge"; "\022", "", "synchronize"; "\023", "", "end transmission block"; "\024", "", "cancel"; "\025", "", "end of medium"; "\026", "", "substitute"; "\027", "", "escape"; "\028", "", "file separator"; "\029", "", "group separator"; "\030", "", "record separator"; "\031", "", "unit separator"; "\127", "", "delete (rubout)"; "\"", """, "quotation mark"; "'", "'", "apostrophe"; "&", "&", "ampersand"; "<", "<", "less-than"; ">", ">", "greater-than"; "\xc2\xa0", " ", "non-breaking space"; "\xc2\xa0", " ", "non-breaking space"; "¡", "¡", "inverted exclamation mark"; "¡", "¡", "inverted exclamation mark"; "¢", "¢", "cent"; "¢", "¢", "cent"; "£", "£", "pound"; "£", "£", "pound"; "¤", "¤", "currency"; "¤", "¤", "currency"; "¥", "¥", "yen"; "¥", "¥", "yen"; "¦", "¦", "broken vertical bar"; "¦", "¦", "broken vertical bar"; "§", "§", "section"; "§", "§", "section"; "¨", "¨", "spacing diaeresis"; "¨", "¨", "spacing diaeresis"; "©", "©", "copyright"; "©", "©", "copyright"; "ª", "ª", "feminine ordinal indicator"; "ª", "ª", "feminine ordinal indicator"; "«", "«", "angle quotation mark (left)"; "«", "«", "angle quotation mark (left)"; "¬", "¬", "negation"; "¬", "¬", "negation"; "�­", "­", "soft hyphen"; "�­", "­", "soft hyphen"; "®", "®", "registered trademark"; "®", "®", "registered trademark"; "¯", "¯", "spacing macron"; "¯", "¯", "spacing macron"; "°", "°", "degree"; "°", "°", "degree"; "±", "±", "plus-or-minus"; "±", "±", "plus-or-minus"; "²", "²", "superscript 2"; "²", "²", "superscript 2"; "³", "³", "superscript 3"; "³", "³", "superscript 3"; "´", "´", "spacing acute"; "´", "´", "spacing acute"; "µ", "µ", "micro"; "µ", "µ", "micro"; "¶", "¶", "paragraph"; "¶", "¶", "paragraph"; "·", "·", "middle dot"; "·", "·", "middle dot"; "¸", "¸", "spacing cedilla"; "¸", "¸", "spacing cedilla"; "¹", "¹", "superscript 1"; "¹", "¹", "superscript 1"; "º", "º", "masculine ordinal indicator"; "º", "º", "masculine ordinal indicator"; "»", "»", "angle quotation mark (right)"; "»", "»", "angle quotation mark (right)"; "¼", "¼", "fraction 1/4"; "¼", "¼", "fraction 1/4"; "½", "½", "fraction 1/2"; "½", "½", "fraction 1/2"; "¾", "¾", "fraction 3/4"; "¾", "¾", "fraction 3/4"; "¿", "¿", "inverted question mark"; "¿", "¿", "inverted question mark"; "×", "×", "multiplication"; "×", "×", "multiplication"; "÷", "÷", "division"; "÷", "÷", "division"; "À", "À", "capital a, grave accent"; "À", "À", "capital a, grave accent"; "Á", "Á", "capital a, acute accent"; "Á", "Á", "capital a, acute accent"; "Â", "Â", "capital a, circumflex accent"; "Â", "Â", "capital a, circumflex accent"; "Ã", "Ã", "capital a, tilde"; "Ã", "Ã", "capital a, tilde"; "Ä", "Ä", "capital a, umlaut mark"; "Ä", "Ä", "capital a, umlaut mark"; "Å", "Å", "capital a, ring"; "Å", "Å", "capital a, ring"; "Æ", "Æ", "capital ae"; "Æ", "Æ", "capital ae"; "Ç", "Ç", "capital c, cedilla"; "Ç", "Ç", "capital c, cedilla"; "È", "È", "capital e, grave accent"; "È", "È", "capital e, grave accent"; "É", "É", "capital e, acute accent"; "É", "É", "capital e, acute accent"; "Ê", "Ê", "capital e, circumflex accent"; "Ê", "Ê", "capital e, circumflex accent"; "Ë", "Ë", "capital e, umlaut mark"; "Ë", "Ë", "capital e, umlaut mark"; "Ì", "Ì", "capital i, grave accent"; "Ì", "Ì", "capital i, grave accent"; "Í", "Í", "capital i, acute accent"; "Í", "Í", "capital i, acute accent"; "Î", "Î", "capital i, circumflex accent"; "Î", "Î", "capital i, circumflex accent"; "Ï", "Ï", "capital i, umlaut mark"; "Ï", "Ï", "capital i, umlaut mark"; "Ð", "Ð", "capital eth, Icelandic"; "Ð", "Ð", "capital eth, Icelandic"; "Ñ", "Ñ", "capital n, tilde"; "Ñ", "Ñ", "capital n, tilde"; "Ò", "Ò", "capital o, grave accent"; "Ò", "Ò", "capital o, grave accent"; "Ó", "Ó", "capital o, acute accent"; "Ó", "Ó", "capital o, acute accent"; "Ô", "Ô", "capital o, circumflex accent"; "Ô", "Ô", "capital o, circumflex accent"; "Õ", "Õ", "capital o, tilde"; "Õ", "Õ", "capital o, tilde"; "Ö", "Ö", "capital o, umlaut mark"; "Ö", "Ö", "capital o, umlaut mark"; "Ø", "Ø", "capital o, slash"; "Ø", "Ø", "capital o, slash"; "Ù", "Ù", "capital u, grave accent"; "Ù", "Ù", "capital u, grave accent"; "Ú", "Ú", "capital u, acute accent"; "Ú", "Ú", "capital u, acute accent"; "Û", "Û", "capital u, circumflex accent"; "Û", "Û", "capital u, circumflex accent"; "Ü", "Ü", "capital u, umlaut mark"; "Ü", "Ü", "capital u, umlaut mark"; "Ý", "Ý", "capital y, acute accent"; "Ý", "Ý", "capital y, acute accent"; "Þ", "Þ", "capital THORN, Icelandic"; "Þ", "Þ", "capital THORN, Icelandic"; "ß", "ß", "small sharp s, German"; "ß", "ß", "small sharp s, German"; "à", "à", "small a, grave accent"; "à", "à", "small a, grave accent"; "á", "á", "small a, acute accent"; "á", "á", "small a, acute accent"; "â", "â", "small a, circumflex accent"; "â", "â", "small a, circumflex accent"; "ã", "ã", "small a, tilde"; "ã", "ã", "small a, tilde"; "ä", "ä", "small a, umlaut mark"; "ä", "ä", "small a, umlaut mark"; "å", "å", "small a, ring"; "å", "å", "small a, ring"; "æ", "æ", "small ae"; "æ", "æ", "small ae"; "ç", "ç", "small c, cedilla"; "ç", "ç", "small c, cedilla"; "è", "è", "small e, grave accent"; "è", "è", "small e, grave accent"; "é", "é", "small e, acute accent"; "é", "é", "small e, acute accent"; "ê", "ê", "small e, circumflex accent"; "ê", "ê", "small e, circumflex accent"; "ë", "ë", "small e, umlaut mark"; "ë", "ë", "small e, umlaut mark"; "ì", "ì", "small i, grave accent"; "ì", "ì", "small i, grave accent"; "í", "í", "small i, acute accent"; "í", "í", "small i, acute accent"; "î", "î", "small i, circumflex accent"; "î", "î", "small i, circumflex accent"; "ï", "ï", "small i, umlaut mark"; "ï", "ï", "small i, umlaut mark"; "ð", "ð", "small eth, Icelandic"; "ð", "ð", "small eth, Icelandic"; "ñ", "ñ", "small n, tilde"; "ñ", "ñ", "small n, tilde"; "ò", "ò", "small o, grave accent"; "ò", "ò", "small o, grave accent"; "ó", "ó", "small o, acute accent"; "ó", "ó", "small o, acute accent"; "ô", "ô", "small o, circumflex accent"; "ô", "ô", "small o, circumflex accent"; "õ", "õ", "small o, tilde"; "õ", "õ", "small o, tilde"; "ö", "ö", "small o, umlaut mark"; "ö", "ö", "small o, umlaut mark"; "ø", "ø", "small o, slash"; "ø", "ø", "small o, slash"; "ù", "ù", "small u, grave accent"; "ù", "ù", "small u, grave accent"; "ú", "ú", "small u, acute accent"; "ú", "ú", "small u, acute accent"; "û", "û", "small u, circumflex accent"; "û", "û", "small u, circumflex accent"; "ü", "ü", "small u, umlaut mark"; "ü", "ü", "small u, umlaut mark"; "ý", "ý", "small y, acute accent"; "ý", "ý", "small y, acute accent"; "þ", "þ", "small thorn, Icelandic"; "þ", "þ", "small thorn, Icelandic"; "ÿ", "ÿ", "small y, umlaut mark"; "ÿ", "ÿ", "small y, umlaut mark"; "∀", "∀", "for all"; "∀", "∀", "for all"; "∂", "∂", "part"; "∂", "∂", "part"; "∃", "∃", "exists"; "∃", "∃", "exists"; "∅", "∅", "empty"; "∅", "∅", "empty"; "∇", "∇", "nabla"; "∇", "∇", "nabla"; "∈", "∈", "isin"; "∈", "∈", "isin"; "∉", "∉", "notin"; "∉", "∉", "notin"; "∋", "∋", "ni"; "∋", "∋", "ni"; "∏", "∏", "prod"; "∏", "∏", "prod"; "∑", "∑", "sum"; "∑", "∑", "sum"; "−", "−", "minus"; "−", "−", "minus"; "∗", "∗", "lowast"; "∗", "∗", "lowast"; "√", "√", "square root"; "√", "√", "square root"; "∝", "∝", "proportional to"; "∝", "∝", "proportional to"; "∞", "∞", "infinity"; "∞", "∞", "infinity"; "∠", "∠", "angle"; "∠", "∠", "angle"; "∧", "∧", "and"; "∧", "∧", "and"; "∨", "∨", "or"; "∨", "∨", "or"; "∩", "∩", "cap"; "∩", "∩", "cap"; "∪", "∪", "cup"; "∪", "∪", "cup"; "∫", "∫", "integral"; "∫", "∫", "integral"; "∴", "∴", "therefore"; "∴", "∴", "therefore"; "∼", "∼", "similar to"; "∼", "∼", "similar to"; "≅", "≅", "congruent to"; "≅", "≅", "congruent to"; "≈", "≈", "almost equal"; "≈", "≈", "almost equal"; "≠", "≠", "not equal"; "≠", "≠", "not equal"; "≡", "≡", "equivalent"; "≡", "≡", "equivalent"; "≤", "≤", "less or equal"; "≤", "≤", "less or equal"; "≥", "≥", "greater or equal"; "≥", "≥", "greater or equal"; "⊂", "⊂", "subset of"; "⊂", "⊂", "subset of"; "⊃", "⊃", "superset of"; "⊃", "⊃", "superset of"; "⊄", "⊄", "not subset of"; "⊄", "⊄", "not subset of"; "⊆", "⊆", "subset or equal"; "⊆", "⊆", "subset or equal"; "⊇", "⊇", "superset or equal"; "⊇", "⊇", "superset or equal"; "⊕", "⊕", "circled plus"; "⊕", "⊕", "circled plus"; "⊗", "⊗", "circled times"; "⊗", "⊗", "circled times"; "⊥", "⊥", "perpendicular"; "⊥", "⊥", "perpendicular"; "⋅", "⋅", "dot operator"; "⋅", "⋅", "dot operator"; "Α", "Α", "Alpha"; "Α", "Α", "Alpha"; "Β", "Β", "Beta"; "Β", "Β", "Beta"; "Γ", "Γ", "Gamma"; "Γ", "Γ", "Gamma"; "Δ", "Δ", "Delta"; "Δ", "Δ", "Delta"; "Ε", "Ε", "Epsilon"; "Ε", "Ε", "Epsilon"; "Ζ", "Ζ", "Zeta"; "Ζ", "Ζ", "Zeta"; "Η", "Η", "Eta"; "Η", "Η", "Eta"; "Θ", "Θ", "Theta"; "Θ", "Θ", "Theta"; "Ι", "Ι", "Iota"; "Ι", "Ι", "Iota"; "Κ", "Κ", "Kappa"; "Κ", "Κ", "Kappa"; "Λ", "Λ", "Lambda"; "Λ", "Λ", "Lambda"; "Μ", "Μ", "Mu"; "Μ", "Μ", "Mu"; "Ν", "Ν", "Nu"; "Ν", "Ν", "Nu"; "Ξ", "Ξ", "Xi"; "Ξ", "Ξ", "Xi"; "Ο", "Ο", "Omicron"; "Ο", "Ο", "Omicron"; "Π", "Π", "Pi"; "Π", "Π", "Pi"; "Ρ", "Ρ", "Rho"; "Ρ", "Ρ", "Rho"; "Σ", "Σ", "Sigma"; "Σ", "Σ", "Sigma"; "Τ", "Τ", "Tau"; "Τ", "Τ", "Tau"; "Υ", "Υ", "Upsilon"; "Υ", "Υ", "Upsilon"; "Φ", "Φ", "Phi"; "Φ", "Φ", "Phi"; "Χ", "Χ", "Chi"; "Χ", "Χ", "Chi"; "Ψ", "Ψ", "Psi"; "Ψ", "Ψ", "Psi"; "Ω", "Ω", "Omega"; "Ω", "Ω", "Omega"; "α", "α", "alpha"; "α", "α", "alpha"; "β", "β", "beta"; "β", "β", "beta"; "γ", "γ", "gamma"; "γ", "γ", "gamma"; "δ", "δ", "delta"; "δ", "δ", "delta"; "ε", "ε", "epsilon"; "ε", "ε", "epsilon"; "ζ", "ζ", "zeta"; "ζ", "ζ", "zeta"; "η", "η", "eta"; "η", "η", "eta"; "θ", "θ", "theta"; "θ", "θ", "theta"; "ι", "ι", "iota"; "ι", "ι", "iota"; "κ", "κ", "kappa"; "κ", "κ", "kappa"; "λ", "λ", "lambda"; "λ", "λ", "lambda"; "μ", "μ", "mu"; "μ", "μ", "mu"; "ν", "ν", "nu"; "ν", "ν", "nu"; "ξ", "ξ", "xi"; "ξ", "ξ", "xi"; "ο", "ο", "omicron"; "ο", "ο", "omicron"; "π", "π", "pi"; "π", "π", "pi"; "ρ", "ρ", "rho"; "ρ", "ρ", "rho"; "ς", "ς", "sigmaf"; "ς", "ς", "sigmaf"; "σ", "σ", "sigma"; "σ", "σ", "sigma"; "τ", "τ", "tau"; "τ", "τ", "tau"; "υ", "υ", "upsilon"; "υ", "υ", "upsilon"; "φ", "φ", "phi"; "φ", "φ", "phi"; "χ", "χ", "chi"; "χ", "χ", "chi"; "ψ", "ψ", "psi"; "ψ", "ψ", "psi"; "ω", "ω", "omega"; "ω", "ω", "omega"; "ϑ", "ϑ", "theta symbol"; "ϑ", "ϑ", "theta symbol"; "ϒ", "ϒ", "upsilon symbol"; "ϒ", "ϒ", "upsilon symbol"; "ϖ", "ϖ", "pi symbol"; "ϖ", "ϖ", "pi symbol"; "Œ", "Œ", "capital ligature OE"; "Œ", "Œ", "capital ligature OE"; "œ", "œ", "small ligature oe"; "œ", "œ", "small ligature oe"; "Š", "Š", "capital S with caron"; "Š", "Š", "capital S with caron"; "š", "š", "small S with caron"; "š", "š", "small S with caron"; "Ÿ", "Ÿ", "capital Y with diaeres"; "Ÿ", "Ÿ", "capital Y with diaeres"; "ƒ", "ƒ", "f with hook"; "ƒ", "ƒ", "f with hook"; "ˆ", "ˆ", "modifier letter circumflex accent"; "ˆ", "ˆ", "modifier letter circumflex accent"; "˜", "˜", "small tilde"; "˜", "˜", "small tilde"; " ", " ", "en space"; " ", " ", "en space"; " ", " ", "em space"; " ", " ", "em space"; " ", " ", "thin space"; " ", " ", "thin space"; "‌", "‌", "zero width non-joiner"; "‌", "‌", "zero width non-joiner"; "‍", "‍", "zero width joiner"; "‍", "‍", "zero width joiner"; "‎", "‎", "left-to-right mark"; "‎", "‎", "left-to-right mark"; "‏", "‏", "right-to-left mark"; "‏", "‏", "right-to-left mark"; "–", "–", "en dash"; "–", "–", "en dash"; "—", "—", "em dash"; "—", "—", "em dash"; "‘", "‘", "left single quotation mark"; "‘", "‘", "left single quotation mark"; "’", "’", "right single quotation mark"; "’", "’", "right single quotation mark"; "‚", "‚", "single low-9 quotation mark"; "‚", "‚", "single low-9 quotation mark"; "“", "“", "left double quotation mark"; "“", "“", "left double quotation mark"; "”", "”", "right double quotation mark"; "”", "”", "right double quotation mark"; "„", "„", "double low-9 quotation mark"; "„", "„", "double low-9 quotation mark"; "†", "†", "dagger"; "†", "†", "dagger"; "‡", "‡", "double dagger"; "‡", "‡", "double dagger"; "•", "•", "bullet"; "•", "•", "bullet"; "…", "…", "horizontal ellipsis"; "…", "…", "horizontal ellipsis"; "‰", "‰", "per mille "; "‰", "‰", "per mille "; "′", "′", "minutes"; "′", "′", "minutes"; "″", "″", "seconds"; "″", "″", "seconds"; "‹", "‹", "single left angle quotation"; "‹", "‹", "single left angle quotation"; "›", "›", "single right angle quotation"; "›", "›", "single right angle quotation"; "‾", "‾", "overline"; "‾", "‾", "overline"; "€", "€", "euro"; "€", "€", "euro"; "™", "™", "trademark"; "™", "™", "trademark"; "™", "™", "trademark"; "™", "™", "trademark"; "←", "←", "left arrow"; "←", "←", "left arrow"; "↑", "↑", "up arrow"; "↑", "↑", "up arrow"; "→", "→", "right arrow"; "→", "→", "right arrow"; "↓", "↓", "down arrow"; "↓", "↓", "down arrow"; "↔", "↔", "left right arrow"; "↔", "↔", "left right arrow"; "↵", "↵", "carriage return arrow"; "↵", "↵", "carriage return arrow"; "⌈", "⌈", "left ceiling"; "⌈", "⌈", "left ceiling"; "⌉", "⌉", "right ceiling"; "⌉", "⌉", "right ceiling"; "⌊", "⌊", "left floor"; "⌊", "⌊", "left floor"; "⌋", "⌋", "right floor"; "⌋", "⌋", "right floor"; "◊", "◊", "lozenge"; "◊", "◊", "lozenge"; "♠", "♠", "spade"; "♠", "♠", "spade"; "♣", "♣", "club"; "♣", "♣", "club"; "♥", "♥", "heart"; "♥", "♥", "heart"; "♦", "♦", "diamond"; "♦", "♦", "diamond"; ] omd-1.3.2/src/implementation_notes.md000066400000000000000000000101431425763206400176520ustar00rootroot00000000000000# Notes on the Implementation and Semantics of omd ## In short I believe that all features described in have now been implemented. Extensive testing should be done. Parsing: it mainly relies on the property that two consecutive tokens produced by the lexer cannot designate the same "thing". For instance, there can't be [Word "foo"; Word "bar"] because it should be [Word "foobar"] instead. There can't be [Newlines 4; Newline; Newlines 3] because it should be [Newlines(10)] (yes it's 10, not 8). ## More details ### Checklist * HTML * As in "standard" Markdown, it's a "subset" of HTML with **restrictions** on the syntax that is supported. For instance, one cannot write `< a href...` instead of ``. For instance, `"plop\\nhello"` is translated to `"

plop
hello

"` (this has been implemented on 2013.08.15) * Code: * verbatim: done (but needs more testing) * syntax-highlighted code: *todo* ### Flaws in Markdown Since there are no errors in Markdown, it means taht everything has a meaning. Sometimes, one has to imagine a meaning that is not too much nonsense. #### Lists ##### Problem Description There are several semantics for a "broken" list such as the following one: ``` * Indentation 1, Element 1 * Indentation 1, Element 2 * Indentation 3, Element 1 * Indentation 3, Element 2 * Indentation 2, Element 1 * Indentation 2, Element 2 * Indentation 3, Element 1 (not 3) * Indentation 1, Element 3 ``` I have chosen the following semantics, because to me that it's the less nonsense I have ever thought about: ##### Semantics Let N be the indentation level of the current element. - If N is equal to the previous indentation, then it's still the same list as the current one. - If N is greater than the previous indentation, then it's a new list. - If N is lesser than the previous indentation, then I check its predecessors: * if N is the level of a predecessor and no other level inbetween is lesser, then it means that the current item belongs to a list that hasn't been closed yet, so I close the current list and I delegate the rest to the closest parent (which does mean that the current item will be processed _again_). * else, it means that it's a kind of wrong level (don't forget N is smaller than the previous indentation), so I close the current list and open a new one with the current item. ``` * Indentation 1, Element 1 * Indentation 1, Element 2 % here I do not close (I1), and I open for the next one (I3) * Indentation 3, Element 1 * Indentation 3, Element 2 % here I close (I3) and open for the next one (I2) * Indentation 2, Element 1 * Indentation 2, Element 2 % here I close (I2) and open for the next one (I3) * Indentation 3, Element 1 (not 3) % here I close (I3) and continue for the next one (I1) * Indentation 1, Element 3 ``` ----- file implementation_notes.md omd-1.3.2/src/omd.ml000066400000000000000000000163261425763206400142150ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) include Omd_representation include Omd_backend let of_input lex ?extensions:e ?default_lang:d s = let module E = Omd_parser.Default_env(struct end) in let module Parser = Omd_parser.Make( struct include E let extensions = match e with Some x -> x | None -> E.extensions let default_lang = match d with Some x -> x | None -> E.default_lang end ) in let md = Parser.parse (lex s) in Parser.make_paragraphs md let of_string = of_input Omd_lexer.lex let of_bigarray = of_input Omd_lexer.lex_bigarray let to_html : ?override:(Omd_representation.element -> string option) -> ?pindent:bool -> ?nl2br:bool -> ?cs:code_stylist -> t -> string = html_of_md let to_text : t -> string = text_of_md let to_markdown : t -> string = markdown_of_md let html_of_string (html:string) : string = html_of_md (Omd_parser.default_parse (Omd_lexer.lex html)) let rec set_default_lang lang = function | Code("", code) :: tl -> Code(lang, code) :: set_default_lang lang tl | Code_block("", code) :: tl -> Code_block(lang, code) :: set_default_lang lang tl (* Recurse on all elements even though code (blocks) are not allowed everywhere. *) | H1 t :: tl -> H1(set_default_lang lang t) :: set_default_lang lang tl | H2 t :: tl -> H2(set_default_lang lang t) :: set_default_lang lang tl | H3 t :: tl -> H3(set_default_lang lang t) :: set_default_lang lang tl | H4 t :: tl -> H4(set_default_lang lang t) :: set_default_lang lang tl | H5 t :: tl -> H5(set_default_lang lang t) :: set_default_lang lang tl | H6 t :: tl -> H6(set_default_lang lang t) :: set_default_lang lang tl | Paragraph t :: tl -> Paragraph(set_default_lang lang t) :: set_default_lang lang tl | Emph t :: tl -> Emph(set_default_lang lang t) :: set_default_lang lang tl | Bold t :: tl -> Bold(set_default_lang lang t) :: set_default_lang lang tl | Ul t :: tl -> Ul(List.map (set_default_lang lang) t) :: set_default_lang lang tl | Ol t :: tl -> Ol(List.map (set_default_lang lang) t) :: set_default_lang lang tl | Ulp t :: tl -> Ulp(List.map (set_default_lang lang) t) :: set_default_lang lang tl | Olp t :: tl -> Olp(List.map (set_default_lang lang) t) :: set_default_lang lang tl | Url(href, t, title) :: tl -> Url(href, set_default_lang lang t, title) :: set_default_lang lang tl | Blockquote t :: tl -> Blockquote(set_default_lang lang t) :: set_default_lang lang tl (* Elements that do not contain Markdown. *) | (Text _|Code _|Code_block _|Br|Hr|NL|Ref _|Img_ref _|Raw _|Raw_block _ |Html _|Html_block _|Html_comment _|Img _|X _) as e :: tl -> e :: set_default_lang lang tl | [] -> [] (* Table of contents ***********************************************************************) (* Given a list of headers — in the order of the document — go to the requested subsection. We first seek for the [number]th header at [level]. *) let rec find_start headers level number subsections = match headers with | [] -> [] | (H1 _, _, _) :: tl -> deal_with_header 1 headers tl level number subsections | (H2 _, _, _) :: tl -> deal_with_header 2 headers tl level number subsections | (H3 _, _, _) :: tl -> deal_with_header 3 headers tl level number subsections | (H4 _, _, _) :: tl -> deal_with_header 4 headers tl level number subsections | (H5 _, _, _) :: tl -> deal_with_header 5 headers tl level number subsections | (H6 _, _, _) :: tl -> deal_with_header 6 headers tl level number subsections | _ :: _ -> assert false and deal_with_header h_level headers tl level number subsections = if h_level > level then (* Skip, right [level]-header not yet reached. *) if number = 0 then (* Assume empty section at [level], do not consume token. *) (match subsections with | [] -> headers (* no subsection to find *) | n :: subsections -> find_start headers (level + 1) n subsections) else find_start tl level number subsections else if h_level = level then ( (* At proper [level]. Have we reached the [number] one? *) if number <= 1 then ( match subsections with | [] -> tl (* no subsection to find *) | n :: subsections -> find_start tl (level + 1) n subsections ) else find_start tl level (number - 1) subsections ) else (* h_level < level *) [] (* Sought [level] has not been found in the current section *) (* Assume we are at the start of the headers we are interested in. Return the list of TOC entries for [min_level] and the [headers] not used for the TOC entries. *) let rec make_toc (headers:(element*string*string)list) ~min_level ~max_level = if min_level > max_level then [], headers else ( match headers with | [] -> [], [] | (H1 t, id, _) :: tl -> toc_entry headers 1 t id tl ~min_level ~max_level | (H2 t, id, _) :: tl -> toc_entry headers 2 t id tl ~min_level ~max_level | (H3 t, id, _) :: tl -> toc_entry headers 3 t id tl ~min_level ~max_level | (H4 t, id, _) :: tl -> toc_entry headers 4 t id tl ~min_level ~max_level | (H5 t, id, _) :: tl -> toc_entry headers 5 t id tl ~min_level ~max_level | (H6 t, id, _) :: tl -> toc_entry headers 6 t id tl ~min_level ~max_level | _ :: _ -> assert false ) and toc_entry headers h_level t id tl ~min_level ~max_level = if h_level > max_level then (* too deep, skip *) make_toc tl ~min_level ~max_level else if h_level < min_level then (* section we wanted the TOC for is finished, do not comsume the token *) [], headers else if h_level = min_level then ( let sub_toc, tl = make_toc tl ~min_level:(min_level + 1) ~max_level in let toc_entry = match sub_toc with | [] -> [Url("#" ^ id, t, ""); NL] | _ -> [Url("#" ^ id, t, ""); NL; Ul sub_toc; NL] in let toc, tl = make_toc tl ~min_level ~max_level in toc_entry :: toc, tl ) else (* h_level > min_level *) let sub_toc, tl = make_toc headers ~min_level:(min_level + 1) ~max_level in let toc, tl = make_toc tl ~min_level ~max_level in [Ul sub_toc] :: toc, tl let toc ?(start=[]) ?(depth=2) md = if depth < 1 then invalid_arg "Omd.toc: ~depth must be >= 1"; let headers = Omd_backend.headers_of_md ~remove_header_links:true md in let headers = match start with | [] -> headers | number :: subsections -> if number < 0 then invalid_arg("Omd.toc: level 1 start must be >= 0"); find_start headers 1 number subsections in let len = List.length start in let toc, _ = make_toc headers ~min_level:(len + 1) ~max_level:(len + depth) in match toc with | [] -> [] | _ -> [Ul(toc)] let add_toc ?start ?depth ?title md = let toc = toc ?start ?depth md in (* Replace "*Table of contents*" with the actual TOC. *) toc omd-1.3.2/src/omd.mli000066400000000000000000000141131425763206400143560ustar00rootroot00000000000000(** A markdown parser in OCaml, with no extra dependencies. This module represents this entire Markdown library written in OCaml only. Its main purpose is to allow you to use the Markdown library while keeping you away from the other modules. If you want to extend the Markdown parser, you can do it without accessing any module of this library but this one, and by doing so, you are free from having to maintain a fork of this library. N.B. This module is supposed to be reentrant, if it's not then please report the bug. *) (************************************************************************) (** {2 Representation of Markdown documents} *) type t = element list (** Representation of a Markdown document. *) and ref_container = (< add_ref: string -> string -> string -> unit ; get_ref : string -> (string*string) option; get_all : (string * (string * string)) list; >) (** A element of a Markdown document. *) and element = Omd_representation.element = | H1 of t (** Header of level 1 *) | H2 of t (** Header of level 2 *) | H3 of t (** Header of level 3 *) | H4 of t (** Header of level 4 *) | H5 of t (** Header of level 5 *) | H6 of t (** Header of level 6 *) | Paragraph of t (** A Markdown paragraph (must be enabled in {!of_string}) *) | Text of string (** Text. *) | Emph of t (** Emphasis (italic) *) | Bold of t (** Bold *) | Ul of t list (** Unumbered list *) | Ol of t list (** Ordered (i.e. numbered) list *) | Ulp of t list | Olp of t list | Code of name * string (** [Code(lang, code)] represent [code] within the text (Markdown: `code`). The language [lang] cannot be specified from Markdown, it can be from {!of_string} though or when programatically generating Markdown documents. Beware that the [code] is taken verbatim from Markdown and may contain characters that must be escaped for HTML. *) | Code_block of name * string (** [Code_block(lang, code)]: a code clock (e.g. indented by 4 spaces in the text). The first parameter [lang] is the language if specified. Beware that the [code] is taken verbatim from Markdown and may contain characters that must be escaped for HTML. *) | Br (** (Forced) line break *) | Hr (** Horizontal rule *) | NL (** Newline character. Newline characters that act like delimiters (e.g. for paragraphs) are removed from the AST. *) | Url of href * t * title | Ref of ref_container * name * string * fallback | Img_ref of ref_container * name * alt * fallback | Html of name * (string * string option) list * t | Html_block of name * (string * string option) list * t | Html_comment of string (** An HTML comment, including "". *) | Raw of string (** Raw: something that shall never be converted *) | Raw_block of string (** Raw_block: a block with contents that shall never be converted *) | Blockquote of t (** Quoted block *) | Img of alt * src * title | X of (< (* extension of [element]. *) name: string; (* N.B. [to_html] means that htmlentities will not be applied to its output. *) to_html: ?indent:int -> (t -> string) -> t -> string option; to_sexpr: (t -> string) -> t -> string option; to_t: t -> t option >) and fallback = < to_string : string ; to_t : t > (** Fallback for references in case they refer to non-existant references *) and name = string (** Markdown reference name. *) and alt = string (** HTML img tag attribute. *) and src = string (** HTML attribute. *) and href = string (** HTML attribute. *) and title = string (** HTML attribute. *) type code_stylist = lang:string -> string -> string (** Function that takes a language name and some code and returns that code with style. *) (************************************************************************) (** {2 Input and Output} *) val of_string : ?extensions:Omd_representation.extensions -> ?default_lang: name -> string -> t (** [of_string s] returns the Markdown representation of the string [s]. @param lang language for blocks of code where it was not specified. Default: [""]. If you want to use a custom lexer or parser, use {!Omd_lexer.lex} and {!Omd_parser.parse}. *) val of_bigarray : ?extensions:Omd_representation.extensions -> ?default_lang: name -> Omd_lexer.bigstring -> t (** As {!of_string}, but read input from a bigarray rather than from a string. *) val set_default_lang : name -> t -> t (** [set_default_lang lang md] return a copy of [md] where the language of all [Code] or [Code_block] with an empty language is set to [lang]. *) val to_html : ?override:(Omd_representation.element -> string option) -> ?pindent:bool -> ?nl2br:bool -> ?cs:code_stylist -> t -> string (** Translate markdown representation into raw HTML. If you need a full HTML representation, you mainly have to figure out how to convert [Html of string] and [Html_block of string] into your HTML representation. *) val to_markdown : t -> string (** Translate markdown representation into textual markdown. *) val to_text : t -> string (** Translate markdown representation into raw text. *) (************************************************************************) (** {2 Tansforming Markdown documents} *) val toc : ?start:int list -> ?depth:int -> t -> t (** [toc md] returns [toc] a table of contents for [md]. @param start gives the section for which the TOC must be built. For example [~start:[2;3]] will build the TOC for subsections of the second [H1] header, and within that section, the third [h2] header. If a number is [0], it means to look for the first section at that level but stop if one encounters any other subsection. If no subsection exists, an empty TOC [[]] will be returned. Default: [[]] i.e. list all sections, starting with the first [H1]. @param depth the table of contents. Default: [2]. *) ;; omd-1.3.2/src/omd_backend.ml000066400000000000000000001115711425763206400156620ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) type code_stylist = lang:string -> string -> string open Printf open Omd_representation open Omd_utils let default_language = ref "" let text_of_md md = let b = Buffer.create 128 in let rec loop = function | X _ :: tl -> loop tl | Blockquote q :: tl -> loop q; loop tl | Ref(rc, name, text, fallback) :: tl -> Buffer.add_string b (htmlentities ~md:true name); loop tl | Img_ref(rc, name, alt, fallback) :: tl -> Buffer.add_string b (htmlentities ~md:true name); loop tl | Paragraph md :: tl -> loop md; Buffer.add_char b '\n'; Buffer.add_char b '\n'; loop tl | Img(alt, src, title) :: tl -> Buffer.add_string b (htmlentities ~md:true alt); loop tl | Text t :: tl -> Buffer.add_string b (htmlentities ~md:true t); loop tl | Raw t :: tl -> Buffer.add_string b t; loop tl | Raw_block t :: tl -> Buffer.add_char b '\n'; Buffer.add_string b t; Buffer.add_char b '\n'; loop tl | Emph md :: tl -> loop md; loop tl | Bold md :: tl -> loop md; loop tl | (Ul l | Ol l) :: tl -> List.iter (fun item -> loop item; Buffer.add_char b '\n') l; loop tl | (Ulp l | Olp l) :: tl -> List.iter loop l; loop tl | Code_block(lang, c) :: tl -> Buffer.add_string b (htmlentities ~md:false c); loop tl | Code(lang, c) :: tl -> Buffer.add_string b (htmlentities ~md:false c); loop tl | Br :: tl -> loop tl | Hr :: tl -> loop tl | Html(tagname, attrs, body) :: tl -> loop body; loop tl | Html_block(tagname, attrs, body) :: tl -> loop body; loop tl | Html_comment s :: tl -> loop tl | Url (href,s,title) :: tl -> loop s; loop tl | H1 md :: tl | H2 md :: tl | H3 md :: tl | H4 md :: tl | H5 md :: tl | H6 md :: tl -> loop md; loop tl | NL :: tl -> Buffer.add_string b "\n"; loop tl | [] -> () in loop md; Buffer.contents b let default_code_stylist ~lang code = code let filter_text_omd_rev l = let rec loop b r = function | [] -> if b then r else l | ("media:type", Some "text/omd")::tl -> loop true r tl | e::tl -> loop b (e::r) tl in loop false [] l let remove_links : t -> t = Omd_representation.visit (fun e -> match e with | Url(_, t, _) -> Some t | _ -> None ) let rec html_and_headers_of_md ?(remove_header_links=false) ?(override=(fun (e:element) -> (None:string option))) ?(pindent=false) ?(nl2br=false) ?cs:(code_style=default_code_stylist) md = let ids = object(this) val mutable ids = StringSet.add "" StringSet.empty method mangle id = let rec m i = if StringSet.mem id ids then let idx = if i > 0 then id^"_"^string_of_int i else id in if StringSet.mem idx ids then m (i+1) else (ids <- StringSet.add idx ids; idx) else (ids <- StringSet.add id ids; id) in m 0 end in let empty s = let rec loop i = if i < String.length s then match s.[i] with | ' ' | '\n' -> loop (i+1) | _ -> false else true in loop 0 in let remove_trailing_blanks s = let rec loop i = if i < 0 then "" else match s.[i] with | ' '|'\t'|'\n' -> loop (pred i) | _ -> if i = String.length s - 1 then s else String.sub s 0 (i+1) in loop (String.length s - 1) in let b = Buffer.create 64 in let headers = ref [] in let rec loop indent = function | X x as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> (match x#to_t md with | Some t -> loop indent t | None -> match x#to_html ~indent:indent (html_of_md ~override ~pindent ~nl2br ~cs:code_style) md with | Some s -> Buffer.add_string b s | None -> ()); loop indent tl end | Blockquote q as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b "
"; loop indent q; Buffer.add_string b "
"; loop indent tl end | Ref(rc, name, text, fallback) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> begin match rc#get_ref name with | Some(href, title) -> loop indent (Url(htmlentities ~md:true href, [Text(text)], htmlentities ~md:true title) ::tl) | None -> loop indent (fallback#to_t); loop indent tl end end | Img_ref(rc, name, alt, fallback) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> begin match rc#get_ref name with | Some(src, title) -> loop indent (Img(htmlentities ~md:true alt, htmlentities ~md:true src, htmlentities ~md:true title)::tl) | None -> loop indent (fallback#to_t); loop indent tl end end | Paragraph [] :: tl -> loop indent tl | Paragraph md as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> (let s = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in if empty s then () else begin Buffer.add_string b "

"; Buffer.add_string b (remove_trailing_blanks s); Buffer.add_string b "

\n"; end); loop indent tl end | Img(alt, src, title) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b "";
          Buffer.add_string b (htmlentities ~md:true alt);
          Buffer.add_string b " "" then (Buffer.add_string b " title='"; Buffer.add_string b (htmlentities ~md:true title); Buffer.add_string b "' "); Buffer.add_string b "/>"; loop indent tl end | Text t as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> (* Buffer.add_string b t; *) Buffer.add_string b (htmlentities ~md:true t); loop indent tl end | Emph md as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b ""; loop indent md; Buffer.add_string b ""; loop indent tl end | Bold md as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b ""; loop indent md; Buffer.add_string b ""; loop indent tl end | (Ul l|Ol l|Ulp l|Olp l as e) :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b (match e with | Ol _|Olp _ -> "
    " | _ -> "
      "); List.iter ( fun li -> Buffer.add_string b "
    • "; loop (indent+2) li; Buffer.add_string b "
    • " ) l; Buffer.add_string b (match e with | Ol _|Olp _ -> "
" | _ -> ""); loop indent tl end | Code_block(lang, c) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> if lang = "" && !default_language = "" then Buffer.add_string b "
"
          else if lang = "" then
            bprintf b "
"
              !default_language !default_language
          else
            bprintf b "
" lang lang;
          let new_c = code_style ~lang:lang c in
          if c = new_c then
            Buffer.add_string b (htmlentities ~md:false c)
          else
            Buffer.add_string b new_c;
          Buffer.add_string b "
"; loop indent tl end | Code(lang, c) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> if lang = "" && !default_language = "" then Buffer.add_string b "" else if lang = "" then bprintf b "" !default_language else bprintf b "" lang; let new_c = code_style ~lang:lang c in if c = new_c then Buffer.add_string b (htmlentities ~md:false c) else Buffer.add_string b new_c; Buffer.add_string b ""; loop indent tl end | Br as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b "
"; loop indent tl end | Hr as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b "
"; loop indent tl end | Raw s as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b s; loop indent tl end | Raw_block s as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b s; loop indent tl end | Html(tagname, attrs, []) as e :: tl when StringSet.mem tagname html_void_elements -> let attrs = filter_text_omd_rev attrs in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Printf.bprintf b " />"; loop indent tl end | Html(tagname, attrs, body) as e :: tl -> let attrs = filter_text_omd_rev attrs in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Buffer.add_string b ">"; loop indent body; Printf.bprintf b "" tagname; loop indent tl end | Html_block(tagname, attrs, body) as e :: tl -> let attrs = filter_text_omd_rev attrs in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> if body = [] && StringSet.mem tagname html_void_elements then ( Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Buffer.add_string b " />"; loop indent tl ) else ( Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Buffer.add_string b ">"; loop indent body; Printf.bprintf b "" tagname; loop indent tl ) end | Html_comment s as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> Buffer.add_string b s; loop indent tl end | Url (href,s,title) as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let s = html_of_md ~override ~pindent ~nl2br ~cs:code_style s in Buffer.add_string b "
"" then begin Buffer.add_string b " title='"; Buffer.add_string b (htmlentities ~md:true title); Buffer.add_string b "'"; end; Buffer.add_string b ">"; Buffer.add_string b s; Buffer.add_string b ""; loop indent tl end | (H1 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H1 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "

"; Buffer.add_string b ih; Buffer.add_string b "

"; loop indent tl end | (H2 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H2 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "

"; Buffer.add_string b ih; Buffer.add_string b "

"; loop indent tl end | (H3 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H3 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "

"; Buffer.add_string b ih; Buffer.add_string b "

"; loop indent tl end | (H4 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H4 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "

"; Buffer.add_string b ih; Buffer.add_string b "

"; loop indent tl end | (H5 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H5 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "
"; Buffer.add_string b ih; Buffer.add_string b "
"; loop indent tl end | (H6 md as e) :: tl -> let e, md = if not remove_header_links then e, md else let md = remove_links md in H6 md, md in begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> let ih = html_of_md ~override ~pindent ~nl2br ~cs:code_style md in let id = id_of_string ids (text_of_md md) in headers := (e, id, ih) :: !headers; Buffer.add_string b "
"; Buffer.add_string b ih; Buffer.add_string b "
"; loop indent tl end | NL as e :: tl -> begin match override e with | Some s -> Buffer.add_string b s; loop indent tl | None -> if nl2br then Buffer.add_string b "
" else Buffer.add_string b "\n"; loop indent tl end | [] -> () in loop 0 md; Buffer.contents b, List.rev !headers and string_of_attrs attrs = let b = Buffer.create 1024 in List.iter (function | (a, Some v) -> if not(String.contains v '\'') then Printf.bprintf b " %s='%s'" a v else if not(String.contains v '"') then Printf.bprintf b " %s=\"%s\"" a v else Printf.bprintf b " %s=\"%s\"" a v | a, None -> (* if html4 then *) (* Printf.bprintf b " %s='%s'" a a *) (* else *) Printf.bprintf b " %s=''" a (* HTML5 *) ) attrs; Buffer.contents b and html_of_md ?(override=(fun (e:element) -> (None:string option))) ?(pindent=false) ?(nl2br=false) ?cs md = fst (html_and_headers_of_md ~override ~pindent ~nl2br ?cs md) and headers_of_md ?remove_header_links md = snd (html_and_headers_of_md ?remove_header_links md) let rec sexpr_of_md md = let b = Buffer.create 64 in let rec loop = function | X x :: tl -> (match x#to_t md with | Some t -> Buffer.add_string b "(X"; loop t; Buffer.add_string b ")" | None -> match x#to_sexpr sexpr_of_md md with | Some s -> Buffer.add_string b "(X"; Buffer.add_string b s; Buffer.add_string b ")" | None -> match x#to_html ~indent:0 html_of_md md with | Some s -> Buffer.add_string b "(X"; Buffer.add_string b s; Buffer.add_string b ")" | None -> ()); loop tl | Blockquote q :: tl -> Buffer.add_string b "(Blockquote"; loop q; Buffer.add_string b ")"; loop tl | Ref(rc, name, text, _) :: tl -> bprintf b "(Ref %S %S)" name text; loop tl | Img_ref(rc, name, alt, _) :: tl -> bprintf b "(Img_ref %S %S)" name alt; loop tl | Paragraph md :: tl -> Buffer.add_string b "(Paragraph"; loop md; Buffer.add_string b ")"; loop tl | Img(alt, src, title) :: tl -> bprintf b "(Img %S %S %S)" alt src title; loop tl | Text t :: tl -> bprintf b "(Text %S" t; let rec f = function | Text t :: tl -> bprintf b " %S" t; f tl | x -> x in let tl = f tl in bprintf b ")"; loop tl | Emph md :: tl -> Buffer.add_string b "(Emph"; loop md; Buffer.add_string b ")"; loop tl | Bold md :: tl -> Buffer.add_string b "(Bold"; loop md; Buffer.add_string b ")"; loop tl | Ol l :: tl -> bprintf b "(Ol"; List.iter(fun li -> bprintf b "(Li "; loop li; bprintf b ")") l; bprintf b ")"; loop tl | Ul l :: tl -> bprintf b "(Ul"; List.iter(fun li -> bprintf b "(Li "; loop li;bprintf b ")") l; bprintf b ")"; loop tl | Olp l :: tl -> bprintf b "(Olp"; List.iter(fun li -> bprintf b "(Li "; loop li; bprintf b ")") l; bprintf b ")"; loop tl | Ulp l :: tl -> bprintf b "(Ulp"; List.iter(fun li -> bprintf b "(Li "; loop li;bprintf b ")") l; bprintf b ")"; loop tl | Code(lang, c) :: tl -> bprintf b "(Code %S)" c; loop tl | Code_block(lang, c) :: tl -> bprintf b "(Code_block %s)" c; loop tl | Br :: tl -> Buffer.add_string b "(Br)"; loop tl | Hr :: tl -> Buffer.add_string b "(Hr)"; loop tl | Raw s :: tl -> bprintf b "(Raw %S)" s; loop tl | Raw_block s :: tl -> bprintf b "(Raw_block %S)" s; loop tl | Html(tagname, attrs, body) :: tl -> bprintf b "(Html %s %s " tagname (string_of_attrs attrs); loop body; bprintf b ")"; loop tl | Html_block(tagname, attrs, body) :: tl -> bprintf b "(Html_block %s %s " tagname (string_of_attrs attrs); loop body; bprintf b ")"; loop tl | Html_comment s :: tl -> bprintf b "(Html_comment %S)" s; loop tl | Url (href,s,title) :: tl -> bprintf b "(Url %S %S %S)" href (html_of_md s) title; loop tl | H1 md :: tl -> Buffer.add_string b "(H1"; loop md; Buffer.add_string b ")"; loop tl | H2 md :: tl -> Buffer.add_string b "(H2"; loop md; Buffer.add_string b ")"; loop tl | H3 md :: tl -> Buffer.add_string b "(H3"; loop md; Buffer.add_string b ")"; loop tl | H4 md :: tl -> Buffer.add_string b "(H4"; loop md; Buffer.add_string b ")"; loop tl | H5 md :: tl -> Buffer.add_string b "(H5"; loop md; Buffer.add_string b ")"; loop tl | H6 md :: tl -> Buffer.add_string b "(H6"; loop md; Buffer.add_string b ")"; loop tl | NL :: tl -> Buffer.add_string b "(NL)"; loop tl | [] -> () in loop md; Buffer.contents b let escape_markdown_characters s = let b = Buffer.create (String.length s * 2) in for i = 0 to String.length s - 1 do match s.[i] with | '.' as c -> if i > 0 && match s.[i-1] with | '0' .. '9' -> i+1 < String.length s && s.[i+1] = ' ' | _ -> false then Buffer.add_char b '\\'; Buffer.add_char b c | '-' as c -> if (i = 0 || match s.[i-1] with ' '| '\n' -> true | _ -> false) && (i+1 < String.length s && (s.[i+1] = ' '||s.[i+1] = '-')) then Buffer.add_char b '\\'; Buffer.add_char b c | '+' as c -> if (i = 0 || match s.[i-1] with ' '| '\n' -> true | _ -> false) && (i+1 < String.length s && s.[i+1] = ' ') then Buffer.add_char b '\\'; Buffer.add_char b c | '!' as c -> if i+1 < String.length s && s.[i+1] = '[' then Buffer.add_char b '\\'; Buffer.add_char b c | '<' as c -> if i <> String.length s - 1 && (match s.[i+1] with 'a' .. 'z' | 'A' .. 'Z' -> false | _ -> true) then Buffer.add_char b '\\'; Buffer.add_char b c | '>' as c -> if i = 0 || (match s.[i-1] with ' ' | '\n' -> false | _ -> true) then Buffer.add_char b '\\'; Buffer.add_char b c | '#' as c -> if i = 0 || s.[i-1] = '\n' then Buffer.add_char b '\\'; Buffer.add_char b c | '\\' | '[' | ']' | '(' | ')' | '`' | '*' as c -> Buffer.add_char b '\\'; Buffer.add_char b c | c -> Buffer.add_char b c done; Buffer.contents b let rec markdown_of_md md = if debug then eprintf "(OMD) markdown_of_md(%S)\n%!" (sexpr_of_md md); let quote ?(indent=0) s = let b = Buffer.create (String.length s) in let l = String.length s in let rec loop nl i = if i < l then begin if nl && i < l - 1 then (for i = 1 to indent do Buffer.add_char b ' ' done; Buffer.add_string b "> "); match s.[i] with | '\n' -> Buffer.add_char b '\n'; loop true (succ i) | c -> Buffer.add_char b c; loop false (succ i) end else Buffer.contents b in loop true 0 in let b = Buffer.create 64 in let add_spaces n = for i = 1 to n do Buffer.add_char b ' ' done in let references = ref None in let rec loop ?(fst_p_in_li=true) ?(is_in_list=false) list_indent l = (* [list_indent: int] is the indentation level in number of spaces. *) (* [is_in_list: bool] is necessary to know if we are inside a paragraph which is inside a list item because those need to be indented! *) let loop ?(fst_p_in_li=fst_p_in_li) ?(is_in_list=is_in_list) list_indent l = loop ~fst_p_in_li:fst_p_in_li ~is_in_list:is_in_list list_indent l in match l with | X x :: tl -> (match x#to_t md with | Some t -> loop list_indent t | None -> match x#to_html ~indent:0 html_of_md md with | Some s -> Buffer.add_string b s | None -> ()); loop list_indent tl | Blockquote q :: tl -> Buffer.add_string b (quote ~indent:list_indent (markdown_of_md q)); if tl <> [] then Buffer.add_string b "\n"; loop list_indent tl | Ref(rc, name, text, fallback) :: tl -> if !references = None then references := Some rc; loop list_indent (Raw(fallback#to_string)::tl) | Img_ref(rc, name, alt, fallback) :: tl -> if !references = None then references := Some rc; loop list_indent (Raw(fallback#to_string)::tl) | Paragraph [] :: tl -> loop list_indent tl | Paragraph md :: tl -> if is_in_list then if fst_p_in_li then add_spaces (list_indent-2) else add_spaces list_indent; loop ~fst_p_in_li:false list_indent md; Printf.bprintf b "\n\n"; loop ~fst_p_in_li:false list_indent tl | Img(alt, src, title) :: tl -> Printf.bprintf b "![%s](%s \"%s\")" alt src title; loop list_indent tl | Text t :: tl -> Printf.bprintf b "%s" (escape_markdown_characters t); loop list_indent tl | Emph md :: tl -> Buffer.add_string b "*"; loop list_indent md; Buffer.add_string b "*"; loop list_indent tl | Bold md :: tl -> Buffer.add_string b "**"; loop list_indent md; Buffer.add_string b "**"; loop list_indent tl | Ol l :: tl -> if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then Buffer.add_char b '\n'; let c = ref 0 in (* don't use List.iteri because it's not in 3.12 *) List.iter(fun li -> incr c; add_spaces list_indent; Printf.bprintf b "%d. " !c; loop ~is_in_list:true (list_indent+4) li; Buffer.add_char b '\n'; ) l; if list_indent = 0 then Buffer.add_char b '\n'; loop list_indent tl | Ul l :: tl -> if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then Buffer.add_char b '\n'; List.iter(fun li -> add_spaces list_indent; Printf.bprintf b "- "; loop ~is_in_list:true (list_indent+4) li; Buffer.add_char b '\n'; ) l; if list_indent = 0 then Buffer.add_char b '\n'; loop list_indent tl | Olp l :: tl -> let c = ref 0 in (* don't use List.iteri because it's not in 3.12 *) List.iter(fun li -> if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then Buffer.add_char b '\n'; add_spaces list_indent; incr c; bprintf b "%d. " !c; loop ~is_in_list:true (list_indent+4) li; (* Paragraphs => No need of '\n' *) ) l; loop list_indent tl | Ulp l :: tl -> List.iter(fun li -> if Buffer.length b > 0 && Buffer.nth b (Buffer.length b - 1) <> '\n' then Buffer.add_char b '\n'; add_spaces list_indent; bprintf b "+ "; loop ~is_in_list:true (list_indent+4) li; (* Paragraphs => No need of '\n' *) ) l; begin match tl with | (H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _)::_ | NL::(H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _)::_ -> Buffer.add_char b '\n' | _ -> () end; loop list_indent tl | Code(_lang, c) :: tl -> (* FIXME *) let n = (* compute how many backquotes we need to use *) let filter (n:int) (s:int list) = if n > 0 && n < 10 then List.filter (fun e -> e <> n) s else s in let l = String.length c in let rec loop s x b i = if i = l then match filter b s with | hd::_ -> hd | [] -> x+1 else match c.[i] with | '`' -> loop s x (succ b) (succ i) | _ -> loop (filter b s) (max b x) 0 (succ i) in loop [1;2;3;4;5;6;7;8;9;10] 0 0 0 in begin Printf.bprintf b "%s" (String.make n '`'); if c.[0] = '`' then Buffer.add_char b ' '; Printf.bprintf b "%s" c; if c.[String.length c - 1] = '`' then Buffer.add_char b ' '; Printf.bprintf b "%s" (String.make n '`'); end; loop list_indent tl | Code_block(lang, c) :: tl -> let n = (* compute how many backquotes we need to use *) let filter n s = if n > 0 && n < 10 then List.filter (fun e -> e <> n) s else s in let l = String.length c in let rec loop s b i = if i = l then match filter b s with | hd::_ -> hd | [] -> 0 else match c.[i] with | '`' -> loop s (succ b) (succ i) | _ -> loop (filter b s) 0 (succ i) in loop [3;4;5;6;7;8;9;10] 0 0 in let output_indented_block n s = let rec loop p i = if i = String.length s then () else match p with | '\n' -> Printf.bprintf b "%s" (String.make n ' '); Buffer.add_char b s.[i]; loop s.[i] (succ i) | _ -> Buffer.add_char b s.[i]; loop s.[i] (succ i) in loop '\n' 0 in if n = 0 then (* FIXME *) begin (* case where we can't use backquotes *) Buffer.add_char b '\n'; output_indented_block (4+list_indent) c; if tl <> [] then Buffer.add_string b "\n\n" end else begin Buffer.add_string b (String.make (list_indent) ' '); Printf.bprintf b "%s%s\n" (String.make n '`') (if lang = "" then !default_language else lang); output_indented_block (list_indent) c; if Buffer.nth b (Buffer.length b - 1) <> '\n' then Buffer.add_char b '\n'; Buffer.add_string b (String.make (list_indent) ' '); Printf.bprintf b "%s\n" (String.make n '`'); end; loop list_indent tl | Br :: tl -> Buffer.add_string b "
"; loop list_indent tl | Hr :: tl -> Buffer.add_string b "* * *\n"; loop list_indent tl | Raw s :: tl -> Buffer.add_string b s; loop list_indent tl | Raw_block s :: tl -> Buffer.add_char b '\n'; Buffer.add_string b s; Buffer.add_char b '\n'; loop list_indent tl | Html(tagname, attrs, []) :: tl when StringSet.mem tagname html_void_elements -> Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Buffer.add_string b " />"; loop list_indent tl | Html(tagname, attrs, body) :: tl -> let a = filter_text_omd_rev attrs in Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs a); Buffer.add_string b ">"; if a == attrs then loop list_indent body else Buffer.add_string b (html_of_md body); Printf.bprintf b "" tagname; loop list_indent tl | (Html_block(tagname, attrs, body))::tl -> let needs_newlines = match tl with | NL :: Paragraph p :: _ | Paragraph p :: _ -> p <> [] | (H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _ | Ul _ | Ol _ | Ulp _ | Olp _ | Code (_, _) | Code_block (_, _) | Text _ | Emph _ | Bold _ | Br |Hr | Url (_, _, _) | Ref (_, _, _, _) | Img_ref (_, _, _, _) | Html (_, _, _) | Blockquote _ | Img (_, _, _)) :: _ -> true | ( Html_block (_, _, _) | Html_comment _ | Raw _|Raw_block _) :: _-> false | X _ :: _ -> false | NL :: _ -> false | [] -> false in if body = [] && StringSet.mem tagname html_void_elements then ( Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs attrs); Buffer.add_string b " />"; if needs_newlines then Buffer.add_string b "\n\n"; loop list_indent tl ) else ( let a = filter_text_omd_rev attrs in Printf.bprintf b "<%s" tagname; Buffer.add_string b (string_of_attrs a); Buffer.add_string b ">"; if a == attrs then loop list_indent body else Buffer.add_string b (html_of_md body); Printf.bprintf b "" tagname; if needs_newlines then Buffer.add_string b "\n\n"; loop list_indent tl ) | Html_comment s :: tl -> Buffer.add_string b s; loop list_indent tl | Url (href,s,title) :: tl -> if title = "" then bprintf b "[%s](%s)" (markdown_of_md s) href else bprintf b "[%s](%s \"%s\")" (markdown_of_md s) href title; loop list_indent tl | H1 md :: tl -> Buffer.add_string b "# "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | H2 md :: tl -> Buffer.add_string b "## "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | H3 md :: tl -> Buffer.add_string b "### "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | H4 md :: tl -> Buffer.add_string b "#### "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | H5 md :: tl -> Buffer.add_string b "##### "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | H6 md :: tl -> Buffer.add_string b "###### "; loop list_indent md; Buffer.add_string b "\n"; loop list_indent tl | NL :: tl -> if Buffer.length b = 1 || (Buffer.length b > 1 && not(Buffer.nth b (Buffer.length b - 1) = '\n' && Buffer.nth b (Buffer.length b - 2) = '\n')) then Buffer.add_string b "\n"; loop list_indent tl | [] -> () in loop 0 md; begin match !references with | None -> () | Some r -> Buffer.add_char b '\n'; List.iter (fun (name, (url, title)) -> if title = "" then bprintf b "[%s]: %s \n" name url else bprintf b "[%s]: %s \"%s\"\n" name url title ) r#get_all end; let res = Buffer.contents b in if debug then eprintf "(OMD) markdown_of_md(%S) => %S\n%!" (sexpr_of_md md) res; res omd-1.3.2/src/omd_backend.mli000066400000000000000000000077421425763206400160370ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) type code_stylist = lang:string -> string -> string (** Function that takes a language name and some code and returns that code with style. *) val default_language : string ref (** default language for code blocks can be set to any name, by default it is the empty string *) val html_of_md : ?override:(Omd_representation.element -> string option) -> ?pindent:bool -> ?nl2br:bool -> ?cs:code_stylist -> Omd_representation.t -> string (** [html_of_md md] returns a string containing the HTML version of [md]. Note that [md] uses the internal representation of Markdown. The optional parameter [override] allows to override an precise behaviour for a constructor of Omd_representation.element, as in the following example: let customized_to_html = Omd.html_of_md ~override:(function | Url (href,s,title) -> Some(" "" then " title='" ^ (Omd_utils.htmlentities ~md:true title) ^ "'" else "") ^ ">" ^ Omd_backend.html_of_md s ^ " target='_blank'") | _ -> None) *) val headers_of_md : ?remove_header_links:bool -> Omd_representation.t -> (Omd_representation.element * string * string) list (** [headers_of_md md] returns a list of 3-tuples; in each of them the first element is the header (e.g., [H1(foo)]), the second is the HTML id (as produced by [html_of_md]), and the third element is the HTML version of [foo]. The third elements of those 3-tuples exist because if you use [html_and_headers_of_md], then you have the guarantee that the HTML version of [foo] is the same for both the headers and the HTML version of [md]. If [remove_header_links], then remove links inside headers (h1, h2, ...). Default value of [remove_header_links]: cf. [html_and_headers_of_md]. *) val html_and_headers_of_md : ?remove_header_links:bool -> ?override:(Omd_representation.element -> string option) -> ?pindent:bool -> ?nl2br:bool -> ?cs:code_stylist -> Omd_representation.t -> string * (Omd_representation.element * Omd_utils.StringSet.elt * string) list (** [html_and_headers_of_md md] is the same as [(html_of_md md, headers_of_md md)] except that it's two times faster. If you need both headers and html, don't use [html_of_md] and [headers_of_md] but this function instead. If [remove_header_links], then remove links inside headers (h1, h2, ...). Default value of [remove_header_links]: false. *) val escape_markdown_characters : string -> string (** [escape_markdown_characters s] returns a string where markdown-significant characters in [s] have been backslash-escaped. Note that [escape_markdown_characters] takes a "raw" string, therefore it doesn't have the whole context in which the string appears, thus the escaping cannot really be minimal. However the implementation tries to minimalise the extra escaping. *) val text_of_md : Omd_representation.t -> string (** [text_of_md md] is basically the same as [html_of_md md] but without the HTML tags in the output. *) val markdown_of_md : Omd_representation.t -> string (** [markdown_of_md md] is basically the same as [html_of_md md] but with the output in Markdown syntax rather than HTML. *) val sexpr_of_md : Omd_representation.t -> string (** [sexpr_of_md md] is basically the same as [html_of_md md] but with the output in s-expressions rather than HTML. This is mainly used for debugging. *) omd-1.3.2/src/omd_html.ml000066400000000000000000000032351425763206400152340ustar00rootroot00000000000000(***********************************************************************) (* OMD: Markdown tool in OCaml *) (* (c) 2014 by Philippe Wang *) (* Licence: ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) type html = html_node list and html_node = | Node of nodename * attributes * html | Data of string | Rawdata of string | Comment of string and nodename = string and attributes = attribute list and attribute = string * string option let to_string html = let b = Buffer.create 1024 in let pp f = Printf.bprintf b f in let rec loop = function | Node(nodename, attributes, html) -> pp "<%s" nodename; ppa attributes; pp ">"; List.iter loop html; pp "" nodename | Data s -> pp "%s" s | Rawdata s -> pp "%s" s | Comment c -> pp "" c and ppa attrs = List.iter (function | (a, Some v) -> if not (String.contains v '\'') then pp " %s='%s'" a v else if not (String.contains v '"') then pp " %s=\"%s\"" a v else ( pp " %s=\"" a; for i = 0 to String.length v - 1 do match v.[i] with | '"' -> pp """ | c -> pp "%c" c done; pp "\"" ) | a, None -> Printf.bprintf b " %s=''" a (* HTML5 *) ) attrs in List.iter loop html; Buffer.contents b omd-1.3.2/src/omd_lexer.ml000066400000000000000000000373661425763206400154230ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (* Implementation notes ********************************************* * - This module should depend on OCaml's standard library only and * should be as 'pure OCaml' (i.e. depend as least as possible on * external tools) as possible. * - `while' loops are sometimes preferred to recursion because this * may be used on systems where tail recursion is not well * supported. (I tried to write "while" as often as possible, but it * turned out that it was pretty inconvenient, so I do use * recursion. When I have time, I'll do some tests and see if I * need to convert recursive loops into iterative loops. Sorry if it * makes it harder to read.) *) (* class type tag = object method is_me : 'a. 'a -> bool end *) open Omd_representation type token = Omd_representation.tok type t = Omd_representation.tok list let string_of_token = function | Tag (name, o) -> if Omd_utils.debug then "TAG("^name^")" ^ o#to_string else o#to_string | Ampersand -> "&" | Ampersands n -> assert (n >= 0); String.make (2+n) '&' | At -> "@" | Ats n -> assert (n >= 0); String.make (2+n) '@' | Backquote -> "`" | Backquotes n -> assert (n >= 0); String.make (2+n) '`' | Backslash -> "\\" | Backslashs n -> assert (n >= 0); String.make (2+n) '\\' | Bar -> "|" | Bars n -> assert (n >= 0); String.make (2+n) '|' | Caret -> "^" | Carets n -> assert (n >= 0); String.make (2+n) '^' | Cbrace -> "}" | Cbraces n -> assert (n >= 0); String.make (2+n) '}' | Colon -> ":" | Colons n -> assert (n >= 0); String.make (2+n) ':' | Comma -> "," | Commas n -> assert (n >= 0); String.make (2+n) ',' | Cparenthesis -> ")" | Cparenthesiss n -> assert (n >= 0); String.make (2+n) ')' | Cbracket -> "]" | Cbrackets n -> assert (n >= 0); String.make (2+n) ']' | Dollar -> "$" | Dollars n -> assert (n >= 0); String.make (2+n) '$' | Dot -> "." | Dots n -> assert (n >= 0); String.make (2+n) '.' | Doublequote -> "\"" | Doublequotes n -> assert (n >= 0); String.make (2+n) '"' | Exclamation -> "!" | Exclamations n -> assert (n >= 0); String.make (2+n) '!' | Equal -> "=" | Equals n -> assert (n >= 0); String.make (2+n) '=' | Greaterthan -> ">" | Greaterthans n -> assert (n >= 0); String.make (2+n) '>' | Hash -> "#" | Hashs n -> assert (n >= 0); String.make (2+n) '#' | Lessthan -> "<" | Lessthans n -> assert (n >= 0); String.make (2+n) '<' | Minus -> "-" | Minuss n -> assert (n >= 0); String.make (2+n) '-' | Newline -> "\n" | Newlines n -> assert (n >= 0); String.make (2+n) '\n' | Number s -> s | Obrace -> "{" | Obraces n -> assert (n >= 0); String.make (2+n) '{' | Oparenthesis -> "(" | Oparenthesiss n -> assert (n >= 0); String.make (2+n) '(' | Obracket -> "[" | Obrackets n -> assert (n >= 0); String.make (2+n) '[' | Percent -> "%" | Percents n -> assert (n >= 0); String.make (2+n) '%' | Plus -> "+" | Pluss n -> assert (n >= 0); String.make (2+n) '+' | Question -> "?" | Questions n -> assert (n >= 0); String.make (2+n) '?' | Quote -> "'" | Quotes n -> assert (n >= 0); String.make (2+n) '\'' | Semicolon -> ";" | Semicolons n -> assert (n >= 0); String.make (2+n) ';' | Slash -> "/" | Slashs n -> assert (n >= 0); String.make (2+n) '/' | Space -> " " | Spaces n -> assert (n >= 0); String.make (2+n) ' ' | Star -> "*" | Stars n -> assert (n >= 0); String.make (2+n) '*' | Tab -> " " | Tabs n -> assert (n >= 0); String.make ((2+n)*4) ' ' | Tilde -> "~" | Tildes n -> assert (n >= 0); String.make (2+n) '~' | Underscore -> "_" | Underscores n -> assert (n >= 0); String.make (2+n) '_' | Word s -> s let size_and_newlines = function | Tag _ -> (0, 0) | Ampersand | At | Backquote | Backslash | Bar | Caret | Cbrace | Colon | Comma | Cparenthesis | Cbracket | Dollar | Dot | Doublequote | Exclamation | Equal | Greaterthan | Hash | Lessthan | Minus | Obrace | Oparenthesis | Obracket | Percent | Plus | Question | Quote | Semicolon | Slash | Space | Star | Tab | Tilde | Underscore -> (1, 0) | Ampersands x | Ats x | Backquotes x | Backslashs x | Bars x | Carets x | Cbraces x | Colons x | Commas x | Cparenthesiss x | Cbrackets x | Dollars x | Dots x | Doublequotes x | Exclamations x | Equals x | Greaterthans x | Hashs x | Lessthans x | Minuss x | Obraces x | Oparenthesiss x | Obrackets x | Percents x | Pluss x | Questions x | Quotes x | Semicolons x | Slashs x | Spaces x | Stars x | Tabs x | Tildes x | Underscores x -> (2+x, 0) | Newline -> (0, 1) | Newlines x -> (0, 2+x) | Number s | Word s -> (String.length s, 0) let length t = let c, nl = size_and_newlines t in c + nl let split_first = function | Ampersands n -> Ampersand, (if n > 0 then Ampersands(n-1) else Ampersand) | Ats n -> At, (if n > 0 then Ats(n-1) else At) | Backquotes n -> Backquote, (if n > 0 then Backquotes(n-1) else Backquote) | Backslashs n -> Backslash, (if n > 0 then Backslashs(n-1) else Backslash) | Bars n -> Bar, (if n > 0 then Bars(n-1) else Bar) | Carets n -> Caret, (if n > 0 then Carets(n-1) else Caret) | Cbraces n -> Cbrace, (if n > 0 then Cbraces(n-1) else Cbrace) | Colons n -> Colon, (if n > 0 then Colons(n-1) else Colon) | Commas n -> Comma, (if n > 0 then Commas(n-1) else Comma) | Cparenthesiss n -> Cparenthesis, (if n > 0 then Cparenthesiss(n-1) else Cparenthesis) | Cbrackets n -> Cbracket, (if n > 0 then Cbrackets(n-1) else Cbracket) | Dollars n -> Dollar, (if n > 0 then Dollars(n-1) else Dollar) | Dots n -> Dot, (if n > 0 then Dots(n-1) else Dot) | Doublequotes n -> Doublequote, (if n > 0 then Doublequotes(n-1) else Doublequote) | Exclamations n -> Exclamation, (if n > 0 then Exclamations(n-1) else Exclamation) | Equals n -> Equal, (if n > 0 then Equals(n-1) else Equal) | Greaterthans n -> Greaterthan, (if n > 0 then Greaterthans(n-1) else Greaterthan) | Hashs n -> Hash, (if n > 0 then Hashs(n-1) else Hash) | Lessthans n -> Lessthan, (if n > 0 then Lessthans(n-1) else Lessthan) | Minuss n -> Minus, (if n > 0 then Minuss(n-1) else Minus) | Newlines n -> Newline, (if n > 0 then Newlines(n-1) else Newline) | Obraces n -> Obrace, (if n > 0 then Obraces(n-1) else Obrace) | Oparenthesiss n -> Oparenthesis, (if n > 0 then Oparenthesiss(n-1) else Oparenthesis) | Obrackets n -> Obracket, (if n > 0 then Obrackets(n-1) else Obracket) | Percents n -> Percent, (if n > 0 then Percents(n-1) else Percent) | Pluss n -> Plus, (if n > 0 then Pluss(n-1) else Plus) | Questions n -> Question, (if n > 0 then Questions(n-1) else Question) | Quotes n -> Quote, (if n > 0 then Quotes(n-1) else Quote) | Semicolons n -> Semicolon, (if n > 0 then Semicolons(n-1) else Semicolon) | Slashs n -> Slash, (if n > 0 then Slashs(n-1) else Slash) | Spaces n -> Space, (if n > 0 then Spaces(n-1) else Space) | Stars n -> Star, (if n > 0 then Stars(n-1) else Star) | Tabs n -> Tab, (if n > 0 then Tabs(n-1) else Tab) | Tildes n -> Tilde, (if n > 0 then Tildes(n-1) else Tilde) | Underscores n -> Underscore, (if n > 0 then Underscores(n-1) else Underscore) | Ampersand | At | Backquote | Backslash | Bar | Caret | Cbrace | Colon | Comma | Cparenthesis | Cbracket | Dollar | Dot | Doublequote | Exclamation | Equal | Greaterthan | Hash | Lessthan | Minus | Newline | Number _ | Obrace | Oparenthesis | Obracket | Percent | Plus | Question | Quote | Semicolon | Slash | Space | Star | Tab | Tilde | Underscore | Tag _ | Word _ -> invalid_arg "Omd_lexer.split_first" module type Input = sig type t val length : t -> int val get : t -> int -> char val sub : t -> pos:int -> len:int -> string end module Lex(I : Input) : sig val lex : I.t -> t end = struct let lex (s : I.t) = let result = ref [] in let i = ref 0 in let l = I.length s in let rcount c = (* [rcount c] returns the number of immediate consecutive occurrences of [c]. By side-effect, it increases the reference counter [i]. *) let rec loop r = if !i = l then r else if I.get s !i = c then (incr i; loop (r+1)) else r in loop 1 in let word () = let start = !i in let rec loop () = begin if !i = l then Word(I.sub s ~pos:start ~len:(!i-start)) else match I.get s !i with | ' ' | '\t' | '\n' | '\r' | '#' | '*' | '-' | '+' | '`' | '\'' | '"' | '\\' | '_' | '[' | ']' | '{' | '}' | '(' | ')' | ':' | ';' | '>' | '~' | '<' | '@' | '&' | '|' | '^' | '.' | '/' | '$' | '%' | '!' | '?' | '=' -> Word(I.sub s ~pos:start ~len:(!i-start)) | c -> incr i; loop() end in loop() in let maybe_number () = let start = !i in while !i < l && match I.get s !i with | '0' .. '9' -> true | _ -> false do incr i done; if !i = l then Number(I.sub s ~pos:start ~len:(!i-start)) else begin match I.get s !i with | ' ' | '\t' | '\n' | '\r' | '#' | '*' | '-' | '+' | '`' | '\'' | '"' | '\\' | '_' | '[' | ']' | '{' | '}' | '(' | ')' | ':' | ';' | '>' | '~' | '<' | '@' | '&' | '|' | '^' | '.' | '/' | '$' | '%' | '!' | '?' | '=' -> Number(I.sub s ~pos:start ~len:(!i-start)) | _ -> i := start; word() end in let n_occ c = incr i; rcount c in while !i < l do let c = I.get s !i in let w = match c with | ' ' -> let n = n_occ c in if n = 1 then Space else Spaces (n-2) | '\t' -> let n = n_occ c in if n = 1 then Spaces(2) else Spaces(4*n-2) | '\n' -> let n = n_occ c in if n = 1 then Newline else Newlines (n-2) | '\r' -> (* eliminating \r by converting all styles to unix style *) incr i; let rec count_rn x = if !i < l && I.get s (!i) = '\n' then if !i + 1 < l && I.get s (!i+1) = '\r' then (i := !i + 2; count_rn (x+1)) else x else x in let rn = 1 + count_rn 0 in if rn = 1 then match n_occ c with | 1 -> Newline | x -> assert(x>=2); Newlines(x-2) else (assert(rn>=2);Newlines(rn-2)) | '#' -> let n = n_occ c in if n = 1 then Hash else Hashs (n-2) | '*' -> let n = n_occ c in if n = 1 then Star else Stars (n-2) | '-' -> let n = n_occ c in if n = 1 then Minus else Minuss (n-2) | '+' -> let n = n_occ c in if n = 1 then Plus else Pluss (n-2) | '`' -> let n = n_occ c in if n = 1 then Backquote else Backquotes (n-2) | '\'' -> let n = n_occ c in if n = 1 then Quote else Quotes (n-2) | '"' -> let n = n_occ c in if n = 1 then Doublequote else Doublequotes (n-2) | '\\' -> let n = n_occ c in if n = 1 then Backslash else Backslashs (n-2) | '_' -> let n = n_occ c in if n = 1 then Underscore else Underscores (n-2) | '[' -> let n = n_occ c in if n = 1 then Obracket else Obrackets (n-2) | ']' -> let n = n_occ c in if n = 1 then Cbracket else Cbrackets (n-2) | '{' -> let n = n_occ c in if n = 1 then Obrace else Obraces (n-2) | '}' -> let n = n_occ c in if n = 1 then Cbrace else Cbraces (n-2) | '(' -> let n = n_occ c in if n = 1 then Oparenthesis else Oparenthesiss (n-2) | ')' -> let n = n_occ c in if n = 1 then Cparenthesis else Cparenthesiss (n-2) | ':' -> let n = n_occ c in if n = 1 then Colon else Colons (n-2) | ';' -> let n = n_occ c in if n = 1 then Semicolon else Semicolons (n-2) | '>' -> let n = n_occ c in if n = 1 then Greaterthan else Greaterthans (n-2) | '~' -> let n = n_occ c in if n = 1 then Tilde else Tildes (n-2) | '<' -> let n = n_occ c in if n = 1 then Lessthan else Lessthans (n-2) | '@' -> let n = n_occ c in if n = 1 then At else Ats (n-2) | '&' -> let n = n_occ c in if n = 1 then Ampersand else Ampersands (n-2) | '|' -> let n = n_occ c in if n = 1 then Bar else Bars (n-2) | '^' -> let n = n_occ c in if n = 1 then Caret else Carets (n-2) | ',' -> let n = n_occ c in if n = 1 then Comma else Commas (n-2) | '.' -> let n = n_occ c in if n = 1 then Dot else Dots (n-2) | '/' -> let n = n_occ c in if n = 1 then Slash else Slashs (n-2) | '$' -> let n = n_occ c in if n = 1 then Dollar else Dollars (n-2) | '%' -> let n = n_occ c in if n = 1 then Percent else Percents (n-2) | '=' -> let n = n_occ c in if n = 1 then Equal else Equals (n-2) | '!' -> let n = n_occ c in if n = 1 then Exclamation else Exclamations (n-2) | '?' -> let n = n_occ c in if n = 1 then Question else Questions (n-2) | '0' .. '9' -> maybe_number() | c -> word() in result := w :: !result done; List.rev !result end module Lex_string = Lex(StringLabels) let lex = Lex_string.lex type bigstring = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t module Bigarray_input : Input with type t = bigstring = struct module BA = Bigarray type t = bigstring let get = BA.Array1.get let length = BA.Array1.dim let sub arr ~pos ~len = if len < 0 || pos < 0 || pos + len > BA.Array1.dim arr then invalid_arg "Bigarray_input.sub"; let s = Bytes.create len in for i = 0 to len - 1 do Bytes.unsafe_set s i (BA.Array1.unsafe_get arr (i + pos)) done; Bytes.unsafe_to_string s end module Lex_bigarray = Lex(Bigarray_input) let lex_bigarray = Lex_bigarray.lex let make_space = function | 0 -> invalid_arg "Omd_lexer.make_space" | 1 -> Space | n -> if n < 0 then invalid_arg "Omd_lexer.make_space" else Spaces (n-2) (* (** [string_of_tl l] returns the string representation of l. [estring_of_tl l] returns the escaped string representation of l (same semantics as [String.escaped (string_of_tl l)]). *) let string_of_tl, estring_of_tl = let g escaped tl = let b = Buffer.create 42 in let rec loop : 'a t list -> unit = function | e::tl -> Buffer.add_string b (if escaped then String.escaped (string_of_t e) else string_of_t e); loop tl | [] -> () in Buffer.contents (loop tl; b) in g false, g true *) let string_of_tokens tl = let b = Buffer.create 128 in List.iter (fun e -> Buffer.add_string b (string_of_token e)) tl; Buffer.contents b let destring_of_tokens ?(limit=max_int) tl = let b = Buffer.create 1024 in let rec loop (i:int) (tlist:tok list) : unit = match tlist with | e::tl -> if limit = i then loop i [] else begin Buffer.add_string b (String.escaped (string_of_token e)); Buffer.add_string b "::"; loop (succ i) tl end | [] -> Buffer.add_string b "[]" in Buffer.contents (loop 0 tl; b) omd-1.3.2/src/omd_lexer.mli000066400000000000000000000031111425763206400155510ustar00rootroot00000000000000type token = Omd_representation.tok type t = token list val lex : string -> t (** Translate a raw string into tokens for the parser. To implement an extension to the lexer, one may process its result before giving it to the parser. To implement an extension to the parser, one may extend it using the constructor [Tag] from type [tok] and/or using the extensions mechanism of the parser (cf. the optional argument [extensions]). The main difference is that [Tag] is processed by the parser in highest priority whereas functions in [extensions] are applied with lowest priority. *) type bigstring = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t val lex_bigarray : bigstring -> t (** As {!lex}, but read input from a bigarray rather than from a string. *) val string_of_tokens : t -> string (** [string_of_tokens t] return the string corresponding to the token list [t]. *) val length : token -> int (** [length t] number of characters of the string represented as [t] (i.e. [String.length(string_of_token t)]). *) val string_of_token : token -> string (** [string_of_token tk] return the string corresponding to the token [tk]. *) val make_space : int -> token val split_first : token -> token * token (** [split_first(Xs n)] returns [(X, X(n-1))] where [X] is a token carrying an int count. @raise Invalid_argument is passed a single token. *) val destring_of_tokens : ?limit:int -> t -> string (** Converts the tokens to a simple string representation useful for debugging. *) omd-1.3.2/src/omd_lexer_fs.ml000066400000000000000000000015341425763206400160770ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (** You should either use this module or Omd_lexer, not both. This module includes Omd_lexer. *) include Omd_lexer let lex_from_inchannel ic = (* Maintenance-easiness-driven implementation. *) let ic_content = let b = Buffer.create 64 in try while true do Buffer.add_char b (input_char ic) done; assert false with End_of_file -> Buffer.contents b in lex ic_content omd-1.3.2/src/omd_lexer_fs.mli000066400000000000000000000010421425763206400162420ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) include module type of Omd_lexer val lex_from_inchannel : in_channel -> Omd_representation.tok list omd-1.3.2/src/omd_main.ml000066400000000000000000000342321425763206400152150ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (** This module implements an end-user interface for OMD. Treatments that are not specific to Markdown (such as table of contents generation) are done here. If you want to build an alternative end-user Markdown tool using OMD, you might want to fork this file or get inspiration from it. Happy coding! *) open Omd let remove_comments l = let open Omd_representation in let rec loop = function | true, Exclamations n :: tl when n > 0 -> loop (true, Omd_utils.eat (function Newline|Newlines _ -> false|_-> true) tl) | _, (Newline|Newlines _ as e)::tl -> e::loop (true, tl) | _, e::tl -> e::loop (false, tl) | _, [] -> [] in loop (true, l) let remove_endline_comments l = let open Omd_representation in let rec loop = function | Backslash :: (Exclamations n as e) :: tl when n > 0 -> e :: loop tl | Backslashs b :: (Exclamations n as e) :: tl when n > 0 && b mod 2 = 1 -> Backslashs(b-1) :: e :: loop tl | Exclamations n :: tl when n > 0 -> loop (Omd_utils.eat (function Newline|Newlines _ -> false|_-> true) tl) | e::tl -> e::loop tl | [] -> [] in loop l let preprocess_functions = ref [] (** [a += b] is a shortcut for [a := b :: !a] // NON-EXPORTED *) let (+=) a b = a := b :: !a let preprocess l = List.fold_left (fun r e -> e r) l !preprocess_functions let otoc = ref false let toc = ref false let omarkdown = ref false let notags = ref false let toc_depth = ref 2 let toc_start = ref([]: int list) let nl2br = ref false let protect_html_comments = ref false let code_stylist = let module M = Map.Make(String) in object val mutable stylists = M.empty method style ~lang code = try (M.find lang stylists) code with Not_found -> try (M.find "_" stylists) code with Not_found -> code method register ~lang stylist = stylists <- M.add lang stylist stylists end let code_stylist_of_program p = fun code -> let tmp1 = Filename.temp_file "code" "bef" in let tmp2 = Filename.temp_file "code" "aft" in let () = at_exit (fun () -> Sys.remove tmp1; Sys.remove tmp2) in let otmp1 = open_out_bin tmp1 in Printf.fprintf otmp1 "%s%!" code; close_out otmp1; match Sys.command (Printf.sprintf "( cat %s | %s ) > %s" tmp1 p tmp2) with | 0 -> let cat f = let ic = open_in f in let b = Buffer.create 64 in try while true do Buffer.add_char b (input_char ic) done; assert false with End_of_file -> Buffer.contents b in cat tmp2 | _ -> code let register_code_stylist_of_program x = try let i = String.index x '=' in code_stylist#register ~lang:(String.sub x 0 i) (code_stylist_of_program (String.sub x (i+1) (String.length x - (i+1)))) with Not_found | Invalid_argument _ -> Printf.eprintf "Error: Something wrong with [-r %s]\n" x; exit 1 let register_default_language l = Omd_backend.default_language := l (* HTML comments might contain some double-dash (--) that are not well treated by HTML parsers. For instance "" should be translated to "" when we want to ensure that the generated HTML is correct! *) let patch_html_comments l = let htmlcomments s = let b = Buffer.create (String.length s) in for i = 0 to 3 do Buffer.add_char b s.[i] done; for i = 4 to String.length s - 4 do match s.[i] with | '-' as c -> if (i > 4 && s.[i-1] = '-') || (i < String.length s - 5 && s.[i+1] = '-') then Printf.bprintf b "&#%d;" (int_of_char c) else Buffer.add_char b c | c -> Buffer.add_char b c done; for i = String.length s - 3 to String.length s - 1 do Buffer.add_char b s.[i] done; Buffer.contents b in let rec loop accu = function | Html_comment s :: tl -> loop (Html_comment(htmlcomments s)::accu) tl | e :: tl -> loop (e :: accu) tl | [] -> List.rev accu in loop [] l let tag_toc l = let open Omd_representation in let x = object(self) (* [shield] is used to prevent endless loops. If one wants to use system threads at some point, and calls methods of this object concurrently, then there is a real problem. *) val remove = fun e md -> visit (function X(v) when v==e-> Some[] | _ -> None) md method name = "toc" method to_html ?indent:_ f md = let r = f (Omd.toc(remove self md)) in Some r method to_sexpr f md = let r = f (Omd.toc(remove self md)) in Some r method to_t md = let r = (Omd.toc(remove self md)) in Some r end in let rec loop = function | Star:: Word "Table"::Space:: Word "of"::Space:: Word "contents"::Star::tl -> Tag("tag_toc", object method parser_extension r p l = Some(X(x)::r,p,l) method to_string = "" end ) :: loop tl | e::tl -> e::loop tl | [] -> [] in loop l let split_comma_int_list s = if s = "" then [] else ( let l = ref [] in let i = ref 0 in try while true do let j = String.index_from s !i ',' in l := int_of_string(String.sub s !i (j - !i)) :: !l; i := j + 1 done; assert false with Not_found -> l := (int_of_string(String.sub s !i (String.length s - !i))) :: !l; List.rev !l ) module E = Omd_parser.Default_env(struct end) let omd_gh_uemph_or_bold_style = ref E.gh_uemph_or_bold_style let omd_blind_html = ref E.blind_html let omd_strict_html = ref E.strict_html let omd_warning = ref E.warning let omd_warn_error = ref E.warn_error let list_html_tags ~inline = let module Parser = Omd_parser.Make(E) in if inline then Omd_utils.StringSet.iter (fun e -> print_string e; print_char '\n') Parser.inline_htmltags_set else Omd_utils.StringSet.iter (fun e -> print_string e; print_char '\n') Parser.htmltags_set let verbatim_start = ref "" let verbatim_end = ref "" let lex_with_verb_extension s = if !verbatim_start = "" || !verbatim_end = "" then Omd_lexer.lex s else begin let module M = struct type t = Verb of string | To_lex of string end in let open M in let sl = String.length s and stl = String.length !verbatim_start and enl = String.length !verbatim_end in let rec seek_start accu from i = if i + stl + enl > sl then To_lex(String.sub s from (sl - from))::accu else if String.sub s i stl = !verbatim_start then seek_end (To_lex(String.sub s from (i - from))::accu) (i+stl) (i+stl) else seek_start accu from (i+1) and seek_end accu from i = if i + enl > sl then To_lex(String.sub s from (sl - from))::accu else if String.sub s i enl = !verbatim_end then seek_start (Verb(String.sub s from (i - from))::accu) (i+enl) (i+enl) else seek_end accu from (i+1) in let first_pass () = seek_start [] 0 0 in let second_pass l = List.rev_map (function | To_lex x -> Omd_lexer.lex x | Verb x -> [Omd_representation.Tag( "raw", object method parser_extension r p l = match p with | [] | [Omd_representation.Newlines _] -> Some(Raw_block x :: r, [Omd_representation.Space], l) | _ -> Some(Raw x :: r, [Omd_representation.Space], l) method to_string = x end )] ) l in List.flatten(second_pass(first_pass())) end let main () = let input = ref [] and output = ref "" in Arg.( parse (align[ "-o", Set_string output, "file.html Specify the output file (default is stdout)."; "--", Rest(fun s -> input := s :: !input), " Consider all remaining arguments as input file names."; "-u", Clear(omd_gh_uemph_or_bold_style), " Use standard Markdown style for emph/bold when using `_'."; "-c", Unit(fun () -> preprocess_functions += remove_endline_comments), " Ignore lines that start with `!!!' (3 or more exclamation points)."; "-C", Unit(fun () -> preprocess_functions += remove_comments), " Ignore everything on a line after `!!!' \ (3 or more exclamation points)."; "-m", Set(omarkdown), " Output Markdown instead of HTML."; "-notags", Set(notags), " Output without the HTML tags."; "-toc", Set(toc), " Replace `*Table of contents*' by the table of contents."; "-otoc", Set(otoc), " Output only the table of contents."; "-ts", String(fun l -> toc_start := split_comma_int_list l), "f Section for the Table of contents (default: all)."; "-td", Set_int(toc_depth), "f Table of contents depth (default is 2)."; "-H", Set(protect_html_comments), " Protect HTML comments."; "-r", String(register_code_stylist_of_program), "l=p Register program p as a code highlighter for language l."; "-R", String(register_default_language), "l Registers unknown languages to be l instead of void."; "-nl2br", Set(nl2br), " Convert new lines to
."; "-x", String(ignore), "ext Activate extension ext (not yet implemented)."; "-l", Unit ignore, " List available extensions ext (not yet implemented)."; "-b", Set(omd_blind_html), " Don't check validity of HTML tag names."; "-s", Set(omd_strict_html), " (might not work as expected yet) Block HTML only in block HTML, \ inline HTML only in inline HTML \ (semantics undefined if use both -b and -s)."; "-LHTML", Unit(fun () -> list_html_tags ~inline:false; exit 0), " List all known HTML tags"; "-LHTMLi", Unit(fun () -> list_html_tags ~inline:true; exit 0), " List all known inline HTML tags"; "-version", Unit(fun () -> print_endline "This is version VERSION."; exit 0), " Print version."; "-VS", Set_string(verbatim_start), "start Set the start token to use to declare a verbatim section. \ If you use -VE, you must use -VS, and both must be non-empty."; "-VE", Set_string(verbatim_end), "end Set the end token to use to declare a verbatim section. \ If you use -VE, you must use -VS, and both must be non-empty."; "-w", Set(omd_warning), " Activate warnings (beta)."; "-W", Set(omd_warn_error), " Convert warnings to errors, implies -w (beta)."; ]) (fun s -> input := s :: !input) "omd [options] [inputfile1 .. inputfileN] [options]" ); let input_files = if !input = [] then [stdin] else List.rev_map (open_in) !input in let output = if !output = "" then stdout else open_out_bin !output in List.iter (fun ic -> let b = Buffer.create 64 in try while true do Buffer.add_char b (input_char ic) done; assert false with End_of_file -> let lexed = lex_with_verb_extension(Buffer.contents b) in let preprocessed = preprocess (if !toc then tag_toc lexed else lexed) in let module E = Omd_parser.Default_env(struct end) in let module Parser = Omd_parser.Make( struct include E let warning = !omd_warning || !omd_warn_error let warn_error = !omd_warn_error let gh_uemph_or_bold_style = !omd_gh_uemph_or_bold_style let blind_html = !omd_blind_html let strict_html = !omd_strict_html end) in let parsed1 = Parser.parse preprocessed in let parsed2 = if !protect_html_comments then patch_html_comments parsed1 else parsed1 in let parsed = parsed2 in let o1 = (* make either TOC or paragraphs, or leave as it is *) (if !otoc then Omd.toc ~start:!toc_start ~depth:!toc_depth else Parser.make_paragraphs) parsed in let o2 = (* output either Text or HTML, or markdown *) if !notags then to_text o1 else if !omarkdown then to_markdown o1 else if !toc && not !otoc then to_html ~pindent:true ~nl2br:false ~cs:code_stylist#style (* FIXME: this is a quick fix for -toc which doesn't work if to_html is directly applied to o1, and that seems to have something to do with Parser.make_paragraphs, which seems to prevent tag_toc from working properly when using to_html! *) (Parser.make_paragraphs(Parser.parse(Omd_lexer.lex(to_markdown o1)))) else to_html ~pindent:true ~nl2br:false ~cs:code_stylist#style (* The normal behaviour is to convert directly, like this. *) o1 in output_string output o2; if o2 <> "" && o2.[String.length o2 - 1] <> '\n' then output_char output '\n'; flush output; if false && Omd_utils.debug then print_endline (Omd_backend.sexpr_of_md (Omd_parser.default_parse (preprocess(Omd_lexer.lex (Buffer.contents b))))); ) input_files (* call the main function *) let () = try main () with | Omd_utils.Error msg when not Omd_utils.debug -> Printf.eprintf "(OMD) Error: %s\n" msg; exit 1 | Sys_error msg -> Printf.eprintf "Error: %s\n" msg; exit 1 omd-1.3.2/src/omd_main.mli000066400000000000000000000056521425763206400153720ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) val remove_comments : Omd_representation.tok list -> Omd_representation.tok list (** [remove_comments l] returns [l] without OMD comments. *) val remove_endline_comments : Omd_representation.tok list -> Omd_representation.tok list (** [remove_endline_comments l] returns [l] without OMD endline-comments. *) val preprocess_functions : (Omd_representation.tok list -> Omd_representation.tok list) list ref (** [preprocess_functions] contains the list of preprocessing functions *) val preprocess : Omd_representation.tok list -> Omd_representation.tok list (** [preprocess l] returns [l] to which all preprocessing functions (in reference [preprocess_functions]) have been applied. *) val otoc : bool ref (** flag: output the table of contents only. *) val toc : bool ref (** flag: replace "*Table of contents*" by the table of contents. *) val omarkdown : bool ref (** flag: output Markdown instead of HTML. *) val notags : bool ref (** flag: output HTML but without HTML tags, so it's not really HTML anymore. *) val toc_depth : int ref (** flag: depth of table of contents *) val toc_start : int list ref (** flag: first header level for table of contents *) val nl2br : bool ref (** flag: convert newlines to "
" when output is HTML *) val omd_gh_uemph_or_bold_style : bool ref (** flag: set on the command line, used for instanciating the functor Omd_parser.Make *) val omd_blind_html : bool ref (** flag: set on the command line, used for instanciating the functor Omd_parser.Make *) val omd_strict_html : bool ref (** flag: set on the command line, used for instanciating the functor Omd_parser.Make *) val protect_html_comments : bool ref (** flag: for multiple dashes in HTML comments, replace dashes by - *) val patch_html_comments : Omd.element list -> Omd.element list (** [patch_html_comments l] returns the list [l] where all [Html_comments s] have been converted to [Html_comments s'], where [s'] means [s] with dashes replaced by - except for single dashes (which are left untouched). N.B. It seems that it's not valid to have double dashes inside HTML comments (cf. http://validator.w3.org/check). So one way to make life somewhat easier is to patch the comments and transform inner dashed to -. *) val tag_toc : Omd_representation.tok list -> Omd_representation.tok list (** [tag_toc l] returns [l] where *Table of contents* has been replaced by a tag that can generate a table of contents. *) val main : unit -> unit (** main function *) omd-1.3.2/src/omd_parser.ml000066400000000000000000005242101425763206400155650ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013-2014 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) let sdebug = true open Printf open Omd_representation open Omd_utils module L = Omd_lexer type r = Omd_representation.t (** accumulator (beware, reversed tokens) *) and p = Omd_representation.tok list (** context information: previous elements *) and l = Omd_representation.tok list (** tokens to parse *) and main_loop = ?html:bool -> r -> (* accumulator (beware, reversed tokens) *) p -> (* info: previous elements *) l -> (* tokens to parse *) Omd_representation.t (* final result *) (** most important loop *) (** N.B. Please do not use tabulations in your Markdown file! *) module type Env = sig val rc: Omd_representation.ref_container val extensions : Omd_representation.extensions val default_lang : string val gh_uemph_or_bold_style : bool val blind_html : bool val strict_html : bool val warning : bool val warn_error : bool end module Unit = struct end module Default_env (Unit:sig end) : Env = struct let rc = new Omd_representation.ref_container let extensions = [] let default_lang = "" let gh_uemph_or_bold_style = true let blind_html = false let strict_html = false let warning = false let warn_error = false end module Make (Env:Env) = struct include Env let warn = Omd_utils.warn ~we:warn_error (** set of known HTML codes *) let htmlcodes_set = StringSet.of_list (* This list should be checked... *) (* list extracted from: http://www.w3.org/TR/html4/charset.html *) [ "AElig"; "Aacute"; "Acirc"; "Agrave"; "Alpha"; "Aring"; "Atilde"; "Auml"; "Beta"; "Ccedil"; "Chi"; "Dagger"; "Delta"; "ETH"; "Eacute"; "Ecirc"; "Egrave"; "Epsilon"; "Eta"; "Euml"; "Gamma"; "Iacute"; "Icirc"; "Igrave"; "Iota"; "Iuml"; "Kappa"; "Lambda"; "Mu"; "Ntilde"; "Nu"; "OElig"; "Oacute"; "Ocirc"; "Ograve"; "Omega"; "Omicron"; "Oslash"; "Otilde"; "Ouml"; "Phi"; "Pi"; "Prime"; "Psi"; "Rho"; "Scaron"; "Sigma"; "THORN"; "Tau"; "Theta"; "Uacute"; "Ucirc"; "Ugrave"; "Upsilon"; "Uuml"; "Xi"; "Yacute"; "Yuml"; "Zeta"; "aacute"; "acirc"; "acute"; "aelig"; "agrave"; "alefsym"; "alpha"; "amp"; "and"; "ang"; "aring"; "asymp"; "atilde"; "auml"; "bdquo"; "beta"; "brvbar"; "bull"; "cap"; "ccedil"; "cedil"; "cent"; "chi"; "circ"; "clubs"; "cong"; "copy"; "crarr"; "cup"; "curren"; "dArr"; "dagger"; "darr"; "deg"; "delta"; "diams"; "divide"; "eacute"; "ecirc"; "egrave"; "empty"; "emsp"; "ensp"; "epsilon"; "equiv"; "eta"; "eth"; "euml"; "euro"; "exist"; "fnof"; "forall"; "frac12"; "frac14"; "frac34"; "frasl"; "gamma"; "ge"; "gt"; "hArr"; "harr"; "hearts"; "hellip"; "iacute"; "icirc"; "iexcl"; "igrave"; "image"; "infin"; "int"; "iota"; "iquest"; "isin"; "iuml"; "kappa"; "lArr"; "lambda"; "lang"; "laquo"; "larr"; "lceil"; "ldquo"; "le"; "lfloor"; "lowast"; "loz"; "lrm"; "lsaquo"; "lsquo"; "lt"; "macr"; "mdash"; "micro"; "middot"; "minus"; "mu"; "nabla"; "nbsp"; "ndash"; "ne"; "ni"; "not"; "notin"; "nsub"; "ntilde"; "nu"; "oacute"; "ocirc"; "oelig"; "ograve"; "oline"; "omega"; "omicron"; "oplus"; "or"; "ordf"; "ordm"; "oslash"; "otilde"; "otimes"; "ouml"; "para"; "part"; "permil"; "perp"; "phi"; "pi"; "piv"; "plusmn"; "pound"; "prime"; "prod"; "prop"; "psi"; "quot"; "rArr"; "radic"; "rang"; "raquo"; "rarr"; "rceil"; "rdquo"; "real"; "reg"; "rfloor"; "rho"; "rlm"; "rsaquo"; "rsquo"; "sbquo"; "scaron"; "sdot"; "sect"; "shy"; "sigma"; "sigmaf"; "sim"; "spades"; "sub"; "sube"; "sum"; "sup"; "sup1"; "sup2"; "sup3"; "supe"; "szlig"; "tau"; "there4"; "theta"; "thetasym"; "thinsp"; "thorn"; "tilde"; "times"; "trade"; "uArr"; "uacute"; "uarr"; "ucirc"; "ugrave"; "uml"; "upsih"; "upsilon"; "uuml"; "weierp"; "xi"; "yacute"; "yen"; "yuml"; "zeta"; "zwj"; "zwnj"; ] (** set of known inline HTML tags *) let inline_htmltags_set = (StringSet.of_list (* from https://developer.mozilla.org/en-US/docs/HTML/Inline_elements *) [ "b";"big";"i";"small";"tt"; "abbr";"acronym";"cite";"code";"dfn";"em";"kbd";"strong";"samp";"var"; "a";"bdo";"br";"img";"map";"object";"q";"span";"sub";"sup"; "button";"input";"label";"select";"textarea";]) (** N.B. it seems that there is no clear distinction between inline tags and block-level tags: in HTML4 it was not clear, in HTML5 it's even more complicated. So, the choice *here* is to specify a set of tags considered as "inline", cf. [inline_htmltags_set]. So there will be inline tags, non-inline tags, and unknown tags.*) (** set of HTML tags that may appear out of a body *) let notinbodytags = StringSet.of_list [ "title"; "link"; "meta"; "style"; "html"; "head"; "body"; ] (** All known HTML tags *) let htmltags_set = StringSet.union notinbodytags (StringSet.union inline_htmltags_set (StringSet.of_list [ "a";"abbr";"acronym";"address";"applet";"area";"article";"aside" ;"audio";"b";"base";"basefont";"bdi";"bdo";"big";"blockquote" ;"br";"button";"canvas";"caption";"center";"cite";"code";"col" ;"colgroup";"command";"datalist";"dd";"del";"details";"dfn" ;"dialog";"dir";"div";"dl";"dt";"em";"embed";"fieldset" ;"figcaption";"figure";"font";"footer";"form";"frame";"frameset" ;"h2";"h3";"h4";"h5";"h6" ;"h1";"header";"hr";"i";"iframe";"img";"input";"ins";"kbd" ;"keygen";"label";"legend";"li";"map";"mark";"menu";"meter";"nav" ;"noframes";"noscript";"object";"ol";"optgroup";"option";"output" ;"p";"param";"pre";"progress";"q";"rp";"rt";"ruby";"s";"samp" ;"script";"section";"select";"small";"source";"span";"strike" ;"strong";"style";"sub";"summary";"sup";"table";"tbody";"td" ;"textarea";"tfoot";"th";"thead";"time";"tr";"track";"tt";"u" ;"ul";"var";"video";"wbr" ])) (** This functions fixes bad lexing trees, which may be built when extraction a portion of another lexing tree. *) let fix l = let rec loop accu = function (* code to generate what follows... List.iter (fun e -> Printf.printf " | %s::%s::tl -> if trackfix then eprintf \"%s 1\\n%!\"; loop accu (%ss 0::tl) | %ss n::%s::tl -> if trackfix then eprintf \"%s 2\\n%!\"; loop accu (%ss(n+1)::tl) | %s::%ss n::tl -> if trackfix then eprintf \"%s 3\\n%!\"; loop accu (%ss(n+1)::tl) | %ss a::%ss b::tl -> if trackfix then eprintf \"%s 4\\n%!\"; loop accu (%ss(a+b+2)::tl)" e e e e e e e e e e e e e e e e) ["Ampersand"; "At"; "Backquote"; "Backslash"; "Bar"; "Caret"; "Cbrace"; "Colon"; "Comma"; "Cparenthesis"; "Cbracket"; "Dollar"; "Dot"; "Doublequote"; "Exclamation"; "Equal"; "Greaterthan"; "Hash"; "Lessthan"; "Minus"; "Newline"; "Obrace"; "Oparenthesis"; "Obracket"; "Percent"; "Plus"; "Question"; "Quote"; "Semicolon"; "Slash"; "Space"; "Star"; "Tab"; "Tilde"; "Underscore"]; print_string "| x::tl -> loop (x::accu) tl\n| [] -> List.rev accu\n"; *) | Ampersand::Ampersand::tl -> if trackfix then eprintf "(OMD) Ampersand 1\n"; loop accu (Ampersands 0::tl) | Ampersands n::Ampersand::tl -> if trackfix then eprintf "(OMD) Ampersand 2\n"; loop accu (Ampersands(n+1)::tl) | Ampersand::Ampersands n::tl -> if trackfix then eprintf "(OMD) Ampersand 3\n"; loop accu (Ampersands(n+1)::tl) | Ampersands a::Ampersands b::tl -> if trackfix then eprintf "(OMD) Ampersand 4\n"; loop accu (Ampersands(a+b+2)::tl) | At::At::tl -> if trackfix then eprintf "(OMD) At 1\n"; loop accu (Ats 0::tl) | Ats n::At::tl -> if trackfix then eprintf "(OMD) At 2\n"; loop accu (Ats(n+1)::tl) | At::Ats n::tl -> if trackfix then eprintf "(OMD) At 3\n"; loop accu (Ats(n+1)::tl) | Ats a::Ats b::tl -> if trackfix then eprintf "(OMD) At 4\n"; loop accu (Ats(a+b+2)::tl) | Backquote::Backquote::tl -> if trackfix then eprintf "(OMD) Backquote 1\n"; loop accu (Backquotes 0::tl) | Backquotes n::Backquote::tl -> if trackfix then eprintf "(OMD) Backquote 2\n"; loop accu (Backquotes(n+1)::tl) | Backquote::Backquotes n::tl -> if trackfix then eprintf "(OMD) Backquote 3\n"; loop accu (Backquotes(n+1)::tl) | Backquotes a::Backquotes b::tl -> if trackfix then eprintf "(OMD) Backquote 4\n"; loop accu (Backquotes(a+b+2)::tl) | Backslash::Backslash::tl -> if trackfix then eprintf "(OMD) Backslash 1\n"; loop accu (Backslashs 0::tl) | Backslashs n::Backslash::tl -> if trackfix then eprintf "(OMD) Backslash 2\n"; loop accu (Backslashs(n+1)::tl) | Backslash::Backslashs n::tl -> if trackfix then eprintf "(OMD) Backslash 3\n"; loop accu (Backslashs(n+1)::tl) | Backslashs a::Backslashs b::tl -> if trackfix then eprintf "(OMD) Backslash 4\n"; loop accu (Backslashs(a+b+2)::tl) | Bar::Bar::tl -> if trackfix then eprintf "(OMD) Bar 1\n"; loop accu (Bars 0::tl) | Bars n::Bar::tl -> if trackfix then eprintf "(OMD) Bar 2\n"; loop accu (Bars(n+1)::tl) | Bar::Bars n::tl -> if trackfix then eprintf "(OMD) Bar 3\n"; loop accu (Bars(n+1)::tl) | Bars a::Bars b::tl -> if trackfix then eprintf "(OMD) Bar 4\n"; loop accu (Bars(a+b+2)::tl) | Caret::Caret::tl -> if trackfix then eprintf "(OMD) Caret 1\n"; loop accu (Carets 0::tl) | Carets n::Caret::tl -> if trackfix then eprintf "(OMD) Caret 2\n"; loop accu (Carets(n+1)::tl) | Caret::Carets n::tl -> if trackfix then eprintf "(OMD) Caret 3\n"; loop accu (Carets(n+1)::tl) | Carets a::Carets b::tl -> if trackfix then eprintf "(OMD) Caret 4\n"; loop accu (Carets(a+b+2)::tl) | Cbrace::Cbrace::tl -> if trackfix then eprintf "(OMD) Cbrace 1\n"; loop accu (Cbraces 0::tl) | Cbraces n::Cbrace::tl -> if trackfix then eprintf "(OMD) Cbrace 2\n"; loop accu (Cbraces(n+1)::tl) | Cbrace::Cbraces n::tl -> if trackfix then eprintf "(OMD) Cbrace 3\n"; loop accu (Cbraces(n+1)::tl) | Cbraces a::Cbraces b::tl -> if trackfix then eprintf "(OMD) Cbrace 4\n"; loop accu (Cbraces(a+b+2)::tl) | Colon::Colon::tl -> if trackfix then eprintf "(OMD) Colon 1\n"; loop accu (Colons 0::tl) | Colons n::Colon::tl -> if trackfix then eprintf "(OMD) Colon 2\n"; loop accu (Colons(n+1)::tl) | Colon::Colons n::tl -> if trackfix then eprintf "(OMD) Colon 3\n"; loop accu (Colons(n+1)::tl) | Colons a::Colons b::tl -> if trackfix then eprintf "(OMD) Colon 4\n"; loop accu (Colons(a+b+2)::tl) | Comma::Comma::tl -> if trackfix then eprintf "(OMD) Comma 1\n"; loop accu (Commas 0::tl) | Commas n::Comma::tl -> if trackfix then eprintf "(OMD) Comma 2\n"; loop accu (Commas(n+1)::tl) | Comma::Commas n::tl -> if trackfix then eprintf "(OMD) Comma 3\n"; loop accu (Commas(n+1)::tl) | Commas a::Commas b::tl -> if trackfix then eprintf "(OMD) Comma 4\n"; loop accu (Commas(a+b+2)::tl) | Cparenthesis::Cparenthesis::tl -> if trackfix then eprintf "(OMD) Cparenthesis 1\n"; loop accu (Cparenthesiss 0::tl) | Cparenthesiss n::Cparenthesis::tl -> if trackfix then eprintf "(OMD) Cparenthesis 2\n"; loop accu (Cparenthesiss(n+1)::tl) | Cparenthesis::Cparenthesiss n::tl -> if trackfix then eprintf "(OMD) Cparenthesis 3\n"; loop accu (Cparenthesiss(n+1)::tl) | Cparenthesiss a::Cparenthesiss b::tl -> if trackfix then eprintf "(OMD) Cparenthesis 4\n"; loop accu (Cparenthesiss(a+b+2)::tl) | Cbracket::Cbracket::tl -> if trackfix then eprintf "(OMD) Cbracket 1\n"; loop accu (Cbrackets 0::tl) | Cbrackets n::Cbracket::tl -> if trackfix then eprintf "(OMD) Cbracket 2\n"; loop accu (Cbrackets(n+1)::tl) | Cbracket::Cbrackets n::tl -> if trackfix then eprintf "(OMD) Cbracket 3\n"; loop accu (Cbrackets(n+1)::tl) | Cbrackets a::Cbrackets b::tl -> if trackfix then eprintf "(OMD) Cbracket 4\n"; loop accu (Cbrackets(a+b+2)::tl) | Dollar::Dollar::tl -> if trackfix then eprintf "(OMD) Dollar 1\n"; loop accu (Dollars 0::tl) | Dollars n::Dollar::tl -> if trackfix then eprintf "(OMD) Dollar 2\n"; loop accu (Dollars(n+1)::tl) | Dollar::Dollars n::tl -> if trackfix then eprintf "(OMD) Dollar 3\n"; loop accu (Dollars(n+1)::tl) | Dollars a::Dollars b::tl -> if trackfix then eprintf "(OMD) Dollar 4\n"; loop accu (Dollars(a+b+2)::tl) | Dot::Dot::tl -> if trackfix then eprintf "(OMD) Dot 1\n"; loop accu (Dots 0::tl) | Dots n::Dot::tl -> if trackfix then eprintf "(OMD) Dot 2\n"; loop accu (Dots(n+1)::tl) | Dot::Dots n::tl -> if trackfix then eprintf "(OMD) Dot 3\n"; loop accu (Dots(n+1)::tl) | Dots a::Dots b::tl -> if trackfix then eprintf "(OMD) Dot 4\n"; loop accu (Dots(a+b+2)::tl) | Doublequote::Doublequote::tl -> if trackfix then eprintf "(OMD) Doublequote 1\n"; loop accu (Doublequotes 0::tl) | Doublequotes n::Doublequote::tl -> if trackfix then eprintf "(OMD) Doublequote 2\n"; loop accu (Doublequotes(n+1)::tl) | Doublequote::Doublequotes n::tl -> if trackfix then eprintf "(OMD) Doublequote 3\n"; loop accu (Doublequotes(n+1)::tl) | Doublequotes a::Doublequotes b::tl -> if trackfix then eprintf "(OMD) Doublequote 4\n"; loop accu (Doublequotes(a+b+2)::tl) | Exclamation::Exclamation::tl -> if trackfix then eprintf "(OMD) Exclamation 1\n"; loop accu (Exclamations 0::tl) | Exclamations n::Exclamation::tl -> if trackfix then eprintf "(OMD) Exclamation 2\n"; loop accu (Exclamations(n+1)::tl) | Exclamation::Exclamations n::tl -> if trackfix then eprintf "(OMD) Exclamation 3\n"; loop accu (Exclamations(n+1)::tl) | Exclamations a::Exclamations b::tl -> if trackfix then eprintf "(OMD) Exclamation 4\n"; loop accu (Exclamations(a+b+2)::tl) | Equal::Equal::tl -> if trackfix then eprintf "(OMD) Equal 1\n"; loop accu (Equals 0::tl) | Equals n::Equal::tl -> if trackfix then eprintf "(OMD) Equal 2\n"; loop accu (Equals(n+1)::tl) | Equal::Equals n::tl -> if trackfix then eprintf "(OMD) Equal 3\n"; loop accu (Equals(n+1)::tl) | Equals a::Equals b::tl -> if trackfix then eprintf "(OMD) Equal 4\n"; loop accu (Equals(a+b+2)::tl) | Greaterthan::Greaterthan::tl -> if trackfix then eprintf "(OMD) Greaterthan 1\n"; loop accu (Greaterthans 0::tl) | Greaterthans n::Greaterthan::tl -> if trackfix then eprintf "(OMD) Greaterthan 2\n"; loop accu (Greaterthans(n+1)::tl) | Greaterthan::Greaterthans n::tl -> if trackfix then eprintf "(OMD) Greaterthan 3\n"; loop accu (Greaterthans(n+1)::tl) | Greaterthans a::Greaterthans b::tl -> if trackfix then eprintf "(OMD) Greaterthan 4\n"; loop accu (Greaterthans(a+b+2)::tl) | Hash::Hash::tl -> if trackfix then eprintf "(OMD) Hash 1\n"; loop accu (Hashs 0::tl) | Hashs n::Hash::tl -> if trackfix then eprintf "(OMD) Hash 2\n"; loop accu (Hashs(n+1)::tl) | Hash::Hashs n::tl -> if trackfix then eprintf "(OMD) Hash 3\n"; loop accu (Hashs(n+1)::tl) | Hashs a::Hashs b::tl -> if trackfix then eprintf "(OMD) Hash 4\n"; loop accu (Hashs(a+b+2)::tl) | Lessthan::Lessthan::tl -> if trackfix then eprintf "(OMD) Lessthan 1\n"; loop accu (Lessthans 0::tl) | Lessthans n::Lessthan::tl -> if trackfix then eprintf "(OMD) Lessthan 2\n"; loop accu (Lessthans(n+1)::tl) | Lessthan::Lessthans n::tl -> if trackfix then eprintf "(OMD) Lessthan 3\n"; loop accu (Lessthans(n+1)::tl) | Lessthans a::Lessthans b::tl -> if trackfix then eprintf "(OMD) Lessthan 4\n"; loop accu (Lessthans(a+b+2)::tl) | Minus::Minus::tl -> if trackfix then eprintf "(OMD) Minus 1\n"; loop accu (Minuss 0::tl) | Minuss n::Minus::tl -> if trackfix then eprintf "(OMD) Minus 2\n"; loop accu (Minuss(n+1)::tl) | Minus::Minuss n::tl -> if trackfix then eprintf "(OMD) Minus 3\n"; loop accu (Minuss(n+1)::tl) | Minuss a::Minuss b::tl -> if trackfix then eprintf "(OMD) Minus 4\n"; loop accu (Minuss(a+b+2)::tl) | Newline::Newline::tl -> if trackfix then eprintf "(OMD) Newline 1\n"; loop accu (Newlines 0::tl) | Newlines n::Newline::tl -> if trackfix then eprintf "(OMD) Newline 2\n"; loop accu (Newlines(n+1)::tl) | Newline::Newlines n::tl -> if trackfix then eprintf "(OMD) Newline 3\n"; loop accu (Newlines(n+1)::tl) | Newlines a::Newlines b::tl -> if trackfix then eprintf "(OMD) Newline 4\n"; loop accu (Newlines(a+b+2)::tl) | Obrace::Obrace::tl -> if trackfix then eprintf "(OMD) Obrace 1\n"; loop accu (Obraces 0::tl) | Obraces n::Obrace::tl -> if trackfix then eprintf "(OMD) Obrace 2\n"; loop accu (Obraces(n+1)::tl) | Obrace::Obraces n::tl -> if trackfix then eprintf "(OMD) Obrace 3\n"; loop accu (Obraces(n+1)::tl) | Obraces a::Obraces b::tl -> if trackfix then eprintf "(OMD) Obrace 4\n"; loop accu (Obraces(a+b+2)::tl) | Oparenthesis::Oparenthesis::tl -> if trackfix then eprintf "(OMD) Oparenthesis 1\n"; loop accu (Oparenthesiss 0::tl) | Oparenthesiss n::Oparenthesis::tl -> if trackfix then eprintf "(OMD) Oparenthesis 2\n"; loop accu (Oparenthesiss(n+1)::tl) | Oparenthesis::Oparenthesiss n::tl -> if trackfix then eprintf "(OMD) Oparenthesis 3\n"; loop accu (Oparenthesiss(n+1)::tl) | Oparenthesiss a::Oparenthesiss b::tl -> if trackfix then eprintf "(OMD) Oparenthesis 4\n"; loop accu (Oparenthesiss(a+b+2)::tl) | Obracket::Obracket::tl -> if trackfix then eprintf "(OMD) Obracket 1\n"; loop accu (Obrackets 0::tl) | Obrackets n::Obracket::tl -> if trackfix then eprintf "(OMD) Obracket 2\n"; loop accu (Obrackets(n+1)::tl) | Obracket::Obrackets n::tl -> if trackfix then eprintf "(OMD) Obracket 3\n"; loop accu (Obrackets(n+1)::tl) | Obrackets a::Obrackets b::tl -> if trackfix then eprintf "(OMD) Obracket 4\n"; loop accu (Obrackets(a+b+2)::tl) | Percent::Percent::tl -> if trackfix then eprintf "(OMD) Percent 1\n"; loop accu (Percents 0::tl) | Percents n::Percent::tl -> if trackfix then eprintf "(OMD) Percent 2\n"; loop accu (Percents(n+1)::tl) | Percent::Percents n::tl -> if trackfix then eprintf "(OMD) Percent 3\n"; loop accu (Percents(n+1)::tl) | Percents a::Percents b::tl -> if trackfix then eprintf "(OMD) Percent 4\n"; loop accu (Percents(a+b+2)::tl) | Plus::Plus::tl -> if trackfix then eprintf "(OMD) Plus 1\n"; loop accu (Pluss 0::tl) | Pluss n::Plus::tl -> if trackfix then eprintf "(OMD) Plus 2\n"; loop accu (Pluss(n+1)::tl) | Plus::Pluss n::tl -> if trackfix then eprintf "(OMD) Plus 3\n"; loop accu (Pluss(n+1)::tl) | Pluss a::Pluss b::tl -> if trackfix then eprintf "(OMD) Plus 4\n"; loop accu (Pluss(a+b+2)::tl) | Question::Question::tl -> if trackfix then eprintf "(OMD) Question 1\n"; loop accu (Questions 0::tl) | Questions n::Question::tl -> if trackfix then eprintf "(OMD) Question 2\n"; loop accu (Questions(n+1)::tl) | Question::Questions n::tl -> if trackfix then eprintf "(OMD) Question 3\n"; loop accu (Questions(n+1)::tl) | Questions a::Questions b::tl -> if trackfix then eprintf "(OMD) Question 4\n"; loop accu (Questions(a+b+2)::tl) | Quote::Quote::tl -> if trackfix then eprintf "(OMD) Quote 1\n"; loop accu (Quotes 0::tl) | Quotes n::Quote::tl -> if trackfix then eprintf "(OMD) Quote 2\n"; loop accu (Quotes(n+1)::tl) | Quote::Quotes n::tl -> if trackfix then eprintf "(OMD) Quote 3\n"; loop accu (Quotes(n+1)::tl) | Quotes a::Quotes b::tl -> if trackfix then eprintf "(OMD) Quote 4\n"; loop accu (Quotes(a+b+2)::tl) | Semicolon::Semicolon::tl -> if trackfix then eprintf "(OMD) Semicolon 1\n"; loop accu (Semicolons 0::tl) | Semicolons n::Semicolon::tl -> if trackfix then eprintf "(OMD) Semicolon 2\n"; loop accu (Semicolons(n+1)::tl) | Semicolon::Semicolons n::tl -> if trackfix then eprintf "(OMD) Semicolon 3\n"; loop accu (Semicolons(n+1)::tl) | Semicolons a::Semicolons b::tl -> if trackfix then eprintf "(OMD) Semicolon 4\n"; loop accu (Semicolons(a+b+2)::tl) | Slash::Slash::tl -> if trackfix then eprintf "(OMD) Slash 1\n"; loop accu (Slashs 0::tl) | Slashs n::Slash::tl -> if trackfix then eprintf "(OMD) Slash 2\n"; loop accu (Slashs(n+1)::tl) | Slash::Slashs n::tl -> if trackfix then eprintf "(OMD) Slash 3\n"; loop accu (Slashs(n+1)::tl) | Slashs a::Slashs b::tl -> if trackfix then eprintf "(OMD) Slash 4\n"; loop accu (Slashs(a+b+2)::tl) | Space::Space::tl -> if trackfix then eprintf "(OMD) Space 1\n"; loop accu (Spaces 0::tl) | Spaces n::Space::tl -> if trackfix then eprintf "(OMD) Space 2\n"; loop accu (Spaces(n+1)::tl) | Space::Spaces n::tl -> if trackfix then eprintf "(OMD) Space 3\n"; loop accu (Spaces(n+1)::tl) | Spaces a::Spaces b::tl -> if trackfix then eprintf "(OMD) Space 4\n"; loop accu (Spaces(a+b+2)::tl) | Star::Star::tl -> if trackfix then eprintf "(OMD) Star 1\n"; loop accu (Stars 0::tl) | Stars n::Star::tl -> if trackfix then eprintf "(OMD) Star 2\n"; loop accu (Stars(n+1)::tl) | Star::Stars n::tl -> if trackfix then eprintf "(OMD) Star 3\n"; loop accu (Stars(n+1)::tl) | Stars a::Stars b::tl -> if trackfix then eprintf "(OMD) Star 4\n"; loop accu (Stars(a+b+2)::tl) | Tab::Tab::tl -> if trackfix then eprintf "(OMD) Tab 1\n"; loop accu (Tabs 0::tl) | Tabs n::Tab::tl -> if trackfix then eprintf "(OMD) Tab 2\n"; loop accu (Tabs(n+1)::tl) | Tab::Tabs n::tl -> if trackfix then eprintf "(OMD) Tab 3\n"; loop accu (Tabs(n+1)::tl) | Tabs a::Tabs b::tl -> if trackfix then eprintf "(OMD) Tab 4\n"; loop accu (Tabs(a+b+2)::tl) | Tilde::Tilde::tl -> if trackfix then eprintf "(OMD) Tilde 1\n"; loop accu (Tildes 0::tl) | Tildes n::Tilde::tl -> if trackfix then eprintf "(OMD) Tilde 2\n"; loop accu (Tildes(n+1)::tl) | Tilde::Tildes n::tl -> if trackfix then eprintf "(OMD) Tilde 3\n"; loop accu (Tildes(n+1)::tl) | Tildes a::Tildes b::tl -> if trackfix then eprintf "(OMD) Tilde 4\n"; loop accu (Tildes(a+b+2)::tl) | Underscore::Underscore::tl -> if trackfix then eprintf "(OMD) Underscore 1\n"; loop accu (Underscores 0::tl) | Underscores n::Underscore::tl -> if trackfix then eprintf "(OMD) Underscore 2\n"; loop accu (Underscores(n+1)::tl) | Underscore::Underscores n::tl -> if trackfix then eprintf "(OMD) Underscore 3\n"; loop accu (Underscores(n+1)::tl) | Underscores a::Underscores b::tl -> if trackfix then eprintf "(OMD) Underscore 4\n"; loop accu (Underscores(a+b+2)::tl)| x::tl -> loop (x::accu) tl | [] -> List.rev accu in loop [] l (* Remove all [NL] and [Br] at the beginning. *) let rec remove_initial_newlines = function | [] -> [] | (NL | Br) :: tl -> remove_initial_newlines tl | l -> l (** - recognizes paragraphs - glues following blockquotes *) let make_paragraphs md = let rec loop cp accu = function (* cp means current paragraph *) | [] -> let accu = match cp with | [] | [NL] | [Br] -> accu | (NL|Br)::cp -> Paragraph(List.rev cp)::accu | cp -> Paragraph(List.rev cp)::accu in List.rev accu | Blockquote b1 :: Blockquote b2 :: tl -> loop cp accu (Blockquote(b1@b2):: tl) | Blockquote b :: tl -> let e = Blockquote(loop [] [] b) in (match cp with | [] | [NL] | [Br] -> loop cp (e::accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | (Ulp b) :: tl -> let e = Ulp(List.map (fun li -> loop [] [] li) b) in (match cp with | [] | [NL] | [Br] -> loop cp (e::accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | (Olp b) :: tl -> let e = Olp(List.map (fun li -> loop [] [] li) b) in (match cp with | [] | [NL] | [Br] -> loop cp (e::accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | Html_comment _ as e :: tl -> (match cp with | [] -> loop [] (e::accu) tl | [NL] | [Br] -> loop [] (e::NL::accu) tl | _ -> loop (e::cp) accu tl) | (Raw_block _ | Html_block _) as e :: tl -> (match cp with | [] | [NL] | [Br] -> loop cp (e::cp@accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | (Code_block _ | H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _ | Ol _ | Ul _) as e :: tl -> (match cp with | [] | [NL] | [Br] -> loop cp (e::accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | Text "\n" :: _ | Paragraph _ :: _ -> invalid_arg "Omd_parser.make_paragraphs" | (NL|Br) :: (NL|Br) :: tl -> let tl = remove_initial_newlines tl in begin match cp with | [] | [NL] | [Br] -> loop [] (NL::NL::accu) tl | _ -> loop [] (Paragraph(List.rev cp)::accu) tl end | X(x) as e :: tl -> (* If the extension returns a block as first element, then consider the extension as a block. However don't take its contents as it is yet, the contents of the extension shall be considered final as late as possible. *) begin match x#to_t md with | None -> loop (e::cp) accu tl | Some(t) -> match t with | ( H1 _ | H2 _ | H3 _ | H4 _ | H5 _ | H6 _ | Paragraph _ | Ul _ | Ol _ | Ulp _ | Olp _ | Code_block _ | Hr | Html_block _ | Raw_block _ | Blockquote _ ) :: _ -> (match cp with | [] | [NL] | [Br] -> loop cp (e::accu) tl | _ -> loop [] (e::Paragraph(List.rev cp)::accu) tl) | _ -> loop (e::cp) accu tl end | e::tl -> loop (e::cp) accu tl in let remove_white_crumbs l = let rec loop = function | [] -> [] | Text " " :: tl | NL::tl | Br::tl -> loop tl | l -> l in List.rev (loop (List.rev l)) in let rec clean_paragraphs = if debug then eprintf "(OMD) clean_paragraphs\n"; function | [] -> [] | Paragraph[]::tl -> tl | Paragraph(p) :: tl -> Paragraph(clean_paragraphs (remove_initial_newlines (remove_white_crumbs(normalise_md p)))) :: clean_paragraphs tl | H1 v :: tl -> H1(clean_paragraphs v) :: clean_paragraphs tl | H2 v :: tl -> H2(clean_paragraphs v) :: clean_paragraphs tl | H3 v :: tl -> H3(clean_paragraphs v) :: clean_paragraphs tl | H4 v :: tl -> H4(clean_paragraphs v) :: clean_paragraphs tl | H5 v :: tl -> H5(clean_paragraphs v) :: clean_paragraphs tl | H6 v :: tl -> H6(clean_paragraphs v) :: clean_paragraphs tl | Emph v :: tl -> Emph(clean_paragraphs v) :: clean_paragraphs tl | Bold v :: tl -> Bold(clean_paragraphs v) :: clean_paragraphs tl | Ul v :: tl -> Ul(List.map clean_paragraphs v) :: clean_paragraphs tl | Ol v :: tl -> Ol(List.map clean_paragraphs v) :: clean_paragraphs tl | Ulp v :: tl -> Ulp(List.map clean_paragraphs v) :: clean_paragraphs tl | Olp v :: tl -> Olp(List.map clean_paragraphs v) :: clean_paragraphs tl | Blockquote v :: tl -> Blockquote(clean_paragraphs v) :: clean_paragraphs tl | Url(href,v,title) :: tl -> Url(href,(clean_paragraphs v),title) :: clean_paragraphs tl | Text _ | Code _ | Code_block _ | Br | Hr | NL | Ref _ | Img_ref _ | Raw _ | Raw_block _ | Html _ | Html_block _ | Html_comment _ | Img _ | X _ as v :: tl -> v :: clean_paragraphs tl in let r = clean_paragraphs(loop [] [] md) in if debug then eprintf "(OMD) clean_paragraphs %S --> %S\n%!" (Omd_backend.sexpr_of_md md) (Omd_backend.sexpr_of_md r); r (** [assert_well_formed] is a developer's function that helps to track badly constructed token lists. This function has an effect only if [trackfix] is [true]. *) let assert_well_formed (l:tok list) : unit = if trackfix then let rec equiv l1 l2 = match l1, l2 with | [], [] -> true | Tag _::tl1, Tag _::tl2-> equiv tl1 tl2 | e1::tl1, e2::tl2 -> e1 = e2 && equiv tl1 tl2 | _ -> false in assert(equiv (fix l) l); () (** Generate fallback for references. *) let extract_fallback main_loop remains l = if debug then eprintf "(OMD) Omd_parser.extract_fallback\n%!"; let rec loop accu = function | [] -> List.rev accu | e::tl as r -> if r == remains then List.rev accu else match e, remains with | Cbrackets 0, Cbracket::r when tl = r -> let accu = Word "]" :: accu in List.rev accu | Cbrackets n, Cbrackets m::r when m + 1 = n && tl = r -> let accu = Word "]" :: accu in List.rev accu | _ -> loop (e::accu) tl in let a = loop [] l in object method to_string = L.string_of_tokens a method to_t = [Text(L.string_of_tokens a)] end let unindent_rev n lexemes = if debug then eprintf "(OMD) CALL: Omd_parser.unindent_rev\n%!"; assert_well_formed lexemes; let rec loop accu cl = function | Newlines x::(Space|Spaces _)::Newlines y::tl -> loop accu cl (Newlines(x+y+2)::tl) | Newline::(Space|Spaces _)::Newlines x::tl -> loop accu cl (Newlines(1+x)::tl) | Newlines x::(Space|Spaces _)::Newline::tl -> loop accu cl (Newlines(1+x)::tl) | Newline::(Space|Spaces _)::Newline::tl -> loop accu cl (Newlines(0)::tl) | (Newline|Newlines 0 as nl)::(Space|Spaces _ as s)::( (Number _::Dot::(Space|Spaces _)::_) | ((Star|Plus|Minus)::(Space|Spaces _)::_) as tl) as l -> if n = L.length s then loop (nl::cl@accu) [] tl else (cl@accu), l | (Newline|Newlines 0 as nl)::(Space|Spaces _ as s)::tl -> let x = L.length s - n in loop (nl::cl@accu) (if x > 0 then [L.make_space x] else []) tl | Newlines(_)::_ as l -> (cl@accu), l | Newline::_ as l -> (cl@accu), l | e::tl -> loop accu (e::cl) tl | [] as l -> (cl@accu), l in match loop [] [] lexemes with | [], right -> [], right | l, right -> assert_well_formed l; l, right let unindent n lexemes = let fst, snd = unindent_rev n lexemes in List.rev fst, snd let rec is_blank = function | (Space | Spaces _ | Newline | Newlines _) :: tl -> is_blank tl | [] -> true | _ -> false let semph_or_bold (n:int) (l:l) = (* FIXME: use rpl call/return convention *) assert_well_formed l; assert (n>0 && n<4); match fsplit ~excl:(function Newlines _ :: _ -> true | _ -> false) ~f:(function | Backslash::Star::tl -> Continue_with([Star;Backslash],tl) | Backslash::Stars 0::tl -> Continue_with([Star;Backslash],Star::tl) | Backslash::Stars n::tl -> Continue_with([Star;Backslash],Stars(n-1)::tl) | (Backslashs b as x)::Star::tl -> if b mod 2 = 0 then Continue_with([x],Star::tl) else Continue_with([Star;x],tl) | (Backslashs b as x)::(Stars 0 as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Star;x],Star::tl) | (Backslashs b as x)::(Stars n as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Star;x],Stars(n-1)::tl) | (Space|Spaces _ as x)::(Star|Stars _ as s)::tl -> Continue_with([s;x],tl) | (Star|Stars _ as s)::tl -> if L.length s = n then Split([],tl) else Continue | _ -> Continue) l with | None -> None | Some(left,right) -> if is_blank left then None else Some(left,right) let sm_uemph_or_bold (n:int) (l:l) = assert_well_formed l; (* FIXME: use rpl call/return convention *) assert (n>0 && n<4); match fsplit ~excl:(function Newlines _ :: _ -> true | _ -> false) ~f:(function | Backslash::Underscore::tl -> Continue_with([Underscore;Backslash],tl) | Backslash::Underscores 0::tl -> Continue_with([Underscore;Backslash],Underscore::tl) | Backslash::Underscores n::tl -> Continue_with([Underscore;Backslash],Underscores(n-1)::tl) | (Backslashs b as x)::Underscore::tl -> if b mod 2 = 0 then Continue_with([x],Underscore::tl) else Continue_with([Underscore;x],tl) | (Backslashs b as x)::(Underscores 0 as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Underscore;x],Underscore::tl) | (Backslashs b as x)::(Underscores n as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Underscore;x],Underscores(n-1)::tl) | (Space|Spaces _ as x)::(Underscore|Underscores _ as s)::tl -> Continue_with([s;x],tl) | (Underscore|Underscores _ as s)::tl -> if L.length s = n then Split([],tl) else Continue | _ -> Continue) l with | None -> None | Some(left,right) -> if is_blank left then None else Some(left,right) let gh_uemph_or_bold (n:int) (l:l) = assert_well_formed l; (* FIXME: use rpl call/return convention *) assert (n>0 && n<4); match fsplit ~excl:(function Newlines _ :: _ -> true | _ -> false) ~f:(function | Backslash::Underscore::tl -> Continue_with([Underscore;Backslash],tl) | Backslash::Underscores 0::tl -> Continue_with([Underscore;Backslash],Underscore::tl) | Backslash::Underscores n::tl -> Continue_with([Underscore;Backslash],Underscores(n-1)::tl) | (Backslashs b as x)::Underscore::tl -> if b mod 2 = 0 then Continue_with([x],Underscore::tl) else Continue_with([Underscore;x],tl) | (Backslashs b as x)::(Underscores 0 as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Underscore;x],Underscore::tl) | (Backslashs b as x)::(Underscores n as s)::tl -> if b mod 2 = 0 then Continue_with([x],s::tl) else Continue_with([Underscore;x],Underscores(n-1)::tl) | (Space|Spaces _ as x)::(Underscore|Underscores _ as s)::tl -> Continue_with([s;x],tl) | (Underscore|Underscores _ as s)::(Word _|Number _ as w):: tl -> Continue_with([w;s],tl) | (Underscore|Underscores _ as s)::tl -> if L.length s = n then Split([],tl) else Continue | _ -> Continue) l with | None -> None | Some(left,right) -> if is_blank left then None else Some(left,right) let uemph_or_bold n l = assert_well_formed l; (* FIXME: use rpl call/return convention *) if gh_uemph_or_bold_style then gh_uemph_or_bold n l else sm_uemph_or_bold n l let eat_blank = eat (function |Space|Spaces _|Newline|Newlines _ -> true| _ -> false) (* used by tag__maybe_h1 and tag__maybe_h2 *) let setext_title main_loop (l:l) : (Omd_representation.tok list * l) option = assert_well_formed l; let rec detect_balanced_bqs n r l = (* If there's a balanced (complete) backquote-started code block then it should be "ignored", else it means the line that follows is part of a code block, so it's not defining a setext-style title. *) if debug then eprintf "(OMD) detect_balanced_bqs n=%d r=%S l=%S\n%!" n (L.string_of_tokens r) (L.string_of_tokens l); match l with | [] -> None | (Newline|Newlines _)::_ -> None | Backslash::Backquote::tl -> detect_balanced_bqs n (Backquote::Backslash::r) tl | Backslash::Backquotes 0::tl -> detect_balanced_bqs n (Backquote::Backslash::r) (Backquote::tl) | Backslash::Backquotes x::tl -> detect_balanced_bqs n (Backquote::Backslash::r) (Backquotes(x-1)::tl) | Backslashs(m) as b::Backquote::tl when m mod 2 = 1 -> detect_balanced_bqs n (Backquote::b::r) tl | Backslashs(m) as b::Backquotes 0::tl when m mod 2 = 1 -> detect_balanced_bqs n (Backquote::b::r) (Backquote::tl) | Backslashs(m) as b::Backquotes x::tl when m mod 2 = 1 -> detect_balanced_bqs n (Backquote::b::r) (Backquotes(x-1)::tl) | (Backquote as b)::tl when n = 1 -> Some(List.rev (b::r), tl) | (Backquotes x as b)::tl when n = x+2 -> Some(List.rev (b::r), tl) | e::tl -> detect_balanced_bqs n (e::r) tl in let rec loop r = function | [] -> if r = [] then None else Some(List.rev r, []) | Backslash::Backquote::tl -> loop (Backquote::Backslash::r) tl | Backslashs(m) as b::Backquote::tl when m mod 2 = 1 -> loop (Backquote::b::r) tl | Backslash::Backquotes 0::tl -> loop (Backquote::Backslash::r) (Backquote::tl) | Backslash::Backquotes x::tl -> loop (Backquote::Backslash::r) (Backquotes(x-1)::tl) | Backslashs(m) as b::Backquotes 0::tl when m mod 2 = 1 -> loop (Backquote::b::r) (Backquote::tl) | Backslashs(m) as b::Backquotes x::tl when m mod 2 = 1 -> loop (Backquote::b::r) (Backquotes(x-1)::tl) | Backquote::tl -> begin match detect_balanced_bqs 1 [] tl with | Some(bl,tl) -> loop (bl@r) tl | _ -> None end | Backquotes(x)::tl -> begin match detect_balanced_bqs (x+2) [] tl with | Some(bl,tl) -> loop (bl@r) tl | _ -> None end | Newline::(Equal|Equals _|Minus|Minuss _)::tl -> if r = [] then None else Some(List.rev r, tl) | (Newline|Newlines _)::_ -> if debug then eprintf "(OMD) Omd_parser.setext_title is wrongly used!\n%!"; None | e::tl -> loop (e::r) tl in if match l with | Lessthan::Word _::_ -> begin match main_loop [] [] l with | (Html_block _ | Code_block _ | Raw_block _)::_ -> true | _ -> false end | _ -> false then None else let result = loop [] l in if debug then eprintf "(OMD) setext_title l=%S result=%S,%S\n%!" (L.string_of_tokens l) (match result with | None -> "" | Some (x,tl) -> L.string_of_tokens x) (match result with | None -> "" | Some (x,tl) -> L.string_of_tokens tl); result let tag__maybe_h1 (main_loop:main_loop) = Tag("tag__maybe_h1", object method parser_extension r p l = match p with | ([]|[Newline|Newlines _]) -> begin match setext_title main_loop l with | None -> None | Some(title, tl) -> let title = H1(main_loop [] [] title) in Some((title::r), [Newline], tl) end | _ -> if debug then eprintf "(OMD) Warning: Omd_parser.tag__maybe_h1 is wrongly \ used (p=%S)!\n" (L.string_of_tokens p); None method to_string = "" end ) let tag__maybe_h2 (main_loop:main_loop) = Tag("tag__maybe_h2", object method parser_extension r p l = match p with | ([]|[Newline|Newlines _]) -> begin match setext_title main_loop l with | None -> None | Some(title, tl) -> let title = H2(main_loop [] [] title) in Some((title::r), [Newline], tl) end | _ -> if debug then eprintf "(OMD) Warning: Omd_parser.tag__maybe_h2 is wrongly \ used (p=%S)!\n" (L.string_of_tokens p); None method to_string = "" end ) let tag__md md = (* [md] should be in reverse *) Tag("tag__md", object method parser_extension r p l = Some(md@r, [], l) method to_string = "" end ) (* Let's tag the lines that *might* be titles using setext-style. "might" because if they are, for instance, in a code section, then they are not titles at all. *) let tag_setext main_loop lexemes = assert_well_formed lexemes; let rec loop pl res = function | [] | [Newline|Newlines _] -> pl@res | (Newline as e1)::(Equal|Equals _ as e2)::tl -> (* might be a H1. *) begin match fsplit_rev ~f:(function | (Space|Spaces _|Equal|Equals _)::tl -> Continue | [] -> Split([],[]) | _::_ as l -> Split([], l)) tl with | Some(rleft, (([]|(Newline|Newlines _)::_) as right)) -> loop [] (rleft@(e2::e1::pl@tag__maybe_h1 main_loop::res)) right | Some(rleft, right) -> loop [] (rleft@(e2::e1::pl@res)) right | None -> loop [] (e2::e1::pl@res) [] end | (Newline as e1)::(Minus|Minuss _ as e2)::tl -> (* might be a H2. *) begin match fsplit_rev ~f:(function | (Space|Spaces _|Minus|Minuss _)::tl -> Continue | [] -> Split([],[]) | _::_ as l -> Split([], l)) tl with | Some(rleft, (([]|(Newline|Newlines _)::_) as right)) -> loop [] (rleft@(e2::e1::pl@tag__maybe_h2 main_loop::res)) right | Some(rleft, right) -> loop [] (rleft@(e2::e1::pl@res)) right | None -> loop [] (e2::e1::pl@res) [] end | (Newline | Newlines _ as e1)::tl -> loop [] (e1::pl@res) tl | e::tl -> loop (e::pl) res tl in List.rev (loop [] [] lexemes) let hr_m l = assert_well_formed l; let rec loop n = function | ((Newlines _|Newline)::tl) | ([] as tl) -> if n >= 3 then Some tl else None | (Space|Spaces _)::tl -> loop n tl | Minus::tl -> loop (n+1) tl | Minuss x::tl -> loop (x+2+n) tl | _::_ -> None in loop 0 l let hr_s l = assert_well_formed l; let rec loop n = function | ((Newline|Newlines _)::tl) | ([] as tl) -> if n >= 3 then Some tl else None | (Space|Spaces _)::tl -> loop n tl | Star::tl -> loop (n+1) tl | Stars x::tl -> loop (x+2+n) tl | _::_ -> None in loop 0 l let hr l = match hr_m l with | None -> hr_s l | Some _ as tl -> tl (** [bcode] parses code that's delimited by backquote(s) *) let bcode ?(default_lang=default_lang) r p l = assert_well_formed l; let e, tl = match l with | (Backquote|Backquotes _ as e)::tl -> e, tl | _ -> failwith "Omd_parser.bcode is wrongly called" in let rec code_block accu = function | [] -> None | Backquote::tl -> if e = Backquote then match accu with | Newline::accu -> Some(List.rev accu, tl) | _ -> Some(List.rev accu, tl) else code_block (Backquote::accu) tl | (Backquotes n as b)::tl -> if e = b then match accu with | Newline::accu -> Some(List.rev accu, tl) | _ -> Some(List.rev accu, tl) else code_block (b::accu) tl | Tag(_, _)::tl -> code_block accu tl | e::tl -> code_block (e::accu) tl in match code_block [] tl with | None -> None | Some(cb, l) -> if List.exists (function (Newline|Newlines _) -> true | _ -> false) cb && (match p with []|[Newline|Newlines _] -> true | _ -> false) && (match e with Backquotes n when n > 0 -> true | _ -> false) then match cb with | Word lang :: (Space|Spaces _) :: Newline :: tl | Word lang :: Newline :: tl -> let code = L.string_of_tokens tl in Some(Code_block(lang, code) :: r, [Backquote], l) | Word lang :: (Space|Spaces _) :: Newlines 0 :: tl | Word lang :: Newlines 0 :: tl -> let code = L.string_of_tokens(Newline::tl) in Some(Code_block(lang, code) :: r, [Backquote], l) | Word lang :: (Space|Spaces _) :: Newlines n :: tl | Word lang :: Newlines n :: tl -> let code = L.string_of_tokens (Newlines(n-1)::tl) in Some(Code_block(lang, code) :: r, [Backquote], l) | Newline :: tl -> let code = L.string_of_tokens tl in Some(Code_block(default_lang, code) :: r, [Backquote], l) | _ -> let code = L.string_of_tokens cb in Some(Code_block(default_lang, code) :: r, [Backquote], l) else let clean_bcode s = let rec loop1 i = if i = String.length s then 0 else match s.[i] with | ' ' -> loop1(i+1) | _ -> i in let rec loop2 i = if i = -1 then String.length s else match s.[i] with | ' ' -> loop2(i-1) | _ -> i+1 in match loop1 0, loop2 (String.length s - 1) with | 0, n when n = String.length s - 1 -> s | i, n -> String.sub s i (n-i) in let code = L.string_of_tokens cb in if debug then eprintf "(OMD) clean_bcode %S => %S\n%!" code (clean_bcode code); Some(Code(default_lang, clean_bcode code) :: r, [Backquote], l) exception NL_exception exception Premature_ending (* !!DO NOT DELETE THIS!! The program that generates the generated part that follows right after. List.iter (fun (a,b,c) -> print_endline ("let read_until_"^a^" ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: ("^b^" as b) :: tl -> loop (b::accu) n tl | Backslash :: ("^b^"s 0) :: tl -> loop ("^b^"::accu) n ("^b^"::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl" ^(if c<>"" then " | Backslash :: ("^c^" as b) :: tl -> loop (b::accu) n tl | Backslash :: ("^c^"s 0) :: tl -> loop ("^c^"::accu) n ("^c^"::tl) | "^c^" as e :: tl -> loop (e::accu) (n+1) tl | "^c^"s x as e :: tl -> loop (e::accu) (n+x+2) tl " else "")^ " | "^b^" as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | "^b^"s 0 :: tl -> if n = 0 then List.rev accu, "^b^"::tl else loop ("^b^"::accu) (n-1) ("^b^"::tl) | "^b^"s x :: tl -> if n = 0 then List.rev accu, "^b^"s(x-1)::tl else loop (match accu with | "^b^"::accu -> "^b^"s(0)::accu | "^b^"s x::accu -> "^b^"s(x+1)::accu | _ -> "^b^"::accu) (n-1) ("^b^"s(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf \"Omd_parser.read_until_"^a^" %S bq=%b no_nl=%b\\n%!\" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf \"Omd_parser.read_until_"^a^" %S bq=%b no_nl=%b => %S\\n%!\" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res ")) [ "gt", "Greaterthan", "Lessthan"; "lt", "Lessthan", ""; "cparenth", "Cparenthesis", "Oparenthesis"; "oparenth", "Oparenthesis", ""; "dq", "Doublequote", ""; "q", "Quote", ""; "obracket", "Obracket", ""; "cbracket", "Cbracket", "Obracket"; "space", "Space", ""; ] *) (* begin generated part *) let read_until_gt ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Greaterthan as b) :: tl -> loop (b::accu) n tl | Backslash :: (Greaterthans 0) :: tl -> loop (Greaterthan::accu) n (Greaterthan::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Backslash :: (Lessthan as b) :: tl -> loop (b::accu) n tl | Backslash :: (Lessthans 0) :: tl -> loop (Lessthan::accu) n (Lessthan::tl) | Lessthan as e :: tl -> loop (e::accu) (n+1) tl | Lessthans x as e :: tl -> loop (e::accu) (n+x+2) tl | Greaterthan as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Greaterthans 0 :: tl -> if n = 0 then List.rev accu, Greaterthan::tl else loop (Greaterthan::accu) (n-1) (Greaterthan::tl) | Greaterthans x :: tl -> if n = 0 then List.rev accu, Greaterthans(x-1)::tl else loop (match accu with | Greaterthan::accu -> Greaterthans(0)::accu | Greaterthans x::accu -> Greaterthans(x+1)::accu | _ -> Greaterthan::accu) (n-1) (Greaterthans(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_gt %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_gt %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_lt ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Lessthan as b) :: tl -> loop (b::accu) n tl | Backslash :: (Lessthans 0) :: tl -> loop (Lessthan::accu) n (Lessthan::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Lessthan as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Lessthans 0 :: tl -> if n = 0 then List.rev accu, Lessthan::tl else loop (Lessthan::accu) (n-1) (Lessthan::tl) | Lessthans x :: tl -> if n = 0 then List.rev accu, Lessthans(x-1)::tl else loop (match accu with | Lessthan::accu -> Lessthans(0)::accu | Lessthans x::accu -> Lessthans(x+1)::accu | _ -> Lessthan::accu) (n-1) (Lessthans(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_lt %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_lt %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_cparenth ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Cparenthesis as b) :: tl -> loop (b::accu) n tl | Backslash :: (Cparenthesiss 0) :: tl -> loop (Cparenthesis::accu) n (Cparenthesis::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Backslash :: (Oparenthesis as b) :: tl -> loop (b::accu) n tl | Backslash :: (Oparenthesiss 0) :: tl -> loop (Oparenthesis::accu) n (Oparenthesis::tl) | Oparenthesis as e :: tl -> loop (e::accu) (n+1) tl | Oparenthesiss x as e :: tl -> loop (e::accu) (n+x+2) tl | Cparenthesis as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Cparenthesiss 0 :: tl -> if n = 0 then List.rev accu, Cparenthesis::tl else loop (Cparenthesis::accu) (n-1) (Cparenthesis::tl) | Cparenthesiss x :: tl -> if n = 0 then List.rev accu, Cparenthesiss(x-1)::tl else loop (match accu with | Cparenthesis::accu -> Cparenthesiss(0)::accu | Cparenthesiss x::accu -> Cparenthesiss(x+1)::accu | _ -> Cparenthesis::accu) (n-1) (Cparenthesiss(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_cparenth %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_cparenth %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_oparenth ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Oparenthesis as b) :: tl -> loop (b::accu) n tl | Backslash :: (Oparenthesiss 0) :: tl -> loop (Oparenthesis::accu) n (Oparenthesis::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Oparenthesis as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Oparenthesiss 0 :: tl -> if n = 0 then List.rev accu, Oparenthesis::tl else loop (Oparenthesis::accu) (n-1) (Oparenthesis::tl) | Oparenthesiss x :: tl -> if n = 0 then List.rev accu, Oparenthesiss(x-1)::tl else loop (match accu with | Oparenthesis::accu -> Oparenthesiss(0)::accu | Oparenthesiss x::accu -> Oparenthesiss(x+1)::accu | _ -> Oparenthesis::accu) (n-1) (Oparenthesiss(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_oparenth %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_oparenth %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_dq ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Doublequote as b) :: tl -> loop (b::accu) n tl | Backslash :: (Doublequotes 0) :: tl -> loop (Doublequote::accu) n (Doublequote::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Doublequote as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Doublequotes 0 :: tl -> if n = 0 then List.rev accu, Doublequote::tl else loop (Doublequote::accu) (n-1) (Doublequote::tl) | Doublequotes x :: tl -> if n = 0 then List.rev accu, Doublequotes(x-1)::tl else loop (match accu with | Doublequote::accu -> Doublequotes(0)::accu | Doublequotes x::accu -> Doublequotes(x+1)::accu | _ -> Doublequote::accu) (n-1) (Doublequotes(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_dq %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_dq %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_q ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Quote as b) :: tl -> loop (b::accu) n tl | Backslash :: (Quotes 0) :: tl -> loop (Quote::accu) n (Quote::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Quote as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Quotes 0 :: tl -> if n = 0 then List.rev accu, Quote::tl else loop (Quote::accu) (n-1) (Quote::tl) | Quotes x :: tl -> if n = 0 then List.rev accu, Quotes(x-1)::tl else loop (match accu with | Quote::accu -> Quotes(0)::accu | Quotes x::accu -> Quotes(x+1)::accu | _ -> Quote::accu) (n-1) (Quotes(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_q %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_q %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_obracket ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Obracket as b) :: tl -> loop (b::accu) n tl | Backslash :: (Obrackets 0) :: tl -> loop (Obracket::accu) n (Obracket::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Obracket as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Obrackets 0 :: tl -> if n = 0 then List.rev accu, Obracket::tl else loop (Obracket::accu) (n-1) (Obracket::tl) | Obrackets x :: tl -> if n = 0 then List.rev accu, Obrackets(x-1)::tl else loop (match accu with | Obracket::accu -> Obrackets(0)::accu | Obrackets x::accu -> Obrackets(x+1)::accu | _ -> Obracket::accu) (n-1) (Obrackets(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_obracket %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_obracket %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_cbracket ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Cbracket as b) :: tl -> loop (b::accu) n tl | Backslash :: (Cbrackets 0) :: tl -> loop (Cbracket::accu) n (Cbracket::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Backslash :: (Obracket as b) :: tl -> loop (b::accu) n tl | Backslash :: (Obrackets 0) :: tl -> loop (Obracket::accu) n (Obracket::tl) | Obracket as e :: tl -> loop (e::accu) (n+1) tl | Obrackets x as e :: tl -> loop (e::accu) (n+x+2) tl | Cbracket as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Cbrackets 0 :: tl -> if n = 0 then List.rev accu, Cbracket::tl else loop (Cbracket::accu) (n-1) (Cbracket::tl) | Cbrackets x :: tl -> if n = 0 then List.rev accu, Cbrackets(x-1)::tl else loop (match accu with | Cbracket::accu -> Cbrackets(0)::accu | Cbrackets x::accu -> Cbrackets(x+1)::accu | _ -> Cbracket::accu) (n-1) (Cbrackets(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_cbracket %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_cbracket %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res let read_until_space ?(bq=false) ?(no_nl=false) l = assert_well_formed l; let rec loop accu n = function | Backslash :: (Space as b) :: tl -> loop (b::accu) n tl | Backslash :: (Spaces 0) :: tl -> loop (Space::accu) n (Space::tl) | Backslashs 0 :: tl -> loop (Backslash::accu) n tl | Backslashs 1 :: tl -> loop (Backslash::accu) n (Backslash::tl) | Backslashs 2 :: tl -> loop (Backslashs 0::accu) n tl | (Backslashs x) :: tl -> if x mod 2 = 0 then loop (Backslashs(x/2-1)::accu) n tl else loop (Backslashs(x/2-1)::accu) n (Backslash::tl) | (Backquote|Backquotes _ as e)::tl as l -> if bq then match bcode [] [] l with | None -> loop (e::accu) n tl | Some (r, _, tl) -> loop (* not very pretty kind of hack *) (List.rev(L.lex(Omd_backend.markdown_of_md r))@accu) n tl else loop (e::accu) n tl | Space as e :: tl -> if n = 0 then List.rev accu, tl else loop (e::accu) (n-1) tl | Spaces 0 :: tl -> if n = 0 then List.rev accu, Space::tl else loop (Space::accu) (n-1) (Space::tl) | Spaces x :: tl -> if n = 0 then List.rev accu, Spaces(x-1)::tl else loop (match accu with | Space::accu -> Spaces(0)::accu | Spaces x::accu -> Spaces(x+1)::accu | _ -> Space::accu) (n-1) (Spaces(x-1)::tl) | (Newline|Newlines _ as e)::tl -> if no_nl then raise NL_exception else loop (e::accu) n tl | e::tl -> loop (e::accu) n tl | [] -> raise Premature_ending in if debug then eprintf "Omd_parser.read_until_space %S bq=%b no_nl=%b\n%!" (L.string_of_tokens l) bq no_nl; let res = loop [] 0 l in if debug then eprintf "Omd_parser.read_until_space %S bq=%b no_nl=%b => %S\n%!" (L.string_of_tokens l) bq no_nl (L.string_of_tokens (fst res)); res (* /end generated part *) let read_until_newline l = assert_well_formed l; let rec loop accu n = function | ((Backslash as a)) :: ((Newline as b)) :: tl -> loop (b :: a :: accu) n tl | Backslash :: Newlines 0 :: tl -> loop (Newline :: Backslash :: accu) n (Newline :: tl) | ((Backslashs 0 as e)) :: tl -> loop (e :: accu) n tl | ((Backslashs x as e)) :: tl -> if (x mod 2) = 0 then loop (e :: accu) n tl else loop ((Backslashs (x - 1)) :: accu) n (Backslash :: tl) | ((Newline as e)) :: tl -> if n = 0 then ((List.rev accu), tl) else loop (e :: accu) (n - 1) tl | Newlines 0 :: tl -> if n = 0 then ((List.rev accu), (Newline :: tl)) else loop (Newline :: accu) (n - 1) (Newline :: tl) | Newlines n :: tl -> ((List.rev accu), ((Newlines (n - 1)) :: tl)) | e :: tl -> loop (e :: accu) n tl | [] -> raise Premature_ending in loop [] 0 l (* H1, H2, H3, ... *) let read_title (main_loop:main_loop) n r _previous lexemes = let title, rest = let rec loop accu = function | Backslash::Hash::tl -> loop (Hash::Backslash::accu) tl | Backslashs(n)::Hash::tl when n mod 2 = 1 -> loop (Hash::Backslashs(n-1)::accu) tl | Backslash::Hashs(h)::tl -> begin match tl with | [] | (Space|Spaces _)::(Newline|Newlines _)::_ | (Newline|Newlines _)::_ -> loop (Hash::Backslash::accu) ((if h = 0 then Hash else Hashs(h-1))::tl) | _ -> loop (Hashs(h)::Backslash::accu) tl end | Backslashs(n)::Hashs(h)::tl when n mod 2 = 1 -> begin match tl with | [] | (Space|Spaces _)::(Newline|Newlines _)::_ | (Newline|Newlines _)::_ -> loop (Hash::Backslashs(n)::accu) ((if h = 0 then Hash else Hashs(h-1))::tl) | _ -> loop (Hashs(h)::Backslashs(n)::accu) tl end | (Hash|Hashs _) :: ((Newline|Newlines _) :: _ as l) | (Hash|Hashs _) :: (Space|Spaces _) :: ((Newline|Newlines _)::_ as l) | ((Newline|Newlines _) :: _ as l) | ([] as l) | (Space|Spaces _) :: (Hash|Hashs _) :: ((Newline|Newlines _) :: _ as l) | (Space|Spaces _) :: (Hash|Hashs _) :: (Space|Spaces _) :: ((Newline|Newlines _)::_ as l) | (Space|Spaces _) :: ((Newline|Newlines _) :: _ as l) | (Space|Spaces _) :: ([] as l) -> main_loop [] [] (List.rev accu), l | [Hash|Hashs _] | [(Space|Spaces _); Hash|Hashs _] | [(Space|Spaces _); (Hash|Hashs _); (Space|Spaces _)] -> main_loop [] [] (List.rev accu), [] | x::tl -> loop (x::accu) tl in loop [] lexemes in match n with | 1 -> Some(H1 title :: r, [Newline], rest) | 2 -> Some(H2 title :: r, [Newline], rest) | 3 -> Some(H3 title :: r, [Newline], rest) | 4 -> Some(H4 title :: r, [Newline], rest) | 5 -> Some(H5 title :: r, [Newline], rest) | 6 -> Some(H6 title :: r, [Newline], rest) | _ -> None let maybe_extension extensions r p l = match extensions with | [] -> None | _ -> List.fold_left (function | None -> (fun f -> f#parser_extension r p l) | Some(nr, np, nl) as e -> (fun f -> match f#parser_extension nr np nl with | None -> e | Some _ as k -> k) ) None extensions (* blockquotes *) let emailstyle_quoting (main_loop:main_loop) r _p lexemes = assert_well_formed lexemes; let rec loop block cl = function | Newline::Greaterthan::(Newline::_ as tl) -> loop (Newline::cl@block) [] tl | Newline::Greaterthan::Space::tl -> loop (Newline::cl@block) [] tl | Newline::Greaterthan::Spaces 0::tl -> loop (Newline::cl@block) [Space] tl | Newline::Greaterthan::Spaces n::tl -> assert(n>0); loop (Newline::cl@block) [Spaces(n-1)] tl (* multi paragraph blockquotes with empty lines *) | Newlines 0::Greaterthan::Space::tl -> loop (Newlines 0::cl@block) [] tl | Newlines 0::Greaterthan::Spaces 0::tl -> loop (Newlines 0::cl@block) [Space] tl | Newlines 0::Greaterthan::Spaces n::tl -> assert(n>0); loop (Newlines 0::cl@block) [Spaces(n-1)] tl | (Newlines _::_ as l) | ([] as l) -> fix(List.rev(cl@block)), l | e::tl -> loop block (e::cl) tl in match loop [] [] lexemes with | (Newline|Newlines _)::block, tl -> if debug then eprintf "(OMD) Omd_parser.emailstyle_quoting %S\n%!" (L.string_of_tokens block); Some((Blockquote(main_loop [] [] block)::r), [Newline], tl) | _ -> None (* maybe a reference *) let maybe_reference (main_loop:main_loop) rc r _p l = assert_well_formed l; (* this function is called when we know it's not a link although it started with a '[' *) (* So it could be a reference or a link definition. *) let rec maybe_ref l = let text, remains = read_until_cbracket ~bq:true l in (* check that there is no ill-placed open bracket *) if (try ignore(read_until_obracket ~bq:true text); true with Premature_ending -> false) then raise Premature_ending; (* <-- ill-placed open bracket *) let blank, remains = read_until_obracket ~bq:true remains in (* check that there are no unwanted characters between CB and OB. *) if eat (let flag = ref true in function (* allow only a space, multiple spaces, or a newline *) | Newline -> !flag && (flag := false; true) | (Space|Spaces _) -> !flag && (flag := false; true) | _ -> false) blank <> [] then raise Premature_ending (* <-- not a regular reference *) else match read_until_cbracket ~bq:true remains with | [], remains -> let fallback = extract_fallback main_loop remains (Obracket::l) in let id = L.string_of_tokens text in (* implicit anchor *) Some(((Ref(rc, id, id, fallback))::r), [Cbracket], remains) | id, remains -> let fallback = extract_fallback main_loop remains (Obracket::l) in Some(((Ref(rc, L.string_of_tokens id, L.string_of_tokens text, fallback))::r), [Cbracket], remains) in let rec maybe_nonregular_ref l = let text, remains = read_until_cbracket ~bq:true l in (* check that there is no ill-placed open bracket *) if (try ignore(read_until_obracket ~bq:true text); true with Premature_ending -> false) then raise Premature_ending; (* <-- ill-placed open bracket *) let fallback = extract_fallback main_loop remains (Obracket::l) in let id = L.string_of_tokens text in (* implicit anchor *) Some(((Ref(rc, id, id, fallback))::r), [Cbracket], remains) in let rec maybe_def l = match read_until_cbracket ~bq:true l with | _, [] -> raise Premature_ending | id, (Colon::(Space|Spaces _)::remains) | id, (Colon::remains) -> begin match fsplit ~f:(function | (Space|Spaces _|Newline|Newlines _):: _ as l -> Split([], l) | e::tl -> Continue | [] -> Split([],[])) remains with | None | Some([], _) -> raise Premature_ending | Some(url, remains) -> let title, remains = match eat (function | (Space|Spaces _|Newline|Newlines _) -> true | _ -> false) remains with | Doublequotes(0)::tl -> [], tl | Doublequote::tl -> read_until_dq ~bq:true tl | Quotes(0)::tl -> [], tl | Quote::tl -> read_until_q ~bq:true tl | Oparenthesis::tl-> read_until_cparenth ~bq:true tl | l -> [], l in let url = let url = L.string_of_tokens url in if String.length url > 2 && url.[0] = '<' && url.[String.length url - 1] = '>' then String.sub url 1 (String.length url - 2) else url in rc#add_ref (L.string_of_tokens id) (L.string_of_tokens title) url; Some(r, [Newline], remains) end | _ -> raise Premature_ending in try maybe_ref l with | Premature_ending | NL_exception -> try maybe_def l with | Premature_ending | NL_exception -> try maybe_nonregular_ref l with | Premature_ending | NL_exception -> None (** maybe a link *) let maybe_link (main_loop:main_loop) r _p l = if debug then eprintf "(OMD) # maybe_link\n"; assert_well_formed l; let read_url name l = if debug then eprintf "(OMD) # maybe_link>read_url %S\n" (L.string_of_tokens l); try let l_cp, r_cp = read_until_cparenth ~no_nl:true ~bq:false l in if debug then eprintf "(OMD) maybe_link >> l_cp=%S r_cp=%S\n%!" (L.string_of_tokens l_cp) (L.string_of_tokens r_cp); try let l_dq, r_dq = read_until_dq ~no_nl:true ~bq:false l in if debug then eprintf "(OMD) maybe_link >> l_dq=%S r_dq=%S\n%!" (L.string_of_tokens l_dq) (L.string_of_tokens r_dq); (* maybe title *) if List.length l_cp > List.length l_dq then (* title *) begin if debug then eprintf "(OMD) maybe_link >> title\n%!"; let url = match List.rev l_dq with | (Newline|Space|Spaces _)::(Newline|Space|Spaces _)::tl | (Newline|Space|Spaces _)::tl -> L.string_of_tokens (List.rev tl) | _ -> L.string_of_tokens l_dq in let title, rest = read_until_dq ~no_nl:false ~bq:false r_dq in let rest = snd(read_until_cparenth rest) in let title = L.string_of_tokens title in Some(Url(url, name, title) :: r, [Cparenthesis], rest) end else (* no title *) raise Premature_ending with NL_exception | Premature_ending -> (* no title *) begin if debug then eprintf "(OMD) maybe_link >> no title\n%!"; let url = match List.rev l_cp with | (Newline|Space|Spaces _)::(Newline|Space|Spaces _)::tl | (Newline|Space|Spaces _)::tl -> List.rev tl | _ -> l_cp in let title, rest = [], r_cp in let url = L.string_of_tokens url in let title = L.string_of_tokens title in Some(Url(url, name, title) :: r, [Cparenthesis], rest) end with NL_exception | Premature_ending -> None in let read_name l = (* it's not really the "name" of a URL but what corresponds to the inner HTML of an HTML 'A' tag *) if debug then eprintf "(OMD) # maybe_link> read_name\n"; try match read_until_cbracket ~bq:true l with | name, (Oparenthesis::tl) -> read_url (main_loop [] [Obracket] name) (eat_blank tl) | name, (Oparenthesiss 0::tl) -> read_url (main_loop [] [Obracket] name) (Oparenthesis::tl) | name, (Oparenthesiss n::tl) -> read_url (main_loop [] [Obracket] name) (Oparenthesiss(n-1)::tl) | _ -> None with Premature_ending | NL_exception -> None in read_name l let has_paragraphs l = (* Has at least 2 consecutive newlines. *) List.exists (function Newlines _ -> true | _ -> false) l let parse_list (main_loop:main_loop) r _p l = assert_well_formed l; if debug then begin eprintf "(OMD) parse_list r=(%s) p=(%s) l=(%s)\n%!" "" (* (Omd_backend.sexpr_of_md (List.rev r)) *) "" (* (destring_of_tl p) *) (L.destring_of_tokens ~limit:40 l); end; let module UO = struct type ordered = O | U end in let open UO in if debug then eprintf "(OMD) parse_list: l=(%s)\n%!" (L.destring_of_tokens l); let end_of_item (indent:int) l : tok split_action = match l with | [] -> Split([],[]) | Newlines 0 :: ((Spaces n) :: Greaterthan :: (Space | Spaces _) :: tl as s) -> assert(n>=0); if n+2 = indent+4 then (* blockquote *) match unindent (n+2) (Newline::s) with | Newline::block, rest -> Continue_with(List.rev(Newlines(1)::block), rest) | Newlines n::block, rest -> Continue_with(List.rev(Newlines(n+2)::block), rest) | block, rest -> Continue_with(Newlines 0::block, rest) else if n+2 >= indent+8 then (* code inside item *) match unindent (indent+4) (Newline::s) with | Newline::block, rest -> Continue_with(List.rev(Newlines(1)::block), rest) | Newlines n::block, rest -> Continue_with(List.rev(Newlines(n+2)::block), rest) | block, rest -> Continue_with(Newlines 0::block, rest) else Split([], l) | Newlines 0 :: (Spaces n :: tl as s) -> assert(n>=0); if n+2 >= indent+8 then (* code inside item *) match unindent (indent+4) (Newline::s) with | Newline::block, rest -> Continue_with(List.rev(Newlines(0)::block), rest) | Newlines n::block, rest -> Continue_with(List.rev(Newlines(n+1)::block), rest) | block, rest -> Continue_with(Newline::block, rest) else if n+2 >= indent+4 then (* new paragraph inside item *) match unindent (indent+4) (Newline::s) with | Newline::block, rest -> Continue_with(List.rev(Newlines(1)::block), rest) | Newlines n::block, rest -> Continue_with(List.rev(Newlines(n+2)::block), rest) | block, rest -> Continue_with(Newlines 0::block, rest) else Split([], l) | (Newlines _) :: _ -> (* n > 0 *) (* End of item, stop *) Split([], l) | Newline :: ( ((Space|Spaces _) :: (Star|Minus|Plus) :: (Space|Spaces _):: _) | ((Space|Spaces _) :: Number _ :: Dot :: (Space|Spaces _) :: _) | ((Star|Minus|Plus) :: (Space|Spaces _):: _) | (Number _ :: Dot :: (Space|Spaces _) :: _) as tl) -> Split([Newline], tl) | Newline :: (Space | Spaces _) :: Newline :: tl -> (* A line with spaces shouldn't interfere here, which is about exactly 2 consecutive newlines, so we rewrite the head of the lexing stream. *) Continue_with([], Newlines 0 :: tl) | Newline :: (Space | Spaces _) :: (Newlines _) :: _ -> (* A line with spaces shouldn't interfere here, which is about at least 3 consecutive newlines, so we stop. *) Split([], l) | Newline :: (Spaces _ as s) :: tl -> Continue_with ([s; Tag("parse_list/remember spaces", object method parser_extension r p = function Spaces _::tl -> Some(r,p,Space::tl) | _ -> None method to_string = "" end); Newline], tl) | Newline :: (Space as s) :: tl -> Continue_with ([s; Tag("parse_list/remember space", object method parser_extension r p = function (Space|Spaces _)::tl -> Some(r,p,Space::tl) | _ -> None method to_string = "" end); Newline], tl) | _::_ -> Continue in let rev_to_t l = assert_well_formed l; (* Newlines at the end of items have no meaning (except to end the item which is expressed by the constructor already). *) let l = match l with (Newline | Newlines _) :: tl -> tl | _ -> l in main_loop [] [Newline] (List.rev l) in let add (sublist:element) items = if debug then eprintf "(OMD) add\n%!"; match items with | [] -> assert false | (O,indents,item)::tl -> (O,indents,(item@[sublist]))::tl | (U,indents,item)::tl -> (U,indents,(item@[sublist]))::tl in let make_up ~p items : Omd_representation.element = if debug then eprintf "(OMD) make_up p=%b\n%!" p; let items = List.rev items in match items with | (U,_,item)::_ -> if p then Ulp(List.map (fun (_,_,i) -> i) items) else Ul(List.map (fun (_,_,i) -> i) items) | (O,_,item)::_ -> if p then Olp(List.map (fun (_,_,i) -> i) items) else Ol(List.map (fun (_,_,i) -> i) items) | [] -> failwith "make_up called with []" (* assert false *) in let rec list_items ~p indents items l = if debug then eprintf "(OMD) list_items: p=%b l=(%s)\n%!" p (L.destring_of_tokens l); match l with (* no more list items *) | [] -> make_up p items, l (* more list items *) (* new unordered items *) | (Star|Minus|Plus)::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item 0) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in if debug then eprintf "(OMD) (2346) new_item=%S\n%!" (L.destring_of_tokens new_item); match indents with | [] -> assert(items = []); list_items ~p [0] ((U,[0], rev_to_t new_item)::items) rest | 0::_ -> list_items ~p indents ((U,indents,rev_to_t new_item)::items) rest | _::_ -> make_up p items, l end | Space::(Star|Minus|Plus)::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item 1) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in match indents with | [] -> assert(items = []); list_items ~p [1] ((U,[1],rev_to_t new_item)::items) rest | 1::_ -> list_items ~p indents ((U,indents,rev_to_t new_item)::items) rest | i::_ -> if i > 1 then make_up p items, l else (* i < 1 : new sub list*) let sublist, remains = list_items ~p (1::indents) [(U,1::indents,rev_to_t new_item)] rest in list_items ~p indents (add sublist items) remains end | Spaces n::(Star|Minus|Plus)::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item (n+2)) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in match indents with | [] -> if debug then eprintf "(OMD) spaces[] l=(%S)\n%!" (L.string_of_tokens l); assert(items = []); (* ae... listes mal formes ?! *) list_items ~p [n+2] ((U,[n+2],rev_to_t new_item)::items) rest | i::_ -> if debug then eprintf "(OMD) spaces(%d::_) n=%d l=(%S)\n%!" i n (L.string_of_tokens l); if i = n + 2 then let items = (U,indents,rev_to_t new_item) :: items in list_items ~p indents items rest else if i < n + 2 then let sublist, remains = list_items ~p ((n+2)::indents) [(U,(n+2)::indents,rev_to_t new_item)] rest in list_items ~p indents (add sublist items) remains else (* i > n + 2 *) make_up p items, l end (* new ordered items *) | Number _::Dot::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item 0) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in assert_well_formed new_item; match indents with | [] -> assert(items = []); list_items ~p [0] ((O,[0],rev_to_t new_item)::items) rest | 0::_ -> list_items ~p indents ((O,indents,rev_to_t new_item)::items) rest | _::_ -> make_up p items, l end | Space::Number _::Dot::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item 1) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in match indents with | [] -> assert(items = []); list_items ~p [1] ((O,[1],rev_to_t new_item)::items) rest | 1::_ -> list_items ~p indents ((O,indents,rev_to_t new_item)::items) rest | i::_ -> if i > 1 then make_up p items, l else (* i < 1 : new sub list*) let sublist, remains = list_items ~p (1::indents) [(O,1::indents,rev_to_t new_item)] rest in list_items ~p:p indents (add sublist items) remains end | Spaces n::Number _::Dot::(Space|Spaces _)::tl -> begin match fsplit_rev ~f:(end_of_item (n+2)) tl with | None -> make_up p items, l | Some(new_item, rest) -> let p = p || has_paragraphs new_item in match indents with | [] -> if debug then eprintf "(OMD) spaces[] l=(%S)\n%!" (L.string_of_tokens l); assert(items = []); (* ae... listes mal formes ?! *) list_items ~p [n+2] ((O,[n+2],rev_to_t new_item)::items) rest | i::_ -> if debug then eprintf "(OMD) spaces(%d::_) n=%d l=(%S)\n%!" i n (L.string_of_tokens l); if i = n + 2 then list_items ~p indents ((O,indents,rev_to_t new_item)::items) rest else if i < n + 2 then let sublist, remains = list_items ~p ((n+2)::indents) [(O,(n+2)::indents,rev_to_t new_item)] rest in list_items ~p:p indents (add sublist items) remains else (* i > n + 2 *) make_up p items, l end (* *) | Newlines 0::((Star|Minus|Plus)::(Space|Spaces _)::_ as l) | Newlines 0::(Number _::Dot::(Space|Spaces _)::_ as l) | Newlines 0::((Space|Spaces _)::Star::(Space|Spaces _)::_ as l) | Newlines 0::((Space|Spaces _)::Number _::Dot::(Space|Spaces _)::_ as l) -> list_items ~p:true indents items l | _ -> if debug then begin let rec string_of_items items = match items with | [] -> "" | (O,indent::_,item)::tl -> sprintf "(O,i=%d,%S)" (indent) (Omd_backend.html_of_md item) ^ string_of_items tl | (U,indent::_,item)::tl -> sprintf "(U,i=%d,%S)" (indent) (Omd_backend.html_of_md item) ^ string_of_items tl | _ -> "(weird)" in eprintf "(OMD) NALI parse_list: l=(%S) items=%s\n%!" (L.string_of_tokens l) (string_of_items items) end; (* not a list item *) make_up p items, l in match list_items ~p:false [] [] l with | rp, l -> rp::r, [Newline], l let icode ?(default_lang=default_lang) r _p l = assert_well_formed l; (* indented code: returns (r,p,l) where r is the result, p is the last thing read, l is the remains *) let dummy_tag = Tag("dummy_tag", object method to_string = "" method parser_extension = fun r p l -> None end) in let accu = Buffer.create 64 in let rec loop s tl = match s, tl with | (Newline|Newlines _ as p), (Space|Spaces(0|1))::_ -> (* 1, 2 or 3 spaces. *) (* -> Return what's been found as code because what follows isn't. *) Code_block(default_lang, Buffer.contents accu) :: r, [p], tl | (Newline|Newlines _ as p), Spaces(n)::tl -> assert(n>0); (* At least 4 spaces, it's still code. *) Buffer.add_string accu (L.string_of_token p); loop (if n >= 4 then Spaces(n-4) else if n = 3 then Space else dummy_tag) tl | (Newline|Newlines _ as p), (not_spaces::_ as tl) -> (* stop *) Code_block(default_lang, Buffer.contents accu) :: r, [p], tl (* -> Return what's been found as code because it's no more code. *) | p, e::tl -> Buffer.add_string accu (L.string_of_token p); (* html entities are to be converted later! *) loop e tl | p, [] -> Buffer.add_string accu (L.string_of_token p); Code_block(default_lang, Buffer.contents accu)::r, [p], [] in match l with | Spaces n::tl -> if n >= 4 then Some(loop (Spaces(n-4)) tl) else if n = 3 then Some(loop Space tl) else Some(loop dummy_tag tl) | _ -> assert false (* Returns [(r,p,l)] where [r] is the result, [p] is the last thing read, and [l] is what remains. *) let spaces_at_beginning_of_line main_loop default_lang n r previous lexemes = assert_well_formed lexemes; assert (n > 0); if n <= 3 then ( match lexemes with | (Star|Minus|Plus) :: (Space|Spaces _) :: _ -> (* unordered list *) parse_list main_loop r [] (L.make_space n::lexemes) | (Number _)::Dot::(Space|Spaces _)::tl -> (* ordered list *) parse_list main_loop r [] (L.make_space n::lexemes) | [] | (Newline|Newlines _) :: _ -> (* blank line, skip spaces *) r, previous, lexemes | _::_ -> Text (" ")::r, previous, lexemes ) else ( (* n>=4, blank line or indented code *) match lexemes with | [] | (Newline|Newlines _) :: _ -> r, previous, lexemes | _ -> match icode ~default_lang r [Newline] (L.make_space n :: lexemes) with | Some(r,p,l) -> r,p,l | None -> if debug then eprintf "(OMD) Omd_parser.icode or \ Omd_parser.main_loop is broken\n%!"; assert false ) let spaces_not_at_beginning_of_line ?(html=false) n r lexemes = assert_well_formed lexemes; assert (n > 0); if n = 1 then (Text " "::r), [Space], lexemes else ( match lexemes with | Newline :: tl when not html -> if debug then eprintf "(OMD) 2 or more spaces before a newline, eat the newline\n%!"; Br::r, [Spaces(n-2)], tl | Newlines k :: tl when not html -> if debug then eprintf "(OMD) 2 or more spaces before a newline, eat 1 newline"; let newlines = if k = 0 then Newline else Newlines(k-1) in Br::r, [Spaces(n-2)], newlines :: tl | _ -> assert (n>1); (Text (String.make n ' ')::r), [Spaces(n-2)], lexemes ) let maybe_autoemail r p l = assert_well_formed l; match l with | Lessthan::tl -> begin match fsplit ~excl:(function (Newline|Newlines _|Space|Spaces _) :: _-> true | [] -> true | _ -> false) ~f:(function At::tl -> Split([],tl) | _ -> Continue) tl with | None -> None | Some(left, right) -> match fsplit ~excl:(function | (Newline|Newlines _|Space|Spaces _) :: _-> true | [] -> true | _ -> false) ~f:(function Greaterthan::tl -> Split([],tl) | Greaterthans 0::tl -> Split([],Greaterthan::tl) | Greaterthans n::tl -> Split([],Greaterthans(n-1)::tl) | _ -> Continue) right with | None -> None | Some(domain, tl) -> let email = L.string_of_tokens left ^ "@" ^ L.string_of_tokens domain in Some(Url("mailto:"^email,[Text email],"")::r,[Greaterthan],tl) end | _ -> failwith "Omd_parser.maybe_autoemail: wrong use of the function." let is_hex s = String.length s > 1 && (s.[0] = 'X' || s.[0] = 'x') && (let rec loop i = i = String.length s || (match s.[i] with | '0' .. '9' | 'A' .. 'F' | 'a' .. 'f' -> loop (succ i) | _ -> false) in loop 1) let mediatypetextomd : string list ref = ref [] let filter_text_omd_rev l = let rec loop b r = function | [] -> if b then r else l | ("media:type", Some "text/omd")::tl -> loop true r tl | e::tl -> loop b (e::r) tl in loop false [] l exception Orphan_closing of string * l * l let rec main_impl_rev ~html (r:r) (previous:p) (lexemes:l) = (* if debug then eprintf "(OMD) main_impl_rev html=%b\n%!" html; *) assert_well_formed lexemes; if debug then eprintf "(OMD) main_impl_rev html=%b r=%s p=(%s) l=(%s)\n%!" html (Omd_backend.sexpr_of_md (List.rev r)) (L.destring_of_tokens previous) (L.destring_of_tokens lexemes); match previous, lexemes with (* no more to process *) | _, [] -> (* return the result (/!\ it has to be reversed as some point) *) r (* Tag: tag system $\cup$ high-priority extension mechanism *) | _, Tag(_name, e) :: tl -> begin match e#parser_extension r previous tl with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> main_impl_rev ~html r previous tl end (* HTML comments *) | _, (Lessthan as t)::(Exclamation::Minuss 0::c as tl) -> begin let f = function | (Minuss _ as m)::(Greaterthan|Greaterthans _ as g)::tl -> Split([g;m], tl) | _ -> Continue in match fsplit ~f:f lexemes with | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | Some (comments, new_tl) -> let r = Html_comment(L.string_of_tokens comments) :: r in main_impl_rev ~html r [Newline] new_tl end (* email-style quoting / blockquote *) | ([]|[Newline|Newlines _]), Greaterthan::(Space|Spaces _)::_ -> begin match emailstyle_quoting main_loop r previous (Newline::lexemes) with | Some(r,p,l) -> main_impl_rev ~html r p l | None -> if debug then eprintf "(OMD) Omd_parser.emailstyle_quoting or \ Omd_parser.main_loop is broken\n%!"; assert false end (* email-style quoting, with lines starting with spaces! *) | ([]|[Newline|Newlines _]), (Space|Spaces(0|1) as s) :: Greaterthan :: (Space|Spaces _)::_ -> (* It's 1, 2 or 3 spaces, not more because it wouldn't mean quoting anymore but code. *) begin let new_r, p, rest = let foo, rest = match unindent (L.length s) (Newline::lexemes) with | (Newline|Newlines _)::foo, rest -> foo, rest | res -> res in match emailstyle_quoting main_loop [] previous (Newline::foo) with | Some(new_r, p, []) -> new_r, p, rest | _ -> if debug then eprintf "(OMD) Omd_parser.emailstyle_quoting or \ Omd_parser.main_loop is broken\n%!"; assert false in main_impl_rev ~html (new_r@r) [Newline] rest end (* minus *) | ([]|[Newline|Newlines _]), (Minus|Minuss _ as t) :: ((Space|Spaces _)::_ as tl) -> (* maybe hr *) begin match hr_m lexemes with | None -> (* no hr, so it could be a list *) begin match t with | Minus -> (* it's a list *) let md, new_p, new_l = parse_list main_loop r [] lexemes in main_impl_rev ~html md new_p new_l | _ -> (* not a list *) begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end end | Some l -> (* hr *) main_impl_rev ~html (Hr::r) [Newline] l end | ([]|[Newline|Newlines _]), (Minus|Minuss _ as t)::tl -> begin match hr_m lexemes with | None -> (* no hr, and it's not a list either because it's not followed by spaces *) begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | Some l -> (* hr *) main_impl_rev ~html (Hr::r) [Newline] l end (* hashes *) | ([]|[(Newline|Newlines _)]), (Hashs n as t) :: ((Space|Spaces _) :: ttl as tl) | ([]|[(Newline|Newlines _)]), (Hashs n as t) :: (ttl as tl) -> (* hash titles *) if n <= 4 then match read_title main_loop (n+2) r previous ttl with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> if debug then eprintf "(OMD) Omd_parser.read_title or \ Omd_parser.main_loop is broken\n%!"; assert false else begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | ([]|[(Newline|Newlines _)]), Hash :: (Space|Spaces _) :: tl | ([]|[(Newline|Newlines _)]), Hash :: tl -> (* hash titles *) begin match read_title main_loop 1 r previous tl with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> if debug then eprintf "(OMD) Omd_parser.read_title or \ Omd_parser.main_loop is broken\n%!"; assert false end | _, (Hash|Hashs _ as t) :: tl -> (* hash -- no title *) begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end (* spaces after a newline: could lead to hr *) | ([]|[Newline|Newlines _]), ((Space|Spaces _) as sp) :: tl -> begin match hr tl with | None -> (* No [Hr], but maybe [Ul], [Ol], code,... *) let n = L.length sp in let r, p, l = spaces_at_beginning_of_line main_loop default_lang n r previous tl in main_impl_rev ~html r p l | Some tl -> main_impl_rev ~html (Hr::r) [Newline] tl end (* spaces anywhere *) | _, ((Space|Spaces _) as t) :: tl -> (* too many cases to be handled here *) let n = L.length t in let r, p, l = spaces_not_at_beginning_of_line ~html n r tl in main_impl_rev ~html r p l (* underscores *) | _, (Underscore as t) :: tl -> (* one "orphan" underscore, or emph *) (match uemph_or_bold 1 tl with | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | Some(x, new_tl) -> main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl ) | _, (Underscores((0|1) as n) as t) :: tl -> (* 2 or 3 "orphan" underscores, or emph/bold *) (match uemph_or_bold (n+2) tl with | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | Some(x, new_tl) -> if n = 0 then (* 1 underscore *) main_impl_rev ~html (Bold(main_impl ~html [] [t] x) :: r) [t] new_tl else (* 2 underscores *) main_impl_rev ~html (Emph([Bold(main_impl ~html [] [t] x)]) :: r) [t] new_tl ) (* enumerated lists *) | ([]|[Newline|Newlines _]), (Number _) :: Dot :: (Space|Spaces _) :: tl -> let md, new_p, new_l = parse_list main_loop r [] lexemes in main_impl_rev ~html md new_p new_l (* plus *) | ([]|[(Newline|Newlines _)]), Plus :: (Space|Spaces _) :: _ -> let md, new_p, new_l = parse_list main_loop r [] lexemes in main_impl_rev ~html md new_p new_l (* stars *) | ([]|[(Newline|Newlines _)]), Star :: (Space|Spaces _) :: _ -> (* maybe hr or new list *) begin match hr_s lexemes with | Some l -> main_impl_rev ~html (Hr::r) [Newline] l | None -> let md, new_p, new_l = parse_list main_loop r [] lexemes in main_impl_rev ~html md new_p new_l end | ([]|[(Newline|Newlines _)]), Stars _ :: _ when hr_s lexemes <> None -> (* hr *) (match hr_s lexemes with | Some l -> main_impl_rev ~html (Hr::r) [Newline] l | None -> assert false ) | ([]|[(Newline|Newlines _)]), (Star as t) :: tl -> (* maybe hr *) begin match hr_s lexemes with | Some l -> main_impl_rev ~html (Hr::r) [Newline] l | None -> (match semph_or_bold 1 tl with | Some(x, new_tl) -> main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) end | _, (Star as t) :: tl -> (* one "orphan" star, or emph // can't be hr *) (match semph_or_bold 1 tl with | Some(x, new_tl) -> main_impl_rev ~html (Emph(main_impl ~html [] [t] x) :: r) [t] new_tl | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) | _, (Stars((0|1) as n) as t) :: tl -> (* 2 or 3 "orphan" stars, or emph/bold *) (match semph_or_bold (n+2) tl with | Some(x, new_tl) -> if n = 0 then main_impl_rev ~html (Bold(main_impl ~html [] [t] x) :: r) [t] new_tl else main_impl_rev ~html (Emph([Bold(main_impl ~html [] [t] x)]) :: r) [t] new_tl | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) (* backslashes *) | _, Backslash :: (Newline as t) :: tl -> (* \\n *) main_impl_rev ~html (Br :: r) [t] tl | _, Backslash :: Newlines 0 :: tl -> (* \\n\n\n\n... *) main_impl_rev ~html (Br :: r) [Backslash; Newline] (Newline :: tl) | _, Backslash :: Newlines n :: tl -> assert (n >= 0); (* \\n\n\n\n... *) main_impl_rev ~html (Br :: r) [Backslash; Newline] (Newlines (n-1) :: tl) | _, Backslash :: (Backquote as t) :: tl -> (* \` *) main_impl_rev ~html (Text ("`") :: r) [t] tl | _, Backslash :: Backquotes 0 :: tl -> (* \````... *) main_impl_rev ~html (Text ("`") :: r) [Backslash; Backquote] (Backquote :: tl) | _, Backslash :: Backquotes n :: tl -> assert (n >= 0); (* \````... *) main_impl_rev ~html (Text ("`") :: r) [Backslash; Backquote] (Backquotes (n-1) :: tl) | _, Backslash :: (Star as t) :: tl -> (* \* *) main_impl_rev ~html (Text ("*") :: r) [t] tl | _, Backslash :: Stars 0 :: tl -> (* \****... *) main_impl_rev ~html (Text ("*") :: r) [Backslash; Star] (Star :: tl) | _, Backslash :: Stars n :: tl -> assert (n >= 0); (* \****... *) main_impl_rev ~html (Text ("*") :: r) [Backslash; Star] (Stars (n-1) :: tl) | _, Backslash :: (Underscore as t) :: tl -> (* \_ *) main_impl_rev ~html (Text ("_") :: r) [t] tl | _, Backslash :: Underscores 0 :: tl -> (* \___... *) main_impl_rev ~html (Text ("_") :: r) [Backslash; Underscore] (Underscore :: tl) | _, Backslash :: Underscores n :: tl -> assert (n >= 0); (* \___... *) main_impl_rev ~html (Text ("_") :: r) [Backslash; Underscore] (Underscores (n-1) :: tl) | _, Backslash :: (Obrace as t) :: tl -> (* \{ *) main_impl_rev ~html (Text ("{") :: r) [t] tl | _, Backslash :: Obraces 0 :: tl -> (* \{{{... *) main_impl_rev ~html (Text ("{") :: r) [Backslash; Obrace] (Obrace :: tl) | _, Backslash :: Obraces n :: tl -> assert (n >= 0); (* \{{{... *) main_impl_rev ~html (Text ("{") :: r) [Backslash; Obrace] (Obraces (n-1) :: tl) | _, Backslash :: (Cbrace as t) :: tl -> (* \} *) main_impl_rev ~html (Text ("}") :: r) [t] tl | _, Backslash :: Cbraces 0 :: tl -> (* \}}}... *) main_impl_rev ~html (Text ("}") :: r) [Backslash; Cbrace] (Cbrace :: tl) | _, Backslash :: Cbraces n :: tl -> assert (n >= 0); (* \}}}... *) main_impl_rev ~html (Text ("}") :: r) [Backslash; Cbrace] (Cbraces (n-1) :: tl) | _, Backslash :: (Obracket as t) :: tl -> (* \[ *) main_impl_rev ~html (Text ("[") :: r) [t] tl | _, Backslash :: Obrackets 0 :: tl -> (* \[[[... *) main_impl_rev ~html (Text ("[") :: r) [Backslash; Obracket] (Obracket :: tl) | _, Backslash :: Obrackets n :: tl -> assert (n >= 0); (* \[[[... *) main_impl_rev ~html (Text ("[") :: r) [Backslash; Obracket] (Obrackets (n-1) :: tl) | _, Backslash :: (Cbracket as t) :: tl -> (* \} *) main_impl_rev ~html (Text ("]") :: r) [t] tl | _, Backslash :: Cbrackets 0 :: tl -> (* \}}}... *) main_impl_rev ~html (Text ("]") :: r) [Backslash; Cbracket] (Cbracket :: tl) | _, Backslash :: Cbrackets n :: tl -> assert (n >= 0); (* \}}}... *) main_impl_rev ~html (Text ("]") :: r) [Backslash; Cbracket] (Cbrackets (n-1) :: tl) | _, Backslash :: (Oparenthesis as t) :: tl -> (* \( *) main_impl_rev ~html (Text ("(") :: r) [t] tl | _, Backslash :: Oparenthesiss 0 :: tl -> (* \(((... *) main_impl_rev ~html (Text ("(") :: r) [Backslash; Oparenthesis] (Oparenthesis :: tl) | _, Backslash :: Oparenthesiss n :: tl -> assert (n >= 0); (* \(((... *) main_impl_rev ~html (Text ("(") :: r) [Backslash; Oparenthesis] (Oparenthesiss (n-1) :: tl) | _, Backslash :: (Cparenthesis as t) :: tl -> (* \) *) main_impl_rev ~html (Text (")") :: r) [t] tl | _, Backslash :: Cparenthesiss 0 :: tl -> (* \)))... *) main_impl_rev ~html (Text (")") :: r) [Backslash; Cparenthesis] (Cparenthesis :: tl) | _, Backslash :: Cparenthesiss n :: tl -> assert (n >= 0); (* \)))... *) main_impl_rev ~html (Text (")") :: r) [Backslash; Cparenthesis] (Cparenthesiss (n-1) :: tl) | _, Backslash :: (Plus as t) :: tl -> (* \+ *) main_impl_rev ~html (Text ("+") :: r) [t] tl | _, Backslash :: Pluss 0 :: tl -> (* \+++... *) main_impl_rev ~html (Text ("+") :: r) [Backslash; Plus] (Plus :: tl) | _, Backslash :: Pluss n :: tl -> assert (n >= 0); (* \+++... *) main_impl_rev ~html (Text ("+") :: r) [Backslash; Plus] (Pluss (n-1) :: tl) | _, Backslash :: (Minus as t) :: tl -> (* \- *) main_impl_rev ~html (Text ("-") :: r) [t] tl | _, Backslash :: Minuss 0 :: tl -> (* \---... *) main_impl_rev ~html (Text ("-") :: r) [Backslash; Minus] (Minus :: tl) | _, Backslash :: Minuss n :: tl -> assert (n >= 0); (* \---... *) main_impl_rev ~html (Text ("-") :: r) [Backslash; Minus] (Minuss (n-1) :: tl) | _, Backslash :: (Dot as t) :: tl -> (* \. *) main_impl_rev ~html (Text (".") :: r) [t] tl | _, Backslash :: Dots 0 :: tl -> (* \....... *) main_impl_rev ~html (Text (".") :: r) [Backslash; Dot] (Dot :: tl) | _, Backslash :: Dots n :: tl -> assert (n >= 0); (* \....... *) main_impl_rev ~html (Text (".") :: r) [Backslash; Dot] (Dots (n-1) :: tl) | _, Backslash :: (Exclamation as t) :: tl -> (* \! *) main_impl_rev ~html (Text ("!") :: r) [t] tl | _, Backslash :: Exclamations 0 :: tl -> (* \!!!... *) main_impl_rev ~html (Text ("!") :: r) [Backslash; Exclamation] (Exclamation :: tl) | _, Backslash :: Exclamations n :: tl -> assert (n >= 0); (* \!!!... *) main_impl_rev ~html (Text ("!") :: r) [Backslash; Exclamation] (Exclamations (n-1) :: tl) | _, Backslash :: (Hash as t) :: tl -> (* \# *) main_impl_rev ~html (Text ("#") :: r) [t] tl | _, Backslash :: Hashs 0 :: tl -> (* \###... *) main_impl_rev ~html (Text ("#") :: r) [Backslash; Hash] (Hash :: tl) | _, Backslash :: Hashs n :: tl -> assert (n >= 0); (* \###... *) main_impl_rev ~html (Text ("#") :: r) [Backslash; Hash] (Hashs (n-1) :: tl) | _, Backslash :: (Greaterthan as t) :: tl -> (* \> *) main_impl_rev ~html (Text (">") :: r) [t] tl | _, Backslash :: Greaterthans 0 :: tl -> (* \>>>... *) main_impl_rev ~html (Text (">") :: r) [Backslash; Greaterthan] (Greaterthan :: tl) | _, Backslash :: Greaterthans n :: tl -> assert (n >= 0); (* \>>>... *) main_impl_rev ~html (Text (">") :: r) [Backslash; Greaterthan] (Greaterthans (n-1) :: tl) | _, Backslash :: (Lessthan as t) :: tl -> (* \< *) main_impl_rev ~html (Text ("<") :: r) [t] tl | _, Backslash :: Lessthans 0 :: tl -> (* \<<<... *) main_impl_rev ~html (Text ("<") :: r) [Backslash; Lessthan] (Lessthan :: tl) | _, Backslash :: Lessthans n :: tl -> assert (n >= 0); (* \<<<... *) main_impl_rev ~html (Text ("<") :: r) [Backslash; Lessthan] (Lessthans (n-1) :: tl) | _, (Backslashs 0 as t) :: tl -> (* \\\\... *) main_impl_rev ~html (Text ("\\") :: r) [t] tl | _, (Backslashs n as t) :: tl -> (* \\\\... *) if n mod 2 = 0 then main_impl_rev ~html (Text(String.make ((n+2)/2) '\\') :: r) [t] tl else main_impl_rev ~html (Text(String.make ((n+2)/2) '\\') :: r) [t] (Backslash :: tl) | _, Backslash::[] -> main_impl_rev ~html (Text "\\" :: r) [] [] | _, Backslash::tl -> main_impl_rev ~html (Text "\\" :: r) [Backslash] tl (* < *) | _, (Lessthan|Lessthans _ as t) :: (Word("http"|"https"|"ftp"|"ftps"|"ssh"|"afp"|"imap") as w) :: Colon::Slashs(n)::tl -> (* "semi-automatic" URLs *) let rec read_url accu = function | (Newline|Newlines _)::tl -> None | Greaterthan::tl -> let url = (L.string_of_token w) ^ "://" ^ (if n = 0 then "" else String.make (n-1) '/') ^ L.string_of_tokens (List.rev accu) in Some(url, tl) | x::tl -> read_url (x::accu) tl | [] -> None in begin match read_url [] tl with | Some(url, new_tl) -> let r = match t with | Lessthans 0 -> Text "<" :: r | Lessthans n -> Text(String.make (n+1) '<') :: r | _ -> r in main_impl_rev ~html (Url(url,[Text url],"")::r) [] new_tl | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end end (* Word(w) *) | _, Word w::tl -> main_impl_rev ~html (Text w :: r) [Word w] tl (* newline at the end *) | _, [Newline] -> NL::r (* named html entity *) | _, Ampersand::((Word w::((Semicolon|Semicolons _) as s)::tl) as tl2) -> if StringSet.mem w htmlcodes_set then begin match s with | Semicolon -> main_impl_rev ~html (Raw("&"^w^";")::r) [s] tl | Semicolons 0 -> main_impl_rev ~html (Raw("&"^w^";")::r) [s] (Semicolon::tl) | Semicolons n -> main_impl_rev ~html (Raw("&"^w^";")::r) [s] (Semicolons(n-1)::tl) | _ -> assert false end else main_impl_rev ~html (Raw("&")::r) [] tl2 (* digit-coded html entity *) | _, Ampersand::((Hash::Number w::((Semicolon|Semicolons _) as s)::tl) as tl2) -> if String.length w <= 4 then begin match s with | Semicolon -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] tl | Semicolons 0 -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] (Semicolon::tl) | Semicolons n -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] (Semicolons(n-1)::tl) | _ -> assert false end else main_impl_rev ~html (Raw("&")::r) [] tl2 (* maybe hex digit-coded html entity *) | _, Ampersand::((Hash::Word w::((Semicolon|Semicolons _) as s)::tl) as tl2) when is_hex w -> if String.length w <= 4 then begin match s with | Semicolon -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] tl | Semicolons 0 -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] (Semicolon::tl) | Semicolons n -> main_impl_rev ~html (Raw("&#"^w^";")::r) [s] (Semicolons(n-1)::tl) | _ -> assert false end else main_impl_rev ~html (Raw("&")::r) [] tl2 (* Ampersand *) | _, Ampersand::tl -> main_impl_rev ~html (Raw("&")::r) [Ampersand] tl (* 2 Ampersands *) | _, Ampersands(0)::tl -> main_impl_rev ~html (Raw("&")::r) [] (Ampersand::tl) (* Several Ampersands (more than 2) *) | _, Ampersands(n)::tl -> main_impl_rev ~html (Raw("&")::r) [] (Ampersands(n-1)::tl) (* backquotes *) | _, (Backquote|Backquotes _ as t)::tl -> begin match bcode ~default_lang r previous lexemes with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end end (* HTML *) (*
and
with or without space(s) *) | _, (Lessthan::Word("br"|"hr" as w)::Slash ::(Greaterthan|Greaterthans _ as g)::tl) | _, (Lessthan::Word("br"|"hr" as w)::(Space|Spaces _)::Slash ::(Greaterthan|Greaterthans _ as g)::tl) -> begin match g with | Greaterthans 0 -> main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan] (Greaterthan::tl) | Greaterthans n -> main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan] (Greaterthans(n-1)::tl) | _ -> main_impl_rev ~html (Raw("<"^w^" />")::r) [Greaterthan] tl end (* awaited orphan html closing tag *) | _, Lessthan::Slash::Word(w)::(Greaterthan|Greaterthans _ as g)::tl when !mediatypetextomd <> [] -> raise (Orphan_closing(w, lexemes, (match g with | Greaterthans 0 -> Greaterthan::tl | Greaterthans n -> Greaterthans(n-1)::tl | _ -> tl))) (* block html *) | ([] | [Newline|Newlines _|Tag("HTMLBLOCK", _)]), (Lessthan as t) ::((Word(tagnametop) as w) ::((Space|Spaces _|Greaterthan|Greaterthans _) ::_ as html_stuff) as tlx) -> if StringSet.mem tagnametop inline_htmltags_set then main_impl_rev ~html r [Word ""] lexemes else if not (blind_html || StringSet.mem tagnametop htmltags_set) then begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tlx | Some(r, p, l) -> main_impl_rev ~html r p l end else let read_html() = let module T = struct type t = | Awaiting of string | Open of string type interm = | HTML of string * (string * string option) list * interm list | FTOKENS of L.t | RTOKENS of L.t | MD of Omd_representation.t let rec md_of_interm_list html l = let md_of_interm_list ?(html=html) l = md_of_interm_list html l in match l with | [] -> [] | HTML(t, a, c)::tl -> ( let f_a = filter_text_omd_rev a in if f_a != a then Html_block (t, f_a, make_paragraphs (md_of_interm_list ~html:false (List.rev c))) :: md_of_interm_list tl else Html_block (t, f_a, md_of_interm_list ~html:true (List.rev c)) :: md_of_interm_list tl ) | MD md::tl -> md@md_of_interm_list tl | RTOKENS t1::FTOKENS t2::tl -> md_of_interm_list (FTOKENS(List.rev_append t1 t2)::tl) | RTOKENS t1::RTOKENS t2::tl -> md_of_interm_list (FTOKENS(List.rev_append t1 (List.rev t2))::tl) | FTOKENS t1::FTOKENS t2::tl -> md_of_interm_list (FTOKENS(t1@t2)::tl) | FTOKENS t :: tl -> if html then Raw(L.string_of_tokens t) :: md_of_interm_list tl else main_loop ~html [] [Word ""] t @ md_of_interm_list tl | RTOKENS t :: tl -> md_of_interm_list (FTOKENS(List.rev t) :: tl) let md_of_interm_list l = md_of_interm_list true l let string_of_tagstatus tagstatus = let b = Buffer.create 64 in List.iter (function | Open t -> bprintf b "{B/Open %s}" t | Awaiting t -> bprintf b "{B/Awaiting %s}" t ) tagstatus; Buffer.contents b end in let add_token_to_body x body = match body with | T.RTOKENS r :: body -> T.RTOKENS(x::r)::body | _ -> T.RTOKENS[x] :: body in let rec loop (body:T.interm list) attrs tagstatus tokens = if debug then eprintf "(OMD) 3333 BHTML loop body=%S tagstatus=%S %S\n%!" (Omd_backend.sexpr_of_md(T.md_of_interm_list body)) (T.string_of_tagstatus tagstatus) (L.destring_of_tokens tokens); match tokens with | [] -> begin match tagstatus with | [] -> Some(body, tokens) | T.Open t :: _ when StringSet.mem t html_void_elements -> Some(body, tokens) | _ -> if debug then eprintf "(OMD) 3401 BHTML Not enough to read\n%!"; None end | Lessthans n::tokens -> begin match tagstatus with | T.Awaiting _ :: _ -> None | _ -> if debug then eprintf "(OMD) 3408 BHTML loop\n%!"; loop (add_token_to_body (if n = 0 then Lessthan else Lessthans(n-1)) body) attrs tagstatus (Lessthan::tokens) end (* self-closing tags *) | Slash::Greaterthan::tokens -> begin match tagstatus with | T.Awaiting(tagname) :: tagstatus when StringSet.mem tagname html_void_elements -> loop [T.HTML(tagname, attrs, [])] [] tagstatus tokens | _ -> if debug then eprintf "(OMD) 3419 BHTML loop\n%!"; loop (add_token_to_body Slash (add_token_to_body Greaterthan body)) attrs tagstatus tokens end (* closing the tag opener *) | Lessthan::Slash::(Word(tagname) as w) ::(Greaterthan|Greaterthans _ as g)::tokens -> begin match tagstatus with | T.Open t :: _ when t = tagname -> if debug then eprintf "(OMD) 3375 BHTML properly closing %S\n%!" t; Some(body, (match g with | Greaterthans 0 -> Greaterthan :: tokens | Greaterthans n -> Greaterthans(n-1) :: tokens | _ -> tokens)) | T.Open t :: _ -> if debug then eprintf "(OMD) 3379 BHTML wrongly closing %S with %S 1\n%!" t tagname; loop (T.RTOKENS[g;w;Slash;Lessthan]::body) [] tagstatus tokens | T.Awaiting t :: _ -> if debug then eprintf "(OMD) 3383 BHTML wrongly closing %S with %S 2\n%!" t tagname; if !mediatypetextomd <> [] then raise (Orphan_closing(t, lexemes, (match g with | Greaterthans 0 -> Greaterthan::tokens | Greaterthans n -> Greaterthans(n-1)::tokens | _ -> tokens))) else None | [] -> if debug then eprintf "(OMD) BHTML wrongly closing %S 3\n%!" tagname; None end (* tag *) | Lessthan::(Word(tagname) as word)::tokens when blind_html || StringSet.mem tagname htmltags_set -> if debug then eprintf "(OMD) 3489 BHTML tagname && StringSet.mem t html_void_elements -> None | T.Awaiting _ :: _ -> None | _ -> if attrs <> [] then begin if debug then eprintf "(OMD) 3496 BHTML tag %S but attrs <> []\n%!" tagname; None end else begin if debug then eprintf "(OMD) 3421 BHTML tag %S, tagstatus=%S, \ attrs=[], tokens=%S\n%!" tagname (T.string_of_tagstatus tagstatus) (L.destring_of_tokens tokens); match loop [] [] (T.Awaiting tagname::tagstatus) tokens with | None -> if debug then eprintf "(OMD) 3489 BHTML loop\n%!"; loop (add_token_to_body word (add_token_to_body Lessthan body)) attrs tagstatus tokens | Some(b, tokens) -> if debug then begin eprintf "(OMD) 3433 BHTML tagstatus=%S tokens=%S\n%!" (T.string_of_tagstatus tagstatus) (L.string_of_tokens tokens) end; Some(b@body, tokens) end end (* end of opening tag *) | Greaterthan::tokens -> begin match tagstatus with | T.Awaiting t :: tagstatus -> if List.mem ("media:type", Some "text/omd") attrs then ( mediatypetextomd := t :: !mediatypetextomd; try ignore(main_impl_rev ~html [] [] tokens); if debug then eprintf "(OMD) 3524 BHTML closing tag not found \ in %S\n%!" (L.destring_of_tokens tokens); warn (sprintf "Closing tag `%s' not found for text/omd zone." t); mediatypetextomd := List.tl !mediatypetextomd; None with Orphan_closing(tagname, delimiter, after) -> let before = let rec f r = function | Lessthans n as e :: tl -> begin match delimiter with | Lessthan::_ -> if Lessthan::tl = delimiter then List.rev (if n = 0 then Lessthan::r else Lessthans(n-1)::r) else f (e::r) tl | _ -> if tl == delimiter || tl = delimiter then List.rev r else f (e::r) tl end | e::tl as l -> if l == delimiter || l = delimiter then List.rev r else if tl == delimiter || tl = delimiter then List.rev (e::r) else f (e::r) tl | [] -> List.rev r in f [] tokens in if debug then eprintf "(OMD) 3552 BHTML tokens=%s delimiter=%s \ after=%s before=%s (tagname=t)=%b\n%!" (L.destring_of_tokens tokens) (L.destring_of_tokens delimiter) (L.destring_of_tokens after) (L.destring_of_tokens before) (tagname = t); (match !mediatypetextomd with | _ :: tl -> mediatypetextomd := tl | [] -> assert false); if tagname = t then loop [T.HTML (t, attrs, [T.MD (main_impl ~html [] [] (tag_setext main_loop before))])] [] tagstatus after else None ) else begin if debug then eprintf "(OMD) 3571 BHTML loop\n%!"; match loop body [] (T.Open t::tagstatus) tokens with | None -> if debug then eprintf "(OMD) 3519 BHTML \ Couldn't find an closing tag for %S\n%!" t; None | Some(body, l) -> if debug then eprintf "(OMD) 3498 BHTML Found a closing tag %s\n%!" t; match tagstatus with | _ :: _ -> loop [T.HTML(t, attrs, body)] [] tagstatus l | [] -> Some([T.HTML(t, attrs, body)], l) end | T.Open t :: _ -> if debug then eprintf "(OMD) 3591 BHTML Some `>` isn't for an opening tag\n%!"; loop (add_token_to_body Greaterthan body) attrs tagstatus tokens | [] -> if debug then eprintf "(OMD) 3542 BHTML tagstatus=[]\n%!"; None end (* maybe attribute *) | (Colon|Colons _|Underscore|Underscores _|Word _ as t)::tokens | (Space|Spaces _) ::(Colon|Colons _|Underscore|Underscores _|Word _ as t) ::tokens when (match tagstatus with | T.Awaiting _ :: _ -> true | _ -> false) -> begin let module Attribute_value = struct type t = Empty of name | Named of name | Void and name = string end in let open Attribute_value in let rec extract_attribute accu = function | (Space | Spaces _ | Newline) :: tokens-> Empty(L.string_of_tokens(List.rev accu)), tokens | (Greaterthan|Greaterthans _) :: _ as tokens-> Empty(L.string_of_tokens(List.rev accu)), tokens | Equal :: tokens -> Named(L.string_of_tokens(List.rev accu)), tokens | Colon | Colons _ | Underscore | Underscores _ | Word _ | Number _ | Minus | Minuss _ | Dot | Dots _ as t :: tokens -> extract_attribute (t::accu) tokens | tokens -> Void, tokens in match extract_attribute [t] tokens with | Empty attributename, tokens -> (* attribute with no explicit value *) if debug then eprintf "(OMD) 3628 BHTML loop\n%!"; loop body ((attributename, None)::attrs) tagstatus tokens | Named attributename, tokens -> begin match tokens with | Quotes 0 :: tokens -> if debug then eprintf "(OMD) 3661 BHTML empty attribute 1 %S\n%!" (L.string_of_tokens tokens); loop body ((attributename, Some "")::attrs) tagstatus tokens | Quote :: tokens -> begin if debug then eprintf "(OMD) 3668 BHTML non empty attribute 1 %S\n%!" (L.string_of_tokens tokens); match fsplit ~excl:(function | Quotes _ :: _ -> true | _ -> false) ~f:(function | Quote::tl -> Split([], tl) | _ -> Continue) tokens with | None -> None | Some(at_val, tokens) -> if debug then eprintf "(OMD) 3654 BHTML loop\n%!"; loop body ((attributename, Some(L.string_of_tokens at_val)) ::attrs) tagstatus tokens end | Doublequotes 0 :: tokens -> begin if debug then eprintf "(OMD) 3690 BHTML empty attribute 2 %S\n%!" (L.string_of_tokens tokens); loop body ((attributename, Some "")::attrs) tagstatus tokens end | Doublequote :: tokens -> begin if debug then eprintf "(OMD) 3698 BHTML non empty attribute 2 %S\n%!" (L.string_of_tokens tokens); match fsplit ~excl:(function | Doublequotes _ :: _ -> true | _ -> false) ~f:(function | Doublequote::tl -> Split([], tl) | _ -> Continue) tokens with | None -> None | Some(at_val, tokens) -> if debug then eprintf "(OMD) 3622 BHTML %s=%S %s\n%!" attributename (L.string_of_tokens at_val) (L.destring_of_tokens tokens); loop body ((attributename, Some(L.string_of_tokens at_val)) ::attrs) tagstatus tokens end | _ -> None end | Void, _ -> None end | x::tokens as dgts when (match tagstatus with T.Open _ :: _ -> true | _ -> false) -> begin if debug then eprintf "(OMD) 3620 BHTML general %S\n%!" (L.string_of_tokens dgts); loop (add_token_to_body x body) attrs tagstatus tokens end | (Newline | Space | Spaces _) :: tokens when (match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) -> begin if debug then eprintf "(OMD) 3737 BHTML spaces\n%!"; loop body attrs tagstatus tokens end | (Newlines _ as x) :: tokens when (match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) -> begin if debug then eprintf "(OMD) 3827 BHTML newlines\n%!"; warn "there are empty lines in what may be an HTML block"; loop (add_token_to_body x body) attrs tagstatus tokens end | _ -> if debug then eprintf "(OMD) 3742 BHTML fallback with \ tokens=%s and tagstatus=%s\n%!" (L.destring_of_tokens tokens) (match tagstatus with | [] -> "None" | T.Awaiting _ :: _ -> "Awaiting" | T.Open _ :: _ -> "Open (can't be)"); (match tagstatus with | [] -> Some(body, tokens) | T.Awaiting tag :: _ -> warn (sprintf "expected to read an open HTML tag (%s), \ but found nothing" tag); None | T.Open tag :: _ -> warn (sprintf "expected to find the closing HTML tag for %s, \ but found nothing" tag); None) in if debug then eprintf "(OMD) 3408 BHTML loop\n%!"; match loop [] [] [] lexemes with | Some(h, rest) -> Some(T.md_of_interm_list h, rest) | None -> None in begin match read_html() with | Some(h, rest) -> main_impl_rev ~html (h@r) [Tag("HTMLBLOCK", empty_extension)] rest | None -> let text = L.string_of_token t in main_impl_rev ~html (Text(text ^ tagnametop)::r) [w] html_stuff end (* / end of block HTML. *) (* inline HTML *) | _, (Lessthan as t) ::((Word(tagnametop) as w) ::((Space|Spaces _|Greaterthan|Greaterthans _) ::_ as html_stuff) as tlx) -> if (strict_html && not(StringSet.mem tagnametop inline_htmltags_set)) || not(blind_html || StringSet.mem tagnametop htmltags_set) then begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tlx | Some(r, p, l) -> main_impl_rev ~html r p l end else let read_html() = let module T = struct type t = | Awaiting of string | Open of string type interm = | HTML of string * (string * string option) list * interm list | TOKENS of L.t | MD of Omd_representation.t let rec md_of_interm_list = function | [] -> [] | HTML(t, a, c)::tl -> Html(t, a, md_of_interm_list(List.rev c))::md_of_interm_list tl | MD md::tl -> md @ md_of_interm_list tl | TOKENS t1::TOKENS t2::tl -> md_of_interm_list (TOKENS(t1@t2)::tl) | TOKENS t :: tl -> main_impl ~html [] [Word ""] (t) @ md_of_interm_list tl let string_of_tagstatus tagstatus = let b = Buffer.create 64 in List.iter (function | Open t -> bprintf b "{I/Open %s}" t | Awaiting t -> bprintf b "{I/Awaiting %s}" t ) tagstatus; Buffer.contents b end in let add_token_to_body x body = T.TOKENS[x]::body in let rec loop (body:T.interm list) attrs tagstatus tokens = if debug then eprintf "(OMD) 3718 loop tagstatus=(%s) %s\n%!" (* eprintf "(OMD) 3718 loop tagstatus=(%s) body=(%s) %s\n%!" *) (T.string_of_tagstatus tagstatus) (* (Omd_backend.sexpr_of_md(T.md_of_interm_list body)) *) (L.destring_of_tokens tokens); match tokens with | [] -> begin match tagstatus with | [] -> Some(body, tokens) | T.Open(t)::_ when StringSet.mem t html_void_elements -> Some(body, tokens) | _ -> if debug then eprintf "(OMD) Not enough to read for inline HTML\n%!"; None end | Lessthans n::tokens -> begin match tagstatus with | T.Awaiting _ :: _ -> None | _ -> loop (add_token_to_body (if n = 0 then Lessthan else Lessthans(n-1)) body) attrs tagstatus (Lessthan::tokens) end (* self-closing tags *) | Slash::Greaterthan::tokens -> begin match tagstatus with | T.Awaiting(tagname)::tagstatus when StringSet.mem tagname html_void_elements -> loop [T.HTML(tagname, attrs, [])] [] tagstatus tokens | _ -> loop (T.TOKENS[Greaterthan;Slash]::body) attrs tagstatus tokens end (* multiple newlines are not to be seen in inline HTML *) | Newlines _ :: _ -> if debug then eprintf "(OMD) Multiple lines in inline HTML\n%!"; (match tagstatus with | [] -> Some(body, tokens) | _ -> warn "multiple newlines in inline HTML"; None) (* maybe code *) | (Backquote | Backquotes _ as b)::tl -> begin match tagstatus with | T.Awaiting _ :: _ -> if debug then eprintf "(OMD) maybe code in inline HTML: no code\n%!"; None | [] -> if debug then eprintf "(OMD) maybe code in inline HTML: none\n%!"; None | T.Open _ :: _ -> if debug then eprintf "(OMD) maybe code in inline HTML: let's try\n%!"; begin match bcode [] [Space] tokens with | Some (((Code _::_) as c), p, l) -> if debug then eprintf "(OMD) maybe code in inline HTML: \ confirmed\n%!"; loop (T.MD c::body) [] tagstatus l | _ -> if debug then eprintf "(OMD) maybe code in inline HTML: failed\n%!"; loop (T.TOKENS[b]::body) [] tagstatus tl end end (* closing the tag *) | Lessthan::Slash::(Word(tagname) as w) ::(Greaterthan|Greaterthans _ as g)::tokens -> begin match tagstatus with | T.Open t :: _ when t = tagname -> if debug then eprintf "(OMD) 4136 properly closing %S tokens=%s\n%!" t (L.string_of_tokens tokens); Some(body, (match g with | Greaterthans 0 -> Greaterthan :: tokens | Greaterthans n -> Greaterthans(n-1) :: tokens | _ -> tokens)) | T.Open t :: _ -> if debug then eprintf "(OMD) 4144 \ wrongly closing %S with %S 1\n%!" t tagname; loop (T.TOKENS[g;w;Slash;Lessthan]::body) [] tagstatus tokens | T.Awaiting t :: _ -> if debug then eprintf "(OMD) 4149 \ wrongly closing %S with %S 2\n%!" t tagname; None | [] -> if debug then eprintf "(OMD) 4154 \ wrongly closing nothing with %S 3\n%!" tagname; None end (* tag *) | Lessthan::(Word(tagname) as word)::tokens when blind_html || (strict_html && StringSet.mem tagname inline_htmltags_set) || (not strict_html && StringSet.mem tagname htmltags_set) -> if debug then eprintf "(OMD) <%s...\n%!" tagname; begin match tagstatus with | T.Open(t) :: _ when t <> tagname && StringSet.mem t html_void_elements -> None | T.Awaiting _ :: _ -> None | _ -> begin if debug then eprintf "(OMD) 3796 tag %s, attrs=[]\n%!" tagname; match loop [] [] (T.Awaiting tagname::tagstatus) tokens with | None -> loop (T.TOKENS[word;Lessthan]::body) attrs tagstatus tokens | Some(b,tokens) -> Some(b@body, tokens) end end (* end of opening tag *) | Greaterthan::tokens -> if debug then eprintf "(OMD) 4185 end of opening tag tokens=%s \ tagstatus=%s\n%!" (L.string_of_tokens tokens) (T.string_of_tagstatus tagstatus); begin match tagstatus with | T.Awaiting t :: tagstatus as ts -> begin match loop body [] (T.Open t::tagstatus) tokens with | None -> if debug then eprintf "(OMD) 4186 \ Couldn't find an closing tag for %S\n%!" t; None | Some(b, tokens) -> if debug then eprintf "(OMD) 4192 Found a closing tag %s ts=%s \ tokens=%s\n%!" t (T.string_of_tagstatus ts) (L.string_of_tokens tokens); match tagstatus with | [] -> Some(T.HTML(t, attrs, b)::body, tokens) | _ -> (* Note: we don't care about the value of [attrs] here because in we have a [tagstatus] matches [T.Open _ :: _] and there's a corresponding filter that will take care of attrs that will take care of it. *) loop (T.HTML(t, attrs, b)::body) [] tagstatus tokens end | T.Open t :: _ -> if debug then eprintf "(OMD) Turns out an `>` isn't for an opening tag\n%!"; loop (T.TOKENS[Greaterthan]::body) attrs tagstatus tokens | [] -> if debug then eprintf "(OMD) 4202 tagstatus=[]\n%!"; None end (* maybe attribute *) | (Colon|Colons _|Underscore|Underscores _|Word _ as t)::tokens | (Space|Spaces _) ::(Colon|Colons _|Underscore|Underscores _|Word _ as t) ::tokens when (match tagstatus with | T.Awaiting _ :: _ -> true | _ -> false) -> begin let module Attribute_value = struct type t = Empty of name | Named of name | Void and name = string end in let open Attribute_value in let rec extract_attribute accu = function | (Space | Spaces _ | Newline) :: tokens-> Empty(L.string_of_tokens(List.rev accu)), tokens | (Greaterthan|Greaterthans _) :: _ as tokens-> Empty(L.string_of_tokens(List.rev accu)), tokens | Equal :: tokens -> Named(L.string_of_tokens(List.rev accu)), tokens | Colon | Colons _ | Underscore | Underscores _ | Word _ | Number _ | Minus | Minuss _ | Dot | Dots _ as t :: tokens -> extract_attribute (t::accu) tokens | tokens -> Void, tokens in match extract_attribute [t] tokens with | Empty attributename, tokens -> (* attribute with no explicit value *) loop body ((attributename, None)::attrs) tagstatus tokens | Named attributename, tokens -> begin match tokens with | Quotes 0 :: tokens -> if debug then eprintf "(OMD) (IHTML) empty attribute 1 %S\n%!" (L.string_of_tokens tokens); loop body ((attributename, Some "")::attrs) tagstatus tokens | Quote :: tokens -> begin if debug then eprintf "(OMD) (IHTML) non empty attribute 1 %S\n%!" (L.string_of_tokens tokens); match fsplit ~excl:(function | Quotes _ :: _ -> true | _ -> false) ~f:(function | Quote::tl -> Split([], tl) | _ -> Continue) tokens with | None -> None | Some(at_val, tokens) -> loop body ((attributename, Some(L.string_of_tokens at_val)) ::attrs) tagstatus tokens end | Doublequotes 0 :: tokens -> begin if debug then eprintf "(OMD) (IHTML) empty attribute 2 %S\n%!" (L.string_of_tokens tokens); loop body ((attributename, Some "")::attrs) tagstatus tokens end | Doublequote :: tokens -> begin if debug then eprintf "(OMD) (IHTML) non empty attribute 2 %S\n%!" (L.string_of_tokens tokens); match fsplit ~excl:(function | Doublequotes _ :: _ -> true | _ -> false) ~f:(function | Doublequote::tl -> Split([], tl) | _ -> Continue) tokens with | None -> None | Some(at_val, tokens) -> if debug then eprintf "(OMD) (3957) %s=%S %s\n%!" attributename (L.string_of_tokens at_val) (L.destring_of_tokens tokens); loop body ((attributename, Some(L.string_of_tokens at_val)) ::attrs) tagstatus tokens end | _ -> None end | Void, _ -> None end | Backslash::x::tokens when (match tagstatus with T.Open _ :: _ -> true | _ -> false) -> loop (T.TOKENS[Backslash;x]::body) attrs tagstatus tokens | Backslashs(n)::x::tokens when (match tagstatus with T.Open _ :: _ -> true | _ -> false) && n mod 2 = 1 -> loop (T.TOKENS[Backslashs(n);x]::body) attrs tagstatus tokens | x::tokens when (match tagstatus with T.Open _ :: _ -> true | _ -> false) -> begin if debug then eprintf "(OMD) (4161) general %S\n%!" (L.string_of_tokens (x::tokens)); loop (T.TOKENS[x]::body) attrs tagstatus tokens end | (Newline | Space | Spaces _) :: tokens when (match tagstatus with T.Awaiting _ :: _ -> true | _ -> false) -> begin if debug then eprintf "(OMD) (4289) spaces\n%!"; loop body attrs tagstatus tokens end | _ -> if debug then eprintf "(OMD) (4294) \ fallback with tokens=%s and tagstatus=%s\n%!" (L.destring_of_tokens tokens) (T.string_of_tagstatus tagstatus); (match tagstatus with | [] -> Some(body, tokens) | T.Awaiting tag :: _ -> warn (sprintf "expected to read an open HTML tag (%s), \ but found nothing" tag); None | T.Open tag :: _ -> warn (sprintf "expected to find the closing HTML tag for %s, \ but found nothing" tag); None) in match loop [] [] [] lexemes with | Some(html, rest) -> Some(T.md_of_interm_list html, rest) | None -> None in begin match read_html() with | Some(h, rest) -> main_impl_rev ~html (h@r) [Greaterthan] rest | None -> let text = L.string_of_token t in main_impl_rev ~html (Text(text ^ tagnametop)::r) [w] html_stuff end (* / end of inline HTML. *) (* < : emails *) | _, (Lessthan as t)::tl -> begin match maybe_autoemail r previous lexemes with | Some(r,p,l) -> main_impl_rev ~html r p l | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end end (* line breaks *) | _, Newline::tl -> main_impl_rev ~html (NL::r) [Newline] tl | _, Newlines _::tl -> main_impl_rev ~html (NL::NL::r) [Newline] tl (* [ *) | _, (Obracket as t)::tl -> begin match maybe_link main_loop r previous tl with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> match maybe_reference main_loop rc r previous tl with | Some(r, p, l) -> main_impl_rev ~html r p l | None -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end end (* img *) | _, (Exclamation|Exclamations _ as t) ::Obracket::Cbracket::Oparenthesis::tl -> (* image insertion with no "alt" *) (* ![](/path/to/img.jpg) *) (try begin let b, tl = read_until_cparenth ~bq:true ~no_nl:false tl in (* new lines there are allowed *) let r (* updated result *) = match t with | Exclamations 0 -> Text "!" :: r | Exclamations n -> Text(String.make (n+1) '!') :: r | _ -> r in match try Some(read_until_space ~bq:false ~no_nl:true b) with Premature_ending -> None with | Some(url, tls) -> let title, should_be_empty_list = read_until_dq ~bq:true (snd (read_until_dq ~bq:true tls)) in let url = L.string_of_tokens url in let title = L.string_of_tokens title in main_impl_rev ~html (Img("", url, title) :: r) [Cparenthesis] tl | None -> let url = L.string_of_tokens b in main_impl_rev ~html (Img("", url, "") :: r) [Cparenthesis] tl end with | NL_exception -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) (* img ref *) | _, (Exclamation as t) ::Obracket::Cbracket::Obracket::tl -> (* ref image insertion with no "alt" *) (* ![][ref] *) (try let id, tl = read_until_cbracket ~bq:true ~no_nl:true tl in let fallback = extract_fallback main_loop tl lexemes in let id = L.string_of_tokens id in main_impl_rev ~html (Img_ref(rc, id, "", fallback) :: r) [Cbracket] tl with NL_exception -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) (* img *) | _, (Exclamation|Exclamations _ as t)::Obracket::tl -> (* image insertion with "alt" *) (* ![Alt text](/path/to/img.jpg "Optional title") *) (try match read_until_cbracket ~bq:true tl with | alt, Oparenthesis::ntl -> (try let alt = L.string_of_tokens alt in let path_title, rest = read_until_cparenth ~bq:true ~no_nl:false ntl in let path, title = try read_until_space ~bq:true ~no_nl:true path_title with Premature_ending -> path_title, [] in let title, nothing = if title <> [] then read_until_dq ~bq:true (snd(read_until_dq ~bq:true title)) else [], [] in if nothing <> [] then raise NL_exception; (* caught right below *) let r = match t with | Exclamations 0 -> Text "!" :: r | Exclamations n -> Text(String.make (n+1) '!') :: r | _ -> r in let path = L.string_of_tokens path in let title = L.string_of_tokens title in main_impl_rev ~html (Img(alt, path, title) :: r) [Cparenthesis] rest with | NL_exception (* if NL_exception was raised, then fall back to "text" *) | Premature_ending -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) | alt, Obracket::Word(id)::Cbracket::ntl | alt, Obracket::(Space|Spaces _)::Word(id)::Cbracket::ntl | alt, Obracket::(Space|Spaces _)::Word(id)::(Space|Spaces _) ::Cbracket::ntl | alt, Obracket::Word(id)::(Space|Spaces _)::Cbracket::ntl -> let fallback = extract_fallback main_loop ntl lexemes in let alt = L.string_of_tokens alt in main_impl_rev ~html (Img_ref(rc, id, alt, fallback)::r) [Cbracket] ntl | alt, Obracket::((Newline|Space|Spaces _|Word _|Number _)::_ as ntl) -> (try match read_until_cbracket ~bq:true ~no_nl:false ntl with | [], rest -> raise Premature_ending | id, rest -> let fallback = extract_fallback main_loop rest lexemes in let id = L.string_of_tokens id in let alt = L.string_of_tokens alt in main_impl_rev ~html (Img_ref(rc, id, alt, fallback)::r) [Cbracket] rest with | Premature_ending | NL_exception -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) | _ -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end with | Premature_ending -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end ) | _, (At|Bar|Caret|Cbrace|Colon|Comma|Cparenthesis|Cbracket|Dollar |Dot|Doublequote|Exclamation|Equal|Minus|Obrace|Oparenthesis |Percent|Plus|Question|Quote|Semicolon|Slash|Tab|Tilde |Greaterthan as t)::tl -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | _, (Number _ as t):: tl -> begin match maybe_extension extensions r previous lexemes with | None -> main_impl_rev ~html (Text(L.string_of_token t)::r) [t] tl | Some(r, p, l) -> main_impl_rev ~html r p l end | _, (Ats _ | Bars _ | Carets _ | Cbraces _ | Cbrackets _ | Colons _ | Commas _ | Cparenthesiss _ | Dollars _ | Dots _ | Doublequotes _ | Equals _ | Exclamations _ | Greaterthans _ | Lessthans _ | Minuss _ | Obraces _ | Obrackets _ | Oparenthesiss _ | Percents _ | Pluss _ | Questions _ | Quotes _ | Semicolons _ | Slashs _ | Stars _ | Tabs _ | Tildes _ | Underscores _ as tk) :: tl -> begin match maybe_extension extensions r previous lexemes with | None -> let tk0, tks = L.split_first tk in let text = L.string_of_token tk0 in main_impl_rev ~html (Text text :: r) [tk0] (tks :: tl) | Some(r, p, l) -> main_impl_rev ~html r p l end and main_impl ~html (r:r) (previous:p) (lexemes:l) = (* if debug then eprintf "(OMD) main_impl html=%b\n%!" html; *) assert_well_formed lexemes; List.rev (main_loop_rev ~html r previous lexemes) and main_loop ?(html=false) (r:r) (previous:p) (lexemes:l) = main_impl ~html r previous lexemes and main_loop_rev ?(html=false) (r:r) (previous:p) (lexemes:l) = main_impl_rev ~html r previous lexemes let main_parse lexemes = main_loop [] [] (tag_setext main_loop lexemes) let parse lexemes = main_parse lexemes end let default_parse ?(extensions=[]) ?(default_lang="") lexemes = let e = extensions and d = default_lang in let module E = Default_env(Unit) in let module M = Make(struct include E let extensions = e let default_lang = d end) in M.main_parse lexemes omd-1.3.2/src/omd_parser.mli000066400000000000000000000350641425763206400157420ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (** Beware: the functions in this module may raise exceptions! If you use them, you should be careful. *) type r = Omd_representation.t (** accumulator (beware, reversed tokens) *) and p = Omd_representation.tok list (** context information: previous elements *) and l = Omd_representation.tok list (** tokens to parse *) and main_loop = ?html:bool -> r -> (* accumulator (beware, reversed tokens) *) p -> (* info: previous elements *) l -> (* tokens to parse *) Omd_representation.t (* final result *) (** most important loop, which has to be given as an argument *) val default_parse : ?extensions:Omd_representation.extensions -> ?default_lang:string -> l -> Omd_representation.t (** Translate tokens to Markdown representation. @param lang language for blocks of code where it was not specified. Default: [""]. *) module type Env = sig val rc: Omd_representation.ref_container (** reference container *) val extensions : Omd_representation.extensions (** list of parser extensions *) val default_lang : string (** default language for code blocks *) val gh_uemph_or_bold_style : bool (** flag: bold/emph using using underscores is by default github-style, which means that underscores inside words are left as underscore, rather than special characters, because it's more convenient. However it is also less expressive because then you can't bold/emph a part of a word. You might want to set this flag to false. *) val blind_html : bool (** flag: if true, will not check whether a used HTML tag actually exists in HTML. *) val strict_html : bool (** flag: if true, will only accept known inline HTML tags in inline HTML. *) val warning : bool (** flag: if true, will output warnings *) val warn_error : bool (** flag: if true, will convert warnings to errors *) end module Default_env : functor (Unit: sig end) -> Env module Make : functor (Env : Env) -> sig val rc: Omd_representation.ref_container (** reference container *) val extensions : Omd_representation.extensions (** list of parser extensions *) val default_lang : string (** default language for code blocks *) val gh_uemph_or_bold_style : bool (** flag: bold/emph using using underscores is by default github-style, which means that underscores inside words are left as underscore, rather than special characters, because it's more convenient. However it is also less expressive because then you can't bold/emph a part of a word. You might want to set this flag to false. *) val blind_html : bool (** flag: if true, will not check whether a used HTML tag actually exists in HTML. *) val strict_html : bool (** flag: if true, will only accept known inline HTML tags in inline HTML. *) val htmlcodes_set : Omd_utils.StringSet.t (** set of known HTML codes *) val inline_htmltags_set : Omd_utils.StringSet.t (** set of known inline HTML tags *) val htmltags_set : Omd_utils.StringSet.t (** All known HTML tags *) val unindent_rev : int -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list (** [unindent_rev n l] returns the same couple as [unindent n l] except that the first element (which is a list) is reversed. This function is used for lists. *) val unindent : int -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list (** [unindent n l] returns [(unindented, rest)] where [unindented] is the consecutive lines of [l] that are indented with at least [n] spaces, and de-indented by [n] spaces. If [l] starts with a line that is indented by less than [n] spaces, then it returns [([], l)]. (* This function is used for lists, so it does not require [n] *) (* spaces on every single line, but only on some specific ones of them. *) This function is used for lists and blockquotes. *) (* val unindent_strict_rev : *) (* int -> *) (* Omd_representation.tok list -> *) (* Omd_representation.tok list * Omd_representation.tok list *) (* (\** [unindent_strict_rev n l] returns the same couple as [unindent n l] *) (* except that the first element (which is a list) is reversed. *) (* This function is used for blockquotes. *\) *) (* val unindent_strict : *) (* int -> *) (* Omd_representation.tok list -> *) (* Omd_representation.tok list * Omd_representation.tok list *) (* (\** [unindent_strict n l] returns [(unindented, rest)] where [unindented] is *) (* the consecutive lines of [l] that are indented with at least [n] *) (* spaces, and de-indented by [n] spaces. If [l] starts with a line *) (* that is indented by less than [n] spaces, then it returns [([], l)]. *) (* This function is used for blockquotes. *) (* *\) *) val is_blank : Omd_representation.tok list -> bool (** [is_blank l] returns [true] if [l] only contains blanks, which are spaces and newlines. *) val semph_or_bold : int -> Omd_representation.tok list -> (Omd_representation.tok list * Omd_representation.tok list) option (** [semph_or_bold n l] returns [None] if [l] doesn't start with a bold/emph phrase (marked using stars), else it returns [Some(x,y)] where [x] is the emph and/or bold phrase at the beginning of [l] and [y] is the rest of [l]. *) val sm_uemph_or_bold : int -> Omd_representation.tok list -> (Omd_representation.tok list * Omd_representation.tok list) option (** [sm_uemph_or_bold n l] returns [None] if [l] doesn't start with a bold/emph phrase (marked using underscores), else it returns [Some(x,y)] where [x] is the emph and/or bold phrase at the beginning of [l] and [y] is the rest of [l]. *) val gh_uemph_or_bold : int -> Omd_representation.tok list -> (Omd_representation.tok list * Omd_representation.tok list) option (** [gh_uemph_or_bold n l] returns [None] if [l] doesn't start with a bold/emph phrase (marked using underscores), else it returns [Some(x,y)] where [x] is the emph and/or bold phrase at the beginning of [l] and [y] is the rest of [l]. *) val uemph_or_bold : int -> Omd_representation.tok list -> (Omd_representation.tok list * Omd_representation.tok list) option (** [uemph_or_bold n l] returns [None] if [l] doesn't start with a bold/emph phrase (marked using underscores), else it returns [Some(x,y)] where [x] is the emph and/or bold phrase at the beginning of [l] and [y] is the rest of [l]. N.B. if [!gh_uemph_or_bold_style] then in Github style (i.e., underscores inside words are considered as underscores). *) val eat_blank : Omd_representation.tok list -> Omd_representation.tok list (** [eat_blank l] returns [l] where all blanks at the beginning of the list have been removed (it stops removing as soon as it meets an element that is not a blank). Blanks are spaces and newlines only. *) val tag__maybe_h1 : main_loop -> Omd_representation.tok (** [tag__maybe_h1 main_loop] is a tag that is injected everywhere that might preceed a H1 title. It needs [main_loop] as argument because it is used to parse the contents of the titles. *) val tag__maybe_h2 : main_loop -> Omd_representation.tok (** [tag__maybe_h2 main_loop] is the same as [tag__maybe_h1 main_loop] but for H2. *) val tag__md : Omd_representation.t -> Omd_representation.tok (** [tag__md md] encapsulates [md] to make it a value of type [tok]. Its purpose is to inject some pre-parsed markdown (i.e., [md] of type [t]) in a yet-to-parse token stream of type [tok]. *) val tag_setext : main_loop -> Omd_representation.tok list -> Omd_representation.tok list (** Tag used for the lines that *might* be titles using setext-style. *) val hr_m : l -> l option (** [hr_m l] returns [Some nl] where [nl] is the remaining of [l] if [l] contains a horizontal rule "drawn" with dashes. If there's no HR, then returns [None].*) val hr_s : l -> l option (** [hr_s l] is the same as [hr_m l] but for horizontal rules "drawn" with stars instead. *) exception NL_exception exception Premature_ending val read_until_gt : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_lt : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_cparenth : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_oparenth : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_dq : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_q : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_obracket : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_cbracket : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_space : ?bq:bool -> ?no_nl:bool -> Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list val read_until_newline : Omd_representation.tok list -> Omd_representation.tok list * Omd_representation.tok list (** [read_until_...] are functions that read from a token list and return two token lists: the first one is the tokens read until a specific token is met, and the second one is the remainder. The particularity of these functions is that they do consider backslash-escaped characters and closing characters. For instance, [read_until_gt "1 < 2 > 3 > 4"] returns ["1 < 2 > 3 ", " 4"]: note that the ">" before " 4" has disappeared and that [read_until_gt] takes a [tok list] (not a string) and returns a couple of [tok list] (not a couple of strings), the string notation is used here for concision. Until otherwise noted, those functions do *not* consider backquote-trapped sections. For instance, [read_until_gt "1 < 2 > 3 `>` 4"] returns ["1 < 2 > 3 `", "` 4"]. If you use these functions, you should make sure that they do what you think they do (i.e., do look at the code). If the expected characters are not found, the exception [Premature_ending] is raised. For instance, [read_until_gt "1 < > 3"] raises [Premature_ending]. If [no_nl] is [true] (default value for [no_nl] is [false]) and ['\n'] occurs before the splitting character, then [NL_exception] is raised. *) val read_title : main_loop -> int -> r -> p -> l -> (r * p * l) option (** [read_title main_loop n r p l] returns [Some(r,p,l)] if it succeeds, [None] otherwise. [read_title main_loop n r p l] expects to read a [n]-level hash-declared title from [l], where the hashes have *already* been *removed*. If [n] is not between 1 and 6 (included), then it returns [None]. [main_loop] is used to parse the contents of the title. [r] and [p] are the classical "result" and "previous" parameters. *) val maybe_extension : Omd_representation.extensions -> r -> p -> l -> (r * p * l) option (** [maybe_extension e r p l] returns [None] if there is no extension or if extensions haven't had any effect, returns [Some(nr, np, nl)] if at least one extension has applied successfully. *) val emailstyle_quoting : main_loop -> r -> p -> l -> (r * p * l) option (** [emailstyle_quoting main_loop r p l] returns [Some(r,p,l)] with [r] being the updated result, [p] being the last parsed token and [l] being the remaining tokens to parse. If [emailstyle_quoting] fails, then it returns [None], in which case its user is advise to investigate why it returns [None] because there's possibly a real problem. *) val maybe_reference : main_loop -> Omd_representation.ref_container -> r -> p -> l -> (r * p * l) option (** [maybe_reference] tries to parse a reference, a reference definition or a github-style short reference (e.g., [foo] as a shortcut for [foo][]), and returns [Some(r,p,l)] if it succeeds, [None] otherwise. *) val maybe_link : main_loop -> r -> p -> l -> (r * p * l) option (** [maybe_link] tries to parse a link, and returns [Some(r,p,l)] if it succeeds, [None] otherwise. *) val parse_list : main_loop -> r -> p -> l -> r * p * l (** [parse_list main_loop r p l] parses a list from [l]. ***Important property*** It is considered in Omd that a sub-list is always more indented than the item that contains it (so, 2 items with different indentations cannot have the direct same parent). *) val make_paragraphs : Omd_representation.t -> Omd_representation.t (** Since [Omd_parser.parse] doesn't build paragraphs, if you want Markdown-style paragraphs, you need to apply this function to the result of [Omd_parser.parse]. *) val bcode : ?default_lang:Omd_representation.name -> r -> p -> l -> (r * p * l) option (** [bcode default_lang r p l] tries to parse some code that's delimited by backquotes, and returns [Some(r,p,l)] if it succeeds, [None] otherwise. *) val icode : ?default_lang:Omd_representation.name -> r -> p -> l -> (r * p * l) option (** [icode default_lang r p l] tries to parse some code that's delimited by space indentation. It should always return [Some(r,p,l)], if it returns [None] it means that it's been misused or there's a bug. *) val main_loop_rev : ?html:bool -> r -> p -> l -> r val main_loop : ?html:bool -> r -> p -> l -> Omd_representation.t val main_parse : Omd_representation.tok list -> Omd_representation.t val parse : Omd_representation.tok list -> Omd_representation.t end omd-1.3.2/src/omd_representation.ml000066400000000000000000000302301425763206400173250ustar00rootroot00000000000000open Omd_utils open Printf (** references, instances created in [Omd_parser.main_parse] and accessed in the [Omd_backend] module. *) module R = Map.Make(String) class ref_container : object val mutable c : (string * string) R.t method add_ref : R.key -> string -> string -> unit method get_ref : R.key -> (string * string) option method get_all : (string * (string * string)) list end = object val mutable c = R.empty val mutable c2 = R.empty method get_all = R.bindings c method add_ref name title url = c <- R.add name (url, title) c; let ln = String.lowercase_ascii name in if ln <> name then c2 <- R.add ln (url, title) c2 method get_ref name = try let (url, title) as r = try R.find name c with Not_found -> let ln = String.lowercase_ascii name in try R.find ln c with Not_found -> R.find ln c2 in Some r with Not_found -> None end type element = | H1 of t | H2 of t | H3 of t | H4 of t | H5 of t | H6 of t | Paragraph of t | Text of string | Emph of t | Bold of t | Ul of t list | Ol of t list | Ulp of t list | Olp of t list | Code of name * string | Code_block of name * string | Br | Hr | NL | Url of href * t * title | Ref of ref_container * name * string * fallback | Img_ref of ref_container * name * alt * fallback | Html of name * (string * string option) list * t | Html_block of name * (string * string option) list * t | Html_comment of string | Raw of string | Raw_block of string | Blockquote of t | Img of alt * src * title | X of < name : string; to_html : ?indent:int -> (t -> string) -> t -> string option; to_sexpr : (t -> string) -> t -> string option; to_t : t -> t option > and fallback = < to_string : string ; to_t : t > and name = string and alt = string and src = string and href = string and title = string and t = element list let rec loose_compare t1 t2 = match t1,t2 with | H1 e1::tl1, H1 e2::tl2 | H2 e1::tl1, H2 e2::tl2 | H3 e1::tl1, H3 e2::tl2 | H4 e1::tl1, H4 e2::tl2 | H5 e1::tl1, H5 e2::tl2 | H6 e1::tl1, H6 e2::tl2 | Emph e1::tl1, Emph e2::tl2 | Bold e1::tl1, Bold e2::tl2 | Blockquote e1::tl1, Blockquote e2::tl2 | Paragraph e1::tl1, Paragraph e2::tl2 -> (match loose_compare e1 e2 with | 0 -> loose_compare tl1 tl2 | i -> i) | Ul e1::tl1, Ul e2::tl2 | Ol e1::tl1, Ol e2::tl2 | Ulp e1::tl1, Ulp e2::tl2 | Olp e1::tl1, Olp e2::tl2 -> (match loose_compare_lists e1 e2 with | 0 -> loose_compare tl1 tl2 | i -> i) | (Code _ as e1)::tl1, (Code _ as e2)::tl2 | (Br as e1)::tl1, (Br as e2)::tl2 | (Hr as e1)::tl1, (Hr as e2)::tl2 | (NL as e1)::tl1, (NL as e2)::tl2 | (Html _ as e1)::tl1, (Html _ as e2)::tl2 | (Html_block _ as e1)::tl1, (Html_block _ as e2)::tl2 | (Raw _ as e1)::tl1, (Raw _ as e2)::tl2 | (Raw_block _ as e1)::tl1, (Raw_block _ as e2)::tl2 | (Html_comment _ as e1)::tl1, (Html_comment _ as e2)::tl2 | (Img _ as e1)::tl1, (Img _ as e2)::tl2 | (Text _ as e1)::tl1, (Text _ as e2)::tl2 -> (match compare e1 e2 with | 0 -> loose_compare tl1 tl2 | i -> i) | Code_block(l1,c1)::tl1, Code_block(l2,c2)::tl2 -> (match compare l1 l2, String.length c1 - String.length c2 with | 0, 0 -> (match compare c1 c2 with | 0 -> loose_compare tl1 tl2 | i -> i) | 0, 1 -> (match compare c1 (c2^"\n") with | 0 -> loose_compare tl1 tl2 | i -> i) | 0, -1 -> (match compare (c1^"\n") c2 with | 0 -> loose_compare tl1 tl2 | i -> i) | i, _ -> i ) | Url (href1, t1, title1)::tl1, Url (href2, t2, title2)::tl2 -> (match compare href1 href2 with | 0 -> (match loose_compare t1 t2 with | 0 -> (match compare title1 title2 with | 0 -> loose_compare tl1 tl2 | i -> i) | i -> i) | i -> i) | Ref (ref_container1, name1, x1, fallback1)::tl1, Ref (ref_container2, name2, x2, fallback2)::tl2 | Img_ref (ref_container1, name1, x1, fallback1)::tl1, Img_ref (ref_container2, name2, x2, fallback2)::tl2 -> (match compare (name1, x1) (name2, x2) with | 0 -> let cff = if fallback1#to_string = fallback2#to_string then 0 else loose_compare (fallback1#to_t) (fallback2#to_t) in if cff = 0 then match compare (ref_container1#get_all) (ref_container2#get_all) with | 0 -> loose_compare tl1 tl2 | i -> i else cff | i -> i) | X e1::tl1, X e2::tl2 -> (match compare (e1#name) (e2#name) with | 0 -> (match compare (e1#to_t) (e2#to_t) with | 0 -> loose_compare tl1 tl2 | i -> i) | i -> i) | X _::_, _ -> 1 | _, X _::_ -> -1 | _ -> compare t1 t2 and loose_compare_lists l1 l2 = match l1, l2 with | [], [] -> 0 | e1::tl1, e2::tl2 -> (match loose_compare e1 e2 with | 0 -> loose_compare_lists tl1 tl2 | i -> i) | _, [] -> 1 | _ -> -1 type tok = (* Cs(n) means (n+2) times C *) | Ampersand | Ampersands of int | At | Ats of int | Backquote | Backquotes of int | Backslash | Backslashs of int | Bar | Bars of int | Caret | Carets of int | Cbrace | Cbraces of int | Colon | Colons of int | Comma | Commas of int | Cparenthesis | Cparenthesiss of int | Cbracket | Cbrackets of int | Dollar | Dollars of int | Dot | Dots of int | Doublequote | Doublequotes of int | Exclamation | Exclamations of int | Equal | Equals of int | Greaterthan | Greaterthans of int | Hash | Hashs of int | Lessthan | Lessthans of int | Minus | Minuss of int | Newline | Newlines of int | Number of string | Obrace | Obraces of int | Oparenthesis | Oparenthesiss of int | Obracket | Obrackets of int | Percent | Percents of int | Plus | Pluss of int | Question | Questions of int | Quote | Quotes of int | Semicolon | Semicolons of int | Slash | Slashs of int | Space | Spaces of int | Star | Stars of int | Tab | Tabs of int | Tilde | Tildes of int | Underscore | Underscores of int | Word of string | Tag of name * extension and extension = < parser_extension : t -> tok list -> tok list -> ((t * tok list * tok list) option); to_string : string > type extensions = extension list let empty_extension = object method parser_extension r p l = None method to_string = "" end let rec normalise_md l = if debug then eprintf "(OMD) normalise_md\n%!"; let rec loop = function | [NL;NL;NL;NL;NL;NL;NL;] | [NL;NL;NL;NL;NL;NL;] | [NL;NL;NL;NL;NL;] | [NL;NL;NL;NL;] | [NL;NL;NL;] | [NL;NL] | [NL] -> [] | [] -> [] | NL::NL::NL::tl -> loop (NL::NL::tl) | Text t1::Text t2::tl -> loop (Text(t1^t2)::tl) | NL::(((Paragraph _|H1 _|H2 _|H3 _|H4 _|H5 _|H6 _ |Code_block _|Ol _|Ul _|Olp _|Ulp _)::_) as tl) -> loop tl | Paragraph[Text " "]::tl -> loop tl | Paragraph[]::tl -> loop tl | Paragraph(p)::tl -> Paragraph(loop p)::loop tl | H1 v::tl -> H1(loop v)::loop tl | H2 v::tl -> H2(loop v)::loop tl | H3 v::tl -> H3(loop v)::loop tl | H4 v::tl -> H4(loop v)::loop tl | H5 v::tl -> H5(loop v)::loop tl | H6 v::tl -> H6(loop v)::loop tl | Emph v::tl -> Emph(loop v)::loop tl | Bold v::tl -> Bold(loop v)::loop tl | Ul v::tl -> Ul(List.map loop v)::loop tl | Ol v::tl -> Ol(List.map loop v)::loop tl | Ulp v::tl -> Ulp(List.map loop v)::loop tl | Olp v::tl -> Olp(List.map loop v)::loop tl | Blockquote v::tl -> Blockquote(loop v)::loop tl | Url(href,v,title)::tl -> Url(href,(loop v),title)::loop tl | Text _ | Code _ | Code_block _ | Br | Hr | NL | Ref _ | Img_ref _ | Html _ | Html_block _ | Html_comment _ | Raw _ | Raw_block _ | Img _ | X _ as v::tl -> v::loop tl in let a = loop l in let b = loop a in if a = b then a else normalise_md b let dummy_X = X (object method name = "dummy" method to_html ?(indent=0) _ _ = None method to_sexpr _ _ = None method to_t _ = None end) let rec visit f = function | [] -> [] | Paragraph v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Paragraph(visit f v)::visit f tl end | H1 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H1(visit f v)::visit f tl end | H2 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H2(visit f v)::visit f tl end | H3 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H3(visit f v)::visit f tl end | H4 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H4(visit f v)::visit f tl end | H5 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H5(visit f v)::visit f tl end | H6 v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> H6(visit f v)::visit f tl end | Emph v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Emph(visit f v)::visit f tl end | Bold v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Bold(visit f v)::visit f tl end | Ul v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Ul(List.map (visit f) v)::visit f tl end | Ol v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Ol(List.map (visit f) v)::visit f tl end | Ulp v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Ulp(List.map (visit f) v)::visit f tl end | Olp v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Olp(List.map (visit f) v)::visit f tl end | Blockquote v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Blockquote(visit f v)::visit f tl end | Url(href,v,title) as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Url(href,visit f v,title)::visit f tl end | Text v as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Code _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Code_block _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Ref _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Img_ref _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Html _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Html_block _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Html_comment _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Raw _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Raw_block _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Img _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | X _ as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> e::visit f tl end | Br as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Br::visit f tl end | Hr as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> Hr::visit f tl end | NL as e::tl -> begin match f e with | Some(l) -> l@visit f tl | None -> NL::visit f tl end omd-1.3.2/src/omd_representation.mli000066400000000000000000000135621425763206400175070ustar00rootroot00000000000000 module R : Map.S with type key = string class ref_container : object val mutable c : (string * string) R.t method add_ref : R.key -> string -> string -> unit method get_ref : R.key -> (string * string) option method get_all : (string * (string * string)) list end type element = | H1 of t | H2 of t | H3 of t | H4 of t | H5 of t | H6 of t | Paragraph of t | Text of string | Emph of t | Bold of t | Ul of t list | Ol of t list | Ulp of t list | Olp of t list | Code of name * string | Code_block of name * string | Br | Hr | NL | Url of href * t * title | Ref of ref_container * name * string * fallback | Img_ref of ref_container * name * alt * fallback | Html of name * (string * string option) list * t | Html_block of name * (string * string option) list * t | Html_comment of string | Raw of string | Raw_block of string | Blockquote of t | Img of alt * src * title | X of < name : string; to_html : ?indent:int -> (t -> string) -> t -> string option; to_sexpr : (t -> string) -> t -> string option; to_t : t -> t option > and fallback = < to_string : string ; to_t : t > and name = string and alt = string and src = string and href = string and title = string and t = element list type tok = Ampersand (* one & *) | Ampersands of int (* [Ampersands(n)] is (n+2) consecutive occurrences of & *) | At (* @ *) | Ats of int (* @@.. *) | Backquote (* ` *) | Backquotes of int (* ``.. *) | Backslash (* \\ *) | Backslashs of int (* \\\\.. *) | Bar (* | *) | Bars of int (* ||.. *) | Caret (* ^ *) | Carets of int (* ^^.. *) | Cbrace (* } *) | Cbraces of int (* }}.. *) | Colon (* : *) | Colons of int (* ::.. *) | Comma (* , *) | Commas of int (* ,,.. *) | Cparenthesis (* ) *) | Cparenthesiss of int (* )).. *) | Cbracket (* ] *) | Cbrackets of int (* ]].. *) | Dollar (* $ *) | Dollars of int (* $$.. *) | Dot (* . *) | Dots of int (* .... *) | Doublequote (* \034 *) | Doublequotes of int (* \034\034.. *) | Exclamation (* ! *) | Exclamations of int (* !!.. *) | Equal (* = *) | Equals of int (* ==.. *) | Greaterthan (* > *) | Greaterthans of int (* >>.. *) | Hash (* # *) | Hashs of int (* ##.. *) | Lessthan (* < *) | Lessthans of int (* <<.. *) | Minus (* - *) | Minuss of int (* --.. *) | Newline (* \n *) | Newlines of int (* \n\n.. *) | Number of string | Obrace (* { *) | Obraces of int (* {{.. *) | Oparenthesis (* ( *) | Oparenthesiss of int (* ((.. *) | Obracket (* [ *) | Obrackets of int (* [[.. *) | Percent (* % *) | Percents of int (* %%.. *) | Plus (* + *) | Pluss of int (* ++.. *) | Question (* ? *) | Questions of int (* ??.. *) | Quote (* ' *) | Quotes of int (* ''.. *) | Semicolon (* ; *) | Semicolons of int (* ;;.. *) | Slash (* / *) | Slashs of int (* //.. *) | Space (* *) | Spaces of int (* .. *) | Star (* * *) | Stars of int (* **.. *) | Tab (* \t *) | Tabs of int (* \t\t.. *) | Tilde (* ~ *) | Tildes of int (* ~~.. *) | Underscore (* _ *) | Underscores of int (* __.. *) | Word of string | Tag of name * extension (** Lexer's tokens. If you want to use the parser with an extended lexer, you may use the constructor [Tag] to implement the parser's extension. In the parser, [Tag] is used (at least) 3 times in order to represent metadata or to store data. The integers carried by constructors means that the represented character appears (n+2) times. So, [Ampersand(0)] is "&&". Notably, this allows to use the property that in the match case [Ampersand _ ->], we know there are at least 2 ampersands. This is particularly useful for some characters, such as newlines and spaces. It's not useful for all of them indeed but it has been designed this way for the sake of uniformity (one doesn't want to know by heart which constructor have that "at least 2" property and which haven't). *) and extension = < parser_extension : t -> tok list -> tok list -> ((t * tok list * tok list) option); to_string : string > (** - [parser_extension] is a method that takes the current state of the parser's data and returns None if nothing has been changed, otherwise it returns the new state. The current state of the parser's data is [(r, p, l)] where [r] is the result so far, [p] is the list of the previous tokens (it's typically empty or contains information on how many newlines we've just seen), and [l] is the remaining tokens to parse. - and [to_string] is a method that returns directly a string representation of the object (it's normal if it returns the empty string). *) type extensions = extension list (** One must use this type to extend the parser. It's a list of functions of type [extension]. They are processed in order (the head is applied first), so be careful about it. If you use it wrong, it will behave wrong. *) val empty_extension : extension (** An empty extension *) val loose_compare : t -> t -> int (** [loose_compare t1 t2] returns [0] if [t1] and [t2] are equivalent, otherwise it returns another number. *) val normalise_md : t -> t (** [normalise_md md] returns a copy of [md] where some elements have been factorized. *) val visit : (element -> t option) -> t -> t (** visitor for structures of type t: [visit f md] will return a new potentially altered copy of [md] that has been created by the visit of [md] by [f]. The function [f] takes each [element] (from [md]) and returns [Some t] if it has effectively been applied to [element], and [None] otherwise. When it returns [Some t], [t] replaces [element] in the copy of [md], and when it returns [None], either [element] is copied as it is in the copy of [md] or a visited version is copied instead (well, that depends on if [element] has elements inside of it or not). *) omd-1.3.2/src/omd_utils.ml000066400000000000000000000205151425763206400154300ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013/2014 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) open Printf let debug = let _DEBUG = try Some(Sys.getenv "DEBUG") with _ -> None and _OMD_DEBUG = try Some(Sys.getenv "OMD_DEBUG") with _ -> None in match _DEBUG, _OMD_DEBUG with | _, Some "false" -> false | Some _, None -> eprintf "omd: debug mode activated because DEBUG is set, \ you can deactivate the mode by unsetting DEBUG \ or by setting OMD_DEBUG to the string \"false\".\n%!"; true | None, None -> false | _, Some _ -> eprintf "omd: debug mode activated because OMD_DEBUG is set to a value that isn't the string \"false\".\n%!"; true exception Error of string let warn ?(we=false) msg = if we then raise (Error msg) else eprintf "(OMD) Warning: %s\n%!" msg let trackfix = try ignore(Sys.getenv "OMD_FIX"); eprintf "omd: tracking mode activated: token list are very often checked, \ it might take a *very* long time if your input is large.\n%!"; true with Not_found -> false let _ = if debug then Printexc.record_backtrace true let raise = if debug then (fun e -> eprintf "(OMD) Exception raised: %s\n%!" (Printexc.to_string e); raise e) else raise module StringSet : sig include Set.S with type elt = string val of_list : elt list -> t end = struct include Set.Make(String) let of_list l = List.fold_left (fun r e -> add e r) empty l end type 'a split = 'a list -> 'a split_action and 'a split_action = | Continue | Continue_with of 'a list * 'a list | Split of 'a list * 'a list let fsplit_rev ?(excl=(fun _ -> false)) ~(f:'a split) l : ('a list * 'a list) option = let rec loop accu = function | [] -> begin match f [] with | Split(left, right) -> Some(left@accu, right) | Continue_with(left, tl) -> loop (left@accu) tl | Continue -> None end | e::tl as l -> if excl l then None else match f l with | Split(left, right) -> Some(left@accu, right) | Continue_with(left, tl) -> loop (left@accu) tl | Continue -> loop (e::accu) tl in loop [] l let fsplit ?(excl=(fun _ -> false)) ~f l = match fsplit_rev ~excl:excl ~f:f l with | None -> None | Some(rev, l) -> Some(List.rev rev, l) let id_of_string ids s = let n = String.length s in let out = Buffer.create 0 in (* Put [s] into [b], replacing non-alphanumeric characters with dashes. *) let rec loop started i = if i = n then () else match s.[i] with | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' as c -> Buffer.add_char out c ; loop true (i + 1) (* Don't want to start with dashes. *) | _ when not started -> loop false (i + 1) | _ -> Buffer.add_char out '-' ; loop false (i + 1) in loop false 0 ; let s' = Buffer.contents out in if s' = "" then "" else (* Find out the index of the last character in [s'] that isn't a dash. *) let last_trailing = let rec loop i = if i < 0 || s'.[i] <> '-' then i else loop (i - 1) in loop (String.length s' - 1) in (* Trim trailing dashes. *) ids#mangle @@ String.sub s' 0 (last_trailing + 1) (* only convert when "necessary" *) let htmlentities ?(md=false) s = let module Break = struct exception Break end in let b = Buffer.create 64 in let rec loop i = if i = String.length s then () else let () = match s.[i] with | ( '0' .. '9' | 'a' .. 'z' | 'A' .. 'Z' ) as c -> Buffer.add_char b c | '"' -> Buffer.add_string b """ | '\'' -> Buffer.add_string b "'" | '&' -> if md then begin try let () = match s.[i+1] with | '#' -> let rec ff j = match s.[j] with | '0' .. '9' -> ff (succ j) | ';' -> () | _ -> raise Break.Break in ff (i+2) | 'A' .. 'Z' | 'a' .. 'z' -> let rec ff j = match s.[j] with | 'A' .. 'Z' | 'a' .. 'z' -> ff (succ j) | ';' -> () | _ -> raise Break.Break in ff (i+2) | _ -> raise Break.Break in Buffer.add_string b "&" with _ -> Buffer.add_string b "&" end else Buffer.add_string b "&" | '<' -> Buffer.add_string b "<" | '>' -> Buffer.add_string b ">" | c -> Buffer.add_char b c in loop (succ i) in loop 0; Buffer.contents b let minimalize_blanks s = let l = String.length s in let b = Buffer.create l in let rec loop f i = if i = l then Buffer.contents b else match s.[i] with | ' ' | '\t' | '\n' -> loop true (succ i) | c -> if Buffer.length b > 0 && f then Buffer.add_char b ' '; loop false (succ i) in loop false 0 let rec eat f = function | [] -> [] | e::tl as l -> if f e then eat f tl else l let rec extract_html_attributes (html:string) = let rec cut_on_char_from s i c = match String.index_from s i c with | 0 -> "", String.sub s 1 (String.length s - 1) | j -> String.sub s i (j-i), String.sub s (j+1) (String.length s - (j+1)) in let remove_prefix_spaces s = if s = "" then s else if s.[0] <> ' ' then s else let rec loop i = if i = String.length s then String.sub s i (String.length s - i) else match s.[i] with | ' ' -> loop (i+1) | _ -> String.sub s i (String.length s - i) in loop 1 in let remove_suffix_spaces s = if s = "" then s else if s.[String.length s - 1] <> ' ' then s else let rec loop i = match s.[i] with | ' ' -> loop (i-1) | _ -> String.sub s 0 (i+1) in loop (String.length s - 1) in let rec loop s res i = if i = String.length s then res else match try Some (take_attribute s i) with Not_found -> None with | Some (((_,_) as a), new_s) -> loop new_s (a::res) 0 | None -> res and take_attribute s i = let name, after_eq = cut_on_char_from s i '=' in let name = remove_suffix_spaces name in let after_eq = remove_prefix_spaces after_eq in let value, rest = cut_on_char_from after_eq 1 after_eq.[0] in (name,value), remove_prefix_spaces rest in if (* Has it at least one attribute? *) try String.index html '>' < String.index html ' ' with Not_found -> true then [] else match html.[1] with | '<' | ' ' -> extract_html_attributes (remove_prefix_spaces (String.sub html 1 (String.length html - 1))) | _ -> try let html = snd (cut_on_char_from html 0 ' ') in loop (String.sub html 0 (String.index html '>')) [] 0 with Not_found -> [] let rec extract_inner_html (html:string) = let rec cut_on_char_from s i c = match String.index_from s i c with | 0 -> "", String.sub s 1 (String.length s - 1) | j -> String.sub s i (j-i), String.sub s (j+1) (String.length s - (j+1)) in let rec rcut_on_char_from s i c = match String.rindex_from s i c with | 0 -> "", String.sub s 1 (String.length s - 1) | j -> String.sub s 0 j, String.sub s (j+1) (String.length s - (j+1)) in let _, p = cut_on_char_from html 0 '>' in let r, _ = rcut_on_char_from p (String.length p - 1) '<' in r let html_void_elements = StringSet.of_list [ "img"; "input"; "link"; "meta"; "br"; "hr"; "source"; "wbr"; "param"; "embed"; "base"; "area"; "col"; "track"; "keygen"; ] let ( @ ) l1 l2 = List.rev_append (List.rev l1) l2 omd-1.3.2/src/omd_utils.mli000066400000000000000000000107361425763206400156050ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013/2014 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) val debug : bool (** Equals [true] if the environment variable DEBUG is set, or if the environment variable OMD_DEBUG is set to a string that is not ["false"]. *) val trackfix : bool exception Error of string val raise : exn -> 'a (** Same as [Pervasives.raise] except if [debug] equals true, in which case it prints a trace on stderr before raising the exception. *) val warn : ?we:bool -> string -> unit (** [warn we x] prints a warning with the message [x] if [we] is true, else raises [Omd_utils.Error x]. *) module StringSet : sig include Set.S with type elt = string val of_list : elt list -> t end (** Set of [string]. Cf. documentation of {!Set.S} *) type 'a split = 'a list -> 'a split_action (** Type of a split function *) and 'a split_action = (** Don't split yet *) | Continue (** Don't split yet but continue with those two lists instead of default *) | Continue_with of 'a list * 'a list (** Do split with this split scheme *) | Split of 'a list * 'a list (** Type of a split action *) val fsplit_rev : ?excl:('a list -> bool) -> f:'a split -> 'a list -> ('a list * 'a list) option (** [fsplit_rev ?excl ~f l] returns [Some(x,y)] where [x] is the **reversed** list of the consecutive elements of [l] that obey the split function [f]. Note that [f] is applied to a list of elements and not just an element, so that [f] can look farther in the list when applied. [f l] returns [Continue] if there're more elements to consume, [Continue_with(left,right)] if there's more elements to consume but we want to choose what goes to the left part and what remains to process (right part), and returns [Split(left,right)] if the splitting is decided. When [f] is applied to an empty list, if it returns [Continue] then the result will be [None]. If [excl] is given, then [excl] is applied before [f] is, to check if the splitting should be stopped right away. When the split fails, it returns [None]. *) val fsplit : ?excl:('a list -> bool) -> f:'a split -> 'a list -> ('a list * 'a list) option (** [fsplit ?excl ~f l] returns [Some(List.rev x, y)] if [fsplit ?excl ~f l] returns [Some(x,y)], else it returns [None]. *) val id_of_string : < mangle : string -> string; .. > -> string -> string (** [id_of_string ids id] returns a mangled version of [id], using the method [ids#mangle]. If you don't need mangling, you may use [object method mangle x = x end] for [ids]. However, the name [ids] also means that your object should have knowledge of all IDs it has issued, in order to avoid collision. This is why [id_of_string] asks for an object rather than "just a function". *) val htmlentities : ?md:bool -> string -> string (** [htmlentities s] returns a new string in which html-significant characters have been converted to html entities. For instance, "" is converted to "<Foo&Bar>". *) val minimalize_blanks : string -> string (** [minimalize_blanks s] returns a copy of [s] in which the first and last characters are never blank, and two consecutive blanks never happen. *) val eat : ('a -> bool) -> 'a list -> 'a list (** [eat f l] returns [l] where elements satisfying [f] have been removed, but it stops removing as soon as one element doesn't satisfy [f]. *) val extract_html_attributes : string -> (string * string) list (** Takes some HTML and returns the list of attributes of the first HTML tag. Notes: * Doesn't check the validity of HTML tags or attributes. * Doesn't support backslash escaping. * Attribute names are delimited by the space and equal characters. * Attribute values are either delimited by the double quote or the simple quote character. *) val extract_inner_html : string -> string (** Takes an HTML node and returns the contents of the node. If it's not given a node, it returns something rubbish. *) val html_void_elements : StringSet.t (** HTML void elements *) val ( @ ) : 'a list -> 'a list -> 'a list (** Tail-recursive version of [Pervasives.(@)]. *) omd-1.3.2/src/omd_xtxt.ml000066400000000000000000000014101425763206400152700ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (* xtxt = eXTernal eXTension *) (* let extensions = ref [] *) (* let get () = *) (* !extensions *) (* let register e = *) (* extensions := e :: !extensions *) (* let set es = extensions := es *) (* let activate ... *) (* (\* let deactivate ... *\) *) (* priority (integer?) *) (* pre-extension *) (* post-extension *) omd-1.3.2/src/omd_xtxt.mli000066400000000000000000000007371425763206400154540ustar00rootroot00000000000000(***********************************************************************) (* omd: Markdown frontend in OCaml *) (* (c) 2013 by Philippe Wang *) (* Licence : ISC *) (* http://www.isc.org/downloads/software-support-policy/isc-license/ *) (***********************************************************************) (** xtxt = eXTernal eXTension *) omd-1.3.2/tests/000077500000000000000000000000001425763206400134475ustar00rootroot00000000000000John_MacFarlane_said_peg-markdown_takes_forever_to_process_this--jgm_peg-markdown_issues_28.md000066400000000000000000000002651425763206400357540ustar00rootroot00000000000000omd-1.3.2/tests***************************************[[[[[[[[[[[[[[[[[[[[-----------------]]]]]]]]]]]]]]]]]*************************** [[[[[[[[[[[[[[[[[[[[-----------------]]]]]]]]]]]]]]]]]: :: omd-1.3.2/tests/cow/000077500000000000000000000000001425763206400142375ustar00rootroot00000000000000omd-1.3.2/tests/cow/anchors-by-reference.html000066400000000000000000000006341425763206400211310ustar00rootroot00000000000000

This is an example reference-style link. This is another reference-style link. This is a third reference-style link. This is a fourth reference-style link.

omd-1.3.2/tests/cow/anchors-by-reference.md000066400000000000000000000006111425763206400205600ustar00rootroot00000000000000 This is [an example][id] reference-style link. This is [another] [foo] reference-style link. This is [a third][bar] reference-style link. This is [a fourth][4] reference-style link. [id]: http://example.com/ "Optional Title Here" [foo]: http://example.com/ (Optional Title Here) [bar]: http://example.com/ (Optional Title Here) [4]: "Optional Title Here"omd-1.3.2/tests/cow/automatic-anchors.html000066400000000000000000000000751425763206400205500ustar00rootroot00000000000000

http://example.com/

omd-1.3.2/tests/cow/automatic-anchors.md000066400000000000000000000000261425763206400202000ustar00rootroot00000000000000 omd-1.3.2/tests/cow/blockquote-nested-markdown.html000066400000000000000000000004471425763206400224020ustar00rootroot00000000000000

This is a header.

  1. This is the first list item.
  2. This is the second list item.

Here's some example code:

return shell_exec("echo $input | $markdown_script");
omd-1.3.2/tests/cow/blockquote-nested-markdown.md000066400000000000000000000002741425763206400220340ustar00rootroot00000000000000> ## This is a header. > > 1. This is the first list item. > 2. This is the second list item. > > Here's some example code: > > return shell_exec("echo $input | $markdown_script");omd-1.3.2/tests/cow/blockquote-starting-with-empty-lines.html000066400000000000000000000000451425763206400243420ustar00rootroot00000000000000

A B

omd-1.3.2/tests/cow/blockquote-starting-with-empty-lines.md000066400000000000000000000000161425763206400237740ustar00rootroot00000000000000> > A > B > omd-1.3.2/tests/cow/blockquote.html000066400000000000000000000001441425763206400172740ustar00rootroot00000000000000

This is a multi line blockquote test

With more than one line.

omd-1.3.2/tests/cow/blockquote.md000066400000000000000000000001141425763206400167250ustar00rootroot00000000000000 > This is a multi line blockquote test > > With more than one line.omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-2.html000066400000000000000000000001121425763206400250040ustar00rootroot00000000000000

A B

C D

omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-2.md000066400000000000000000000000371425763206400244460ustar00rootroot00000000000000> > A > B > > > C > D > omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-3.html000066400000000000000000000001121425763206400250050ustar00rootroot00000000000000

A B

C D

omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-3.md000066400000000000000000000000401425763206400244410ustar00rootroot00000000000000> > A > B > > > C > D > omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-4.html000066400000000000000000000001121425763206400250060ustar00rootroot00000000000000

A B

C D

omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween-4.md000066400000000000000000000000411425763206400244430ustar00rootroot00000000000000> > A > B > > > C > D > omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween.html000066400000000000000000000000601425763206400246470ustar00rootroot00000000000000

A B

C D

omd-1.3.2/tests/cow/blockquotes-with-empty-lines-inbetween.md000066400000000000000000000000361425763206400243060ustar00rootroot00000000000000> > A > B > > > C > D > omd-1.3.2/tests/cow/code-block-html-escape.html000066400000000000000000000001171425763206400213260ustar00rootroot00000000000000

This is some HTML:

<h1>Heading</h1>
omd-1.3.2/tests/cow/code-block-html-escape.md000066400000000000000000000000511425763206400207570ustar00rootroot00000000000000 This is some HTML:

Heading

omd-1.3.2/tests/cow/code-block-vs-setext-titles.html000066400000000000000000000007561425763206400224010ustar00rootroot00000000000000
===
= = =
 = = =
  =  = =
---
- - -
 - - -
  -  - -

-bash --- -

-bash - - - -

-bash - - - -

-bash - - - -

omd-1.3.2/tests/cow/code-block.html000066400000000000000000000001211425763206400171210ustar00rootroot00000000000000

This is a normal paragraph:

This is a code block.
omd-1.3.2/tests/cow/code-block.md000066400000000000000000000000671425763206400165660ustar00rootroot00000000000000 This is a normal paragraph: This is a code block.omd-1.3.2/tests/cow/dfb-3def.html000066400000000000000000000000011425763206400164660ustar00rootroot00000000000000 omd-1.3.2/tests/cow/dfb-3def.md000066400000000000000000000002261425763206400161330ustar00rootroot00000000000000[foo]: http://example.com/ "Optional Title Here" [foo]: http://example.com/ 'Optional Title Here' [foo]: http://example.com/ (Optional Title Here) omd-1.3.2/tests/cow/dfb-4lt5.html000066400000000000000000000000171425763206400164440ustar00rootroot00000000000000

4 < 5

omd-1.3.2/tests/cow/dfb-4lt5.md000066400000000000000000000000061425763206400160760ustar00rootroot000000000000004 < 5 omd-1.3.2/tests/cow/dfb-at-and-t.html000066400000000000000000000000171425763206400172610ustar00rootroot00000000000000

AT&T

omd-1.3.2/tests/cow/dfb-at-and-t.md000066400000000000000000000000051425763206400167120ustar00rootroot00000000000000AT&T omd-1.3.2/tests/cow/dfb-autoemail.html000066400000000000000000000001031425763206400176300ustar00rootroot00000000000000

address@example.com

omd-1.3.2/tests/cow/dfb-autoemail.md000066400000000000000000000000271425763206400172710ustar00rootroot00000000000000 omd-1.3.2/tests/cow/dfb-autoescape.html000066400000000000000000000000761425763206400200120ustar00rootroot00000000000000

http://images.google.com/images?num=30&q=larry+bird

omd-1.3.2/tests/cow/dfb-autoescape.md000066400000000000000000000000641425763206400174430ustar00rootroot00000000000000http://images.google.com/images?num=30&q=larry+bird omd-1.3.2/tests/cow/dfb-autolink.html000066400000000000000000000000741425763206400175050ustar00rootroot00000000000000

http://example.com/

omd-1.3.2/tests/cow/dfb-autolink.md000066400000000000000000000000261425763206400171360ustar00rootroot00000000000000 omd-1.3.2/tests/cow/dfb-bq-2p-lazy.html000066400000000000000000000005521425763206400175560ustar00rootroot00000000000000

This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus. Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus.

Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse id sem consectetuer libero luctus adipiscing.

omd-1.3.2/tests/cow/dfb-bq-2p-lazy.md000066400000000000000000000005111425763206400172050ustar00rootroot00000000000000> This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus. Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus. > Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse id sem consectetuer libero luctus adipiscing. omd-1.3.2/tests/cow/dfb-bq-2p.html000066400000000000000000000005521425763206400166010ustar00rootroot00000000000000

This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus. Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus.

Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse id sem consectetuer libero luctus adipiscing.

omd-1.3.2/tests/cow/dfb-bq-2p.md000066400000000000000000000005211425763206400162310ustar00rootroot00000000000000> This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet, > consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus. > Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus. > > Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse > id sem consectetuer libero luctus adipiscing. omd-1.3.2/tests/cow/dfb-bq-nested.html000066400000000000000000000002371425763206400175420ustar00rootroot00000000000000

This is the first level of quoting.

This is nested blockquote.

Back to the first level.

omd-1.3.2/tests/cow/dfb-bq-nested.md000066400000000000000000000001441425763206400171730ustar00rootroot00000000000000> This is the first level of quoting. > > > This is nested blockquote. > > Back to the first level. omd-1.3.2/tests/cow/dfb-cb-ampersand.html000066400000000000000000000001571425763206400202150ustar00rootroot00000000000000
<div class="footer">
    &copy; 2004 Foo Corporation
</div>
omd-1.3.2/tests/cow/dfb-cb-ampersand.md000066400000000000000000000001101425763206400176360ustar00rootroot00000000000000 omd-1.3.2/tests/cow/dfb-code1.html000066400000000000000000000000561425763206400166520ustar00rootroot00000000000000

Use the printf() function.

omd-1.3.2/tests/cow/dfb-code1.md000066400000000000000000000000351425763206400163030ustar00rootroot00000000000000Use the `printf()` function. omd-1.3.2/tests/cow/dfb-code2.html000066400000000000000000000000711425763206400166500ustar00rootroot00000000000000

There is a literal backtick (`) here.

omd-1.3.2/tests/cow/dfb-code2.md000066400000000000000000000000521425763206400163030ustar00rootroot00000000000000``There is a literal backtick (`) here.`` omd-1.3.2/tests/cow/dfb-code3.html000066400000000000000000000001741425763206400166550ustar00rootroot00000000000000

A single backtick in a code span: `

A backtick-delimited string in a code span: `foo`

omd-1.3.2/tests/cow/dfb-code3.md000066400000000000000000000001431425763206400163050ustar00rootroot00000000000000A single backtick in a code span: `` ` `` A backtick-delimited string in a code span: `` `foo` `` omd-1.3.2/tests/cow/dfb-code4.html000066400000000000000000000001011425763206400166440ustar00rootroot00000000000000

Please don't use any <blink> tags.

omd-1.3.2/tests/cow/dfb-code4.md000066400000000000000000000000451425763206400163070ustar00rootroot00000000000000Please don't use any `` tags. omd-1.3.2/tests/cow/dfb-code5.html000066400000000000000000000001361425763206400166550ustar00rootroot00000000000000

&#8212; is the decimal-encoded equivalent of &mdash;.

omd-1.3.2/tests/cow/dfb-code5.md000066400000000000000000000000721425763206400163100ustar00rootroot00000000000000`—` is the decimal-encoded equivalent of `—`. omd-1.3.2/tests/cow/dfb-codeblock-applescript.html000066400000000000000000000001651425763206400221310ustar00rootroot00000000000000

Here is an example of AppleScript:

tell application "Foo"
    beep
end tell
omd-1.3.2/tests/cow/dfb-codeblock-applescript.md000066400000000000000000000001311425763206400215560ustar00rootroot00000000000000Here is an example of AppleScript: tell application "Foo" beep end tell omd-1.3.2/tests/cow/dfb-copy.html000066400000000000000000000000151425763206400166240ustar00rootroot00000000000000

©

omd-1.3.2/tests/cow/dfb-copy.md000066400000000000000000000000071425763206400162610ustar00rootroot00000000000000© omd-1.3.2/tests/cow/dfb-dfb.html000066400000000000000000000001331425763206400164060ustar00rootroot00000000000000

Visit Daring Fireball for more information.

omd-1.3.2/tests/cow/dfb-dfb.md000066400000000000000000000001361425763206400160450ustar00rootroot00000000000000Visit [Daring Fireball][] for more information. [Daring Fireball]: http://daringfireball.net/ omd-1.3.2/tests/cow/dfb-direct-links.html000066400000000000000000000003521425763206400202460ustar00rootroot00000000000000

I get 10 times more traffic from Google than from Yahoo or MSN.

omd-1.3.2/tests/cow/dfb-direct-links.md000066400000000000000000000002611425763206400177010ustar00rootroot00000000000000I get 10 times more traffic from [Google](http://google.com/ "Google") than from [Yahoo](http://search.yahoo.com/ "Yahoo Search") or [MSN](http://search.msn.com/ "MSN Search"). omd-1.3.2/tests/cow/dfb-emph.html000066400000000000000000000002251425763206400166060ustar00rootroot00000000000000

single asterisks

single underscores

double asterisks

double underscores

omd-1.3.2/tests/cow/dfb-emph.md000066400000000000000000000001311425763206400162360ustar00rootroot00000000000000 *single asterisks* _single underscores_ **double asterisks** __double underscores__ omd-1.3.2/tests/cow/dfb-google.html000066400000000000000000000000561425763206400171330ustar00rootroot00000000000000

Google

omd-1.3.2/tests/cow/dfb-google.md000066400000000000000000000000501425763206400165610ustar00rootroot00000000000000[Google][] [Google]: http://google.com/ omd-1.3.2/tests/cow/dfb-h1-h2.html000066400000000000000000000001151425763206400164720ustar00rootroot00000000000000

This is an H1

This is an H2

omd-1.3.2/tests/cow/dfb-h1-h2.md000066400000000000000000000000711425763206400161270ustar00rootroot00000000000000This is an H1 ============= This is an H2 ------------- omd-1.3.2/tests/cow/dfb-h123.html000066400000000000000000000001721425763206400163330ustar00rootroot00000000000000

This is an H1

This is an H2

This is an H3

omd-1.3.2/tests/cow/dfb-h123.md000066400000000000000000000001011425763206400157570ustar00rootroot00000000000000# This is an H1 # ## This is an H2 ## ### This is an H3 ###### omd-1.3.2/tests/cow/dfb-h126.html000066400000000000000000000001671425763206400163420ustar00rootroot00000000000000

This is an H1

This is an H2

This is an H6
omd-1.3.2/tests/cow/dfb-h126.md000066400000000000000000000000701425763206400157670ustar00rootroot00000000000000# This is an H1 ## This is an H2 ###### This is an H6 omd-1.3.2/tests/cow/dfb-hr.html000066400000000000000000000000441425763206400162650ustar00rootroot00000000000000






omd-1.3.2/tests/cow/dfb-hr.md000066400000000000000000000001021425763206400157140ustar00rootroot00000000000000* * * *** ***** - - - --------------------------------------- omd-1.3.2/tests/cow/dfb-images.html000066400000000000000000000002001425763206400171130ustar00rootroot00000000000000

Alt text

Alt text

omd-1.3.2/tests/cow/dfb-images.md000066400000000000000000000001161425763206400165550ustar00rootroot00000000000000![Alt text](/path/to/img.jpg) ![Alt text](/path/to/img.jpg "Optional title") omd-1.3.2/tests/cow/dfb-links-rp.html000066400000000000000000000000741425763206400174160ustar00rootroot00000000000000

See my About page for details.

omd-1.3.2/tests/cow/dfb-links-rp.md000066400000000000000000000000551425763206400170510ustar00rootroot00000000000000See my [About](/about/) page for details. omd-1.3.2/tests/cow/dfb-links.html000066400000000000000000000002401425763206400167720ustar00rootroot00000000000000

This is an example inline link.

This link has no title attribute.

omd-1.3.2/tests/cow/dfb-links.md000066400000000000000000000001711425763206400164310ustar00rootroot00000000000000This is [an example](http://example.com/ "Title") inline link. [This link](http://example.net/) has no title attribute. omd-1.3.2/tests/cow/dfb-list-with-2p-lazy.html000066400000000000000000000004341425763206400210770ustar00rootroot00000000000000
  • This is a list item with two paragraphs.

    This is the second paragraph in the list item. You're only required to indent the first line. Lorem ipsum dolor sit amet, consectetuer adipiscing elit.

  • Another item in the same list.

omd-1.3.2/tests/cow/dfb-list-with-2p-lazy.md000066400000000000000000000003561425763206400205360ustar00rootroot00000000000000* This is a list item with two paragraphs. This is the second paragraph in the list item. You're only required to indent the first line. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. * Another item in the same list. omd-1.3.2/tests/cow/dfb-list-with-2p.html000066400000000000000000000005751425763206400201300ustar00rootroot00000000000000
  1. This is a list item with two paragraphs. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus.

    Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus. Donec sit amet nisl. Aliquam semper ipsum sit amet velit.

  2. Suspendisse id sem consectetuer libero luctus adipiscing.

omd-1.3.2/tests/cow/dfb-list-with-2p.md000066400000000000000000000005411425763206400175550ustar00rootroot000000000000001. This is a list item with two paragraphs. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus. Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus. Donec sit amet nisl. Aliquam semper ipsum sit amet velit. 2. Suspendisse id sem consectetuer libero luctus adipiscing. omd-1.3.2/tests/cow/dfb-list-with-bq.html000066400000000000000000000002101425763206400201730ustar00rootroot00000000000000
  • A list item with a blockquote:

    This is a blockquote inside a list item.

omd-1.3.2/tests/cow/dfb-list-with-bq.md000066400000000000000000000001311425763206400176310ustar00rootroot00000000000000* A list item with a blockquote: > This is a blockquote > inside a list item. omd-1.3.2/tests/cow/dfb-list-with-cb.html000066400000000000000000000001551425763206400201650ustar00rootroot00000000000000
  • A list item with a code block:

    <code goes here>
    
omd-1.3.2/tests/cow/dfb-list-with-cb.md000066400000000000000000000000751425763206400176220ustar00rootroot00000000000000* A list item with a code block: omd-1.3.2/tests/cow/dfb-list9.html000066400000000000000000000000751425763206400167240ustar00rootroot00000000000000
  • Bird

  • Magic

omd-1.3.2/tests/cow/dfb-list9.md000066400000000000000000000000241425763206400163520ustar00rootroot00000000000000* Bird * Magic omd-1.3.2/tests/cow/dfb-litteral-asterisks.html000066400000000000000000000000651425763206400215050ustar00rootroot00000000000000

*this text is surrounded by literal asterisks*

omd-1.3.2/tests/cow/dfb-litteral-asterisks.md000066400000000000000000000000611425763206400211350ustar00rootroot00000000000000\*this text is surrounded by literal asterisks\* omd-1.3.2/tests/cow/dfb-np-cb.html000066400000000000000000000001211425763206400166470ustar00rootroot00000000000000

This is a normal paragraph:

This is a code block.
omd-1.3.2/tests/cow/dfb-np-cb.md000066400000000000000000000000671425763206400163140ustar00rootroot00000000000000This is a normal paragraph: This is a code block. omd-1.3.2/tests/cow/dfb-ref-image.html000066400000000000000000000001221425763206400175050ustar00rootroot00000000000000

Alt text

omd-1.3.2/tests/cow/dfb-ref-image.md000066400000000000000000000000771425763206400171520ustar00rootroot00000000000000![Alt text][id] [id]: url/to/image "Optional title attribute" omd-1.3.2/tests/cow/dfb-ref-style-1.html000066400000000000000000000001551425763206400177270ustar00rootroot00000000000000

This is an example reference-style link.

omd-1.3.2/tests/cow/dfb-ref-style-1.md000066400000000000000000000001401425763206400173550ustar00rootroot00000000000000This is [an example][id] reference-style link. [id]: http://example.com/ "Optional Title Here" omd-1.3.2/tests/cow/dfb-ref-style-2.html000066400000000000000000000001551425763206400177300ustar00rootroot00000000000000

This is an example reference-style link.

omd-1.3.2/tests/cow/dfb-ref-style-2.md000066400000000000000000000001411425763206400173570ustar00rootroot00000000000000This is [an example] [id] reference-style link. [id]: http://example.com/ "Optional Title Here" omd-1.3.2/tests/cow/dfb-reflinks-in-action-implicit-link.html000066400000000000000000000003521425763206400241150ustar00rootroot00000000000000

I get 10 times more traffic from Google than from Yahoo or MSN.

omd-1.3.2/tests/cow/dfb-reflinks-in-action-implicit-link.md000066400000000000000000000003441425763206400235520ustar00rootroot00000000000000I get 10 times more traffic from [Google][] than from [Yahoo][] or [MSN][]. [google]: http://google.com/ "Google" [yahoo]: http://search.yahoo.com/ "Yahoo Search" [msn]: http://search.msn.com/ "MSN Search" omd-1.3.2/tests/cow/dfb-reflinks-in-action.html000066400000000000000000000003521425763206400213520ustar00rootroot00000000000000

I get 10 times more traffic from Google than from Yahoo or MSN.

omd-1.3.2/tests/cow/dfb-reflinks-in-action.md000066400000000000000000000003331425763206400210050ustar00rootroot00000000000000I get 10 times more traffic from [Google] [1] than from [Yahoo] [2] or [MSN] [3]. [1]: http://google.com/ "Google" [2]: http://search.yahoo.com/ "Yahoo Search" [3]: http://search.msn.com/ "MSN Search" omd-1.3.2/tests/cow/dfb-reg-p-with-table.html000066400000000000000000000002071425763206400207250ustar00rootroot00000000000000

This is a regular paragraph.

Foo

This is another regular paragraph.

omd-1.3.2/tests/cow/dfb-reg-p-with-table.md000066400000000000000000000001731425763206400203630ustar00rootroot00000000000000This is a regular paragraph.
Foo
This is another regular paragraph. omd-1.3.2/tests/cow/dfb-star-in-words.html000066400000000000000000000000441425763206400203650ustar00rootroot00000000000000

unfriggingbelievable

omd-1.3.2/tests/cow/dfb-star-in-words.md000066400000000000000000000000271425763206400200220ustar00rootroot00000000000000un*frigging*believable omd-1.3.2/tests/cow/dfb-what-a-great-season.html000066400000000000000000000000411425763206400214200ustar00rootroot00000000000000

1986. What a great season.

omd-1.3.2/tests/cow/dfb-what-a-great-season.md000066400000000000000000000000341425763206400210560ustar00rootroot000000000000001986\. What a great season. omd-1.3.2/tests/cow/doubline-list.html000066400000000000000000000000751425763206400177010ustar00rootroot00000000000000
  • Bird

  • Magic

omd-1.3.2/tests/cow/doubline-list.md000066400000000000000000000000241425763206400173270ustar00rootroot00000000000000 * Bird * Magicomd-1.3.2/tests/cow/dune000066400000000000000000000000721425763206400151140ustar00rootroot00000000000000(executable (name test) (modules test) (libraries omd))omd-1.3.2/tests/cow/emphasis.html000066400000000000000000000001621425763206400167350ustar00rootroot00000000000000

important

important

this midimportantsentence

*not important*

omd-1.3.2/tests/cow/emphasis.md000066400000000000000000000001111425763206400163630ustar00rootroot00000000000000 *important* _important_ this mid*important*sentence \*not important\*omd-1.3.2/tests/cow/escaped-number-period.html000066400000000000000000000000611425763206400212740ustar00rootroot00000000000000

It happened in 1986. What a great season.

omd-1.3.2/tests/cow/escaped-number-period.md000066400000000000000000000000531425763206400207310ustar00rootroot00000000000000It happened in 1986\. What a great season.omd-1.3.2/tests/cow/escaping.html000066400000000000000000000002541425763206400167170ustar00rootroot00000000000000

These should all be escaped:

\

`

*

_

{

}

[

]

(

)

#

+

-

.

!

omd-1.3.2/tests/cow/escaping.md000066400000000000000000000001311425763206400163450ustar00rootroot00000000000000 These should all be escaped: \\ \` \* \_ \{ \} \[ \] \( \) \# \+ \- \. \!omd-1.3.2/tests/cow/github-style-at-start.html000066400000000000000000000001261425763206400213010ustar00rootroot00000000000000

function MyFunc(a) {
    // ...
}

That is some code!

omd-1.3.2/tests/cow/github-style-at-start.md000066400000000000000000000000751425763206400207400ustar00rootroot00000000000000``` function MyFunc(a) { // ... } ``` That is some code!omd-1.3.2/tests/cow/github-style-codeblock.html000066400000000000000000000003241425763206400214670ustar00rootroot00000000000000

Define a function in javascript:

function MyFunc(a) {
    var s = '`';
}

And some HTML

<div>HTML!</div>
omd-1.3.2/tests/cow/github-style-codeblock.md000066400000000000000000000001771425763206400211310ustar00rootroot00000000000000 Define a function in javascript: ``` function MyFunc(a) { var s = '`'; } ``` And some HTML ```html
HTML!
```omd-1.3.2/tests/cow/github-style-linebreaks.html000066400000000000000000000001151425763206400216570ustar00rootroot00000000000000

code can go here
this is rendered on a second line
omd-1.3.2/tests/cow/github-style-linebreaks.md000066400000000000000000000000721425763206400213150ustar00rootroot00000000000000``` code can go here this is rendered on a second line ```omd-1.3.2/tests/cow/h1-with-double-hash.html000066400000000000000000000000501425763206400205720ustar00rootroot00000000000000

This is an H1

omd-1.3.2/tests/cow/h1-with-double-hash.md000066400000000000000000000000211425763206400202240ustar00rootroot00000000000000# This is an H1 #omd-1.3.2/tests/cow/h1-with-equals.html000066400000000000000000000000461425763206400176760ustar00rootroot00000000000000

This is an H1

omd-1.3.2/tests/cow/h1-with-equals.md000066400000000000000000000000331425763206400173260ustar00rootroot00000000000000This is an H1 =============omd-1.3.2/tests/cow/h1-with-single-hash.html000066400000000000000000000000471425763206400206070ustar00rootroot00000000000000

This is an H1

omd-1.3.2/tests/cow/h1-with-single-hash.md000066400000000000000000000000171425763206400202400ustar00rootroot00000000000000# This is an H1omd-1.3.2/tests/cow/h2-with-dashes.html000066400000000000000000000000461425763206400176540ustar00rootroot00000000000000

This is an H2

omd-1.3.2/tests/cow/h2-with-dashes.md000066400000000000000000000000331425763206400173040ustar00rootroot00000000000000This is an H2 -------------omd-1.3.2/tests/cow/h2-with-double-hash.html000066400000000000000000000000501425763206400205730ustar00rootroot00000000000000

This is an H2

omd-1.3.2/tests/cow/h2-with-double-hash.md000066400000000000000000000000231425763206400202270ustar00rootroot00000000000000## This is an H2 ##omd-1.3.2/tests/cow/h2-with-single-hash.html000066400000000000000000000000471425763206400206100ustar00rootroot00000000000000

This is an H2

omd-1.3.2/tests/cow/h2-with-single-hash.md000066400000000000000000000000201425763206400202330ustar00rootroot00000000000000## This is an H2omd-1.3.2/tests/cow/h3-with-double-hash.html000066400000000000000000000000501425763206400205740ustar00rootroot00000000000000

This is an H3

omd-1.3.2/tests/cow/h3-with-double-hash.md000066400000000000000000000000251425763206400202320ustar00rootroot00000000000000### This is an H3 ###omd-1.3.2/tests/cow/h3-with-single-hash.html000066400000000000000000000000471425763206400206110ustar00rootroot00000000000000

This is an H3

omd-1.3.2/tests/cow/h3-with-single-hash.md000066400000000000000000000000211425763206400202350ustar00rootroot00000000000000### This is an H3omd-1.3.2/tests/cow/h4-with-single-hash.html000066400000000000000000000000471425763206400206120ustar00rootroot00000000000000

This is an H4

omd-1.3.2/tests/cow/h4-with-single-hash.md000066400000000000000000000000221425763206400202370ustar00rootroot00000000000000#### This is an H4omd-1.3.2/tests/cow/h5-with-single-hash.html000066400000000000000000000000471425763206400206130ustar00rootroot00000000000000
This is an H5
omd-1.3.2/tests/cow/h5-with-single-hash.md000066400000000000000000000000231425763206400202410ustar00rootroot00000000000000##### This is an H5omd-1.3.2/tests/cow/h6-with-single-hash.html000066400000000000000000000000471425763206400206140ustar00rootroot00000000000000
This is an H6
omd-1.3.2/tests/cow/h6-with-single-hash.md000066400000000000000000000000241425763206400202430ustar00rootroot00000000000000###### This is an H6omd-1.3.2/tests/cow/hashes-in-atx-titles.html000066400000000000000000000006001425763206400210740ustar00rootroot00000000000000

foo # bar

boo

x #y

a # F

b #

c #

d # #

e # #

foo ## bar

boo

x ##y

a ## F

b ##

c #

d ## #

e ## #

omd-1.3.2/tests/cow/hashes-in-atx-titles.md000066400000000000000000000002641425763206400205360ustar00rootroot00000000000000# foo # bar # boo # # x #y # a # F # # b \# # # c \# # d \# \# # e # \# ## foo ## bar ## boo ## ## x ##y ## a ## F ## ## b \## ## ## c \## ## d \## \## ## e ## \## omd-1.3.2/tests/cow/horizontal-rules.html000066400000000000000000000000451425763206400204450ustar00rootroot00000000000000






omd-1.3.2/tests/cow/horizontal-rules.md000066400000000000000000000001031425763206400200740ustar00rootroot00000000000000 * * * *** ***** - - - --------------------------------------- omd-1.3.2/tests/cow/html5-strutural-tags.html000066400000000000000000000004071425763206400211560ustar00rootroot00000000000000

These HTML5 tags should pass through just fine.

hello
head
footsies
read me
read me

the end

omd-1.3.2/tests/cow/html5-strutural-tags.md000066400000000000000000000004031425763206400206060ustar00rootroot00000000000000 These HTML5 tags should pass through just fine.
hello
head
footsies
read me
read me
the endomd-1.3.2/tests/cow/html_tags.html000066400000000000000000000002031425763206400171020ustar00rootroot00000000000000

<HelloWorld>

omd-1.3.2/tests/cow/html_tags.md000066400000000000000000000001321425763206400165370ustar00rootroot00000000000000<**Hello*****World***> omd-1.3.2/tests/cow/images.html000066400000000000000000000003231425763206400163700ustar00rootroot00000000000000

Alt text

Alt text

Alt text

omd-1.3.2/tests/cow/images.md000066400000000000000000000002211425763206400160210ustar00rootroot00000000000000 ![Alt text](/path/to/img.jpg) ![Alt text](/path/to/img.jpg "Optional title") ![Alt text][id] [id]: url/to/image "Optional title attribute"omd-1.3.2/tests/cow/implicit-anchors.html000066400000000000000000000001761425763206400203760ustar00rootroot00000000000000

Search the web at Google or Daring Fireball.

omd-1.3.2/tests/cow/implicit-anchors.md000066400000000000000000000002051425763206400200230ustar00rootroot00000000000000 Search the web at [Google][] or [Daring Fireball][]. [Google]: http://google.com/ [Daring Fireball]: http://daringfireball.net/omd-1.3.2/tests/cow/inline-anchors.html000066400000000000000000000002411425763206400200330ustar00rootroot00000000000000

This is an example inline link.

This link has no title attribute.

omd-1.3.2/tests/cow/inline-anchors.md000066400000000000000000000001711425763206400174710ustar00rootroot00000000000000 This is [an example](http://example.com/ "Title") inline link. [This link](http://example.net/) has no title attribute.omd-1.3.2/tests/cow/inline-code.html000066400000000000000000000006311425763206400173130ustar00rootroot00000000000000

Create a new function.

Use the backtick in MySQL syntax SELECT `column` FROM whatever.

A single backtick in a code span: `

A backtick-delimited string in a code span: `foo`

Please don't use any <blink> tags.

&#8212; is the decimal-encoded equivalent of &mdash;.

omd-1.3.2/tests/cow/inline-code.md000066400000000000000000000004431425763206400167500ustar00rootroot00000000000000 Create a new `function`. Use the backtick in MySQL syntax ``SELECT `column` FROM whatever``. A single backtick in a code span: `` ` `` A backtick-delimited string in a code span: `` `foo` `` Please don't use any `` tags. `—` is the decimal-encoded equivalent of `—`.omd-1.3.2/tests/cow/inline-escaped-chars.html000066400000000000000000000000701425763206400211000ustar00rootroot00000000000000

Hello.this_is_a_variable and.this.is.another_one

omd-1.3.2/tests/cow/inline-escaped-chars.md000066400000000000000000000000641425763206400205370ustar00rootroot00000000000000 Hello.this\_is\_a\_variable and.this.is.another_oneomd-1.3.2/tests/cow/inline-style-tag.html000066400000000000000000000001121425763206400203040ustar00rootroot00000000000000

An exciting sentence.

omd-1.3.2/tests/cow/inline-style-tag.md000066400000000000000000000001051425763206400177420ustar00rootroot00000000000000 An exciting sentence.omd-1.3.2/tests/cow/lazy-blockquote.html000066400000000000000000000001461425763206400202530ustar00rootroot00000000000000

This is a multi line blockquote test

With more than one line.

omd-1.3.2/tests/cow/lazy-blockquote.md000066400000000000000000000001071425763206400177040ustar00rootroot00000000000000 > This is a multi line blockquote test > With more than one line.omd-1.3.2/tests/cow/link-with-code.html000066400000000000000000000000701425763206400177400ustar00rootroot00000000000000

fo]```o

omd-1.3.2/tests/cow/link-with-code.md000066400000000000000000000000351425763206400173750ustar00rootroot00000000000000[``fo]```o``](http://plopxx) omd-1.3.2/tests/cow/list-with-blockquote.html000066400000000000000000000002101425763206400212100ustar00rootroot00000000000000
  • A list item with a blockquote:

    This is a blockquote inside a list item.

omd-1.3.2/tests/cow/list-with-blockquote.md000066400000000000000000000001301425763206400206450ustar00rootroot00000000000000* A list item with a blockquote: > This is a blockquote > inside a list item.omd-1.3.2/tests/cow/list-with-code-header.html000066400000000000000000000001641425763206400212100ustar00rootroot00000000000000
  • A

  • B

    code

header

omd-1.3.2/tests/cow/list-with-code-header.md000066400000000000000000000000541425763206400206420ustar00rootroot00000000000000- A - B ``` code ``` # header omd-1.3.2/tests/cow/list-with-code.html000066400000000000000000000001541425763206400177610ustar00rootroot00000000000000
  • A list item with code:

    alert('Hello world!');
omd-1.3.2/tests/cow/list-with-code.md000066400000000000000000000000721425763206400174140ustar00rootroot00000000000000* A list item with code: alert('Hello world!');omd-1.3.2/tests/cow/lists-with-blank-lines.html000066400000000000000000000005271425763206400214350ustar00rootroot00000000000000
  • 1

  • 2

  • 3

4

  • 5

  • 6

  • 7

8

  • 9

  • 10

  • 11

12

  • 13

  • 14

  • 15

16

omd-1.3.2/tests/cow/lists-with-blank-lines.md000066400000000000000000000001671425763206400210710ustar00rootroot00000000000000* 1 * 2 * 3 4 * 5 * 6 * 7 8 * 9 * 10 * 11 12 * 13 * 14 * 15 16 omd-1.3.2/tests/cow/multi-paragraph-list.html000066400000000000000000000002131425763206400211670ustar00rootroot00000000000000
  1. This is a major bullet point.

    That contains multiple paragraphs.

  2. And another line

omd-1.3.2/tests/cow/multi-paragraph-list.md000066400000000000000000000001431425763206400206250ustar00rootroot00000000000000 1. This is a major bullet point. That contains multiple paragraphs. 2. And another lineomd-1.3.2/tests/cow/multiline-unordered-list.html000066400000000000000000000001541425763206400220650ustar00rootroot00000000000000
  • This line spans more than one line and is lazy
  • Similar to this line
omd-1.3.2/tests/cow/multiline-unordered-list.md000066400000000000000000000001131425763206400215140ustar00rootroot00000000000000 - This line spans more than one line and is lazy - Similar to this lineomd-1.3.2/tests/cow/nested-blockquote.html000066400000000000000000000002221425763206400205510ustar00rootroot00000000000000

This is a multi line blockquote test

And nesting!

With more than one line.

omd-1.3.2/tests/cow/nested-blockquote.md000066400000000000000000000001411425763206400202050ustar00rootroot00000000000000 > This is a multi line blockquote test > > > And nesting! > > With more than one line.omd-1.3.2/tests/cow/non-html-tags-a.html000066400000000000000000000000661425763206400200350ustar00rootroot00000000000000
  • 1
    • 2
omd-1.3.2/tests/cow/non-html-tags-a.md000066400000000000000000000000121425763206400174600ustar00rootroot00000000000000* 1 + 2 omd-1.3.2/tests/cow/non-html-tags-simple.html000066400000000000000000000001661425763206400211070ustar00rootroot00000000000000

<klm></klm>

<klm>

x

</klm>

</klm>

<klm>

omd-1.3.2/tests/cow/non-html-tags-simple.md000066400000000000000000000000541425763206400205370ustar00rootroot00000000000000 x omd-1.3.2/tests/cow/non-html-tags.html000066400000000000000000000056301425763206400176210ustar00rootroot00000000000000

<plop>in a paragraph</plop>

<plop> in a paragraph </plop>

<plop> in a paragraph</plop>

<plop>in a paragraph </plop>

  • <plop>in a list item</plop>
    • <plop>in a sublist item</plop>
      • <plop>in a subsublist item</plop>
  • <plop> in a list item</plop>
    • <plop> in a sublist item</plop>
      • <plop> in a subsublist item</plop>
  • <plop>in a list item </plop>
    • <plop>in a sublist item </plop>
      • <plop>in a subsublist item </plop>
  • <plop> in a list item </plop>
    • <plop> in a sublist item </plop>
      • <plop> in a subsublist item </plop>

<z>in a paragraph</z>

<z> in a paragraph </z>

<z> in a paragraph</z>

<z>in a paragraph </z>

  • <z>in a list item</z>
    • <z>in a sublist item</z>
      • <z>in a subsublist item</z>
  • <z> in a list item</z>
    • <z> in a sublist item</z>
      • <z> in a subsublist item</z>
  • <z>in a list item </z>
    • <z>in a sublist item </z>
      • <z>in a subsublist item </z>
  • <z> in a list item </z>
    • <z> in a sublist item </z>
      • <z> in a subsublist item </z>

<0>in a paragraph</0>

<0> in a paragraph </0>

<0> in a paragraph</0>

<0>in a paragraph </0>

  • <0>in a list item</0>
    • <0>in a sublist item</0>
      • <0>in a subsublist item</0>
  • <0> in a list item</0>
    • <0> in a sublist item</0>
      • <0> in a subsublist item</0>
  • <0>in a list item </0>
    • <0>in a sublist item </0>
      • <0>in a subsublist item </0>
  • <0> in a list item </0>
    • <0> in a sublist item </0>
      • <0> in a subsublist item </0>

The end.

omd-1.3.2/tests/cow/non-html-tags.md000066400000000000000000000027621425763206400172600ustar00rootroot00000000000000in a paragraph in a paragraph in a paragraph in a paragraph * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item in a paragraph in a paragraph in a paragraph in a paragraph * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item * in a list item + in a sublist item + in a subsublist item <0>in a paragraph <0> in a paragraph <0> in a paragraph <0>in a paragraph * <0>in a list item + <0>in a sublist item + <0>in a subsublist item * <0> in a list item + <0> in a sublist item + <0> in a subsublist item * <0>in a list item + <0>in a sublist item + <0>in a subsublist item * <0> in a list item + <0> in a sublist item + <0> in a subsublist item The end.omd-1.3.2/tests/cow/not-a-hr.html000066400000000000000000000002051425763206400165470ustar00rootroot00000000000000

-----

- - - - -

-- -- -

- ----

  • ----

  • - ---

-----

omd-1.3.2/tests/cow/not-a-hr.md000066400000000000000000000001011425763206400161760ustar00rootroot00000000000000\----- \- - - - - \-- -- - \- ---- - \---- - \- --- ----\- omd-1.3.2/tests/cow/ordered-list-same-number.html000066400000000000000000000001041425763206400217260ustar00rootroot00000000000000
  1. Red
  2. Green
  3. Blue
omd-1.3.2/tests/cow/ordered-list-same-number.md000066400000000000000000000000361425763206400213660ustar00rootroot00000000000000 1. Red 1. Green 1. Blueomd-1.3.2/tests/cow/ordered-list-wrong-numbers.html000066400000000000000000000001041425763206400223200ustar00rootroot00000000000000
  1. Red
  2. Green
  3. Blue
omd-1.3.2/tests/cow/ordered-list-wrong-numbers.md000066400000000000000000000000361425763206400217600ustar00rootroot00000000000000 8. Red 1. Green 3. Blueomd-1.3.2/tests/cow/ordered-list.html000066400000000000000000000001041425763206400175150ustar00rootroot00000000000000
  1. Red
  2. Green
  3. Blue
omd-1.3.2/tests/cow/ordered-list.md000066400000000000000000000000361425763206400171550ustar00rootroot00000000000000 1. Red 2. Green 3. Blueomd-1.3.2/tests/cow/references__nonregular.html000066400000000000000000000000331425763206400216350ustar00rootroot00000000000000

a

omd-1.3.2/tests/cow/references__nonregular.md000066400000000000000000000000171425763206400212730ustar00rootroot00000000000000[a] [a]: foo omd-1.3.2/tests/cow/relative-anchors.html000066400000000000000000000000741425763206400203740ustar00rootroot00000000000000

See my About page for details.

omd-1.3.2/tests/cow/relative-anchors.md000066400000000000000000000000521425763206400200240ustar00rootroot00000000000000 See my [About](/about/) page for details.omd-1.3.2/tests/cow/simple-paragraph.html000066400000000000000000000000251425763206400203560ustar00rootroot00000000000000

Hello, world!

omd-1.3.2/tests/cow/simple-paragraph.md000066400000000000000000000000161425763206400200120ustar00rootroot00000000000000 Hello, world!omd-1.3.2/tests/cow/strong.html000066400000000000000000000001601425763206400164360ustar00rootroot00000000000000

important

important

really freakingstrong

omd-1.3.2/tests/cow/strong.md000066400000000000000000000000701425763206400160720ustar00rootroot00000000000000 **important** __important__ really **freaking**strongomd-1.3.2/tests/cow/test.ml000066400000000000000000000043271425763206400155560ustar00rootroot00000000000000open Printf let dir = if Array.length Sys.argv > 1 then Sys.argv.(1) else Filename.concat (Sys.getcwd ()) "tests/cow" let slurp filename = let file = open_in filename in let size = in_channel_length file in let buf = Bytes.create size in really_input file buf 0 size; Bytes.unsafe_to_string buf let remove_blank s = let b = Buffer.create (String.length s) in for i = 0 to String.length s - 1 do match s.[i] with | ' ' | '\t' | '\n' -> () | c -> Buffer.add_char b c done; Buffer.contents b let process successes failures file = let html = (Filename.chop_extension file) ^ ".html" in if not (Sys.file_exists html) then ( Printf.eprintf "File %s does not exist.\n" html; exit 2; ); let expected = slurp html in let md, observed = try let md = Omd.of_string (slurp file) in md, Omd.to_html md with e -> [], Printexc.to_string e in (* Make sure a round trip produces identical results *) let round_trip = Omd.of_string(Omd.to_markdown md) in if expected <> observed && remove_blank expected <> remove_blank observed then ( eprintf "FAILURE: %s\n" file; eprintf " expected = %S\n" (expected); eprintf " observed = %S\n" (observed); incr failures ) else if Omd_representation.( loose_compare md round_trip <> 0 && loose_compare (normalise_md md) (normalise_md round_trip) <> 0 ) then ( eprintf "FAILURE: %s\n" file; eprintf " Omd.of_string(Omd.to_markdown md) <> md\n"; eprintf "Expected =%S\n Result =%S\n" (Omd_backend.sexpr_of_md (Omd_representation.normalise_md md)) (Omd_backend.sexpr_of_md (Omd_representation.normalise_md round_trip)); incr failures ) else ( eprintf "SUCCESS: %s\n" file; incr successes ) let () = prerr_endline ("Reading directory " ^ dir); let files = Array.to_list (Sys.readdir dir) in let files = List.map (fun f -> Filename.concat dir f) files in let md_files = List.filter (fun f -> Filename.check_suffix f ".md") files in let md_files = List.sort String.compare md_files in let successes = ref 0 and failures = ref 0 in List.iter (process successes failures) md_files; eprintf "%i test passed; %i tests failed.\n" !successes !failures omd-1.3.2/tests/cow/undef_ref.html000066400000000000000000000004311425763206400170600ustar00rootroot00000000000000

[12NZJKEHUI23][]

[12NZJKEHUI23][ezorjeiozjfioz]

!12NZJKEHUI23][]

![12NZJKEHUI23][ezorjeiozjfioz]

  • [12NZJKEHUI23][]
  • [12NZJKEHUI23][ezorjeiozjfioz]
  • !12NZJKEHUI23][]
  • ![12NZJKEHUI23][ezorjeiozjfioz]
omd-1.3.2/tests/cow/undef_ref.md000066400000000000000000000003231425763206400165140ustar00rootroot00000000000000[12NZJKEHUI23][] [12NZJKEHUI23][ezorjeiozjfioz] ![12NZJKEHUI23][] ![12NZJKEHUI23][ezorjeiozjfioz] * [12NZJKEHUI23][] 42. [12NZJKEHUI23][ezorjeiozjfioz] + ![12NZJKEHUI23][] - ![12NZJKEHUI23][ezorjeiozjfioz] omd-1.3.2/tests/cow/unordered-list-asterisk.html000066400000000000000000000001041425763206400217030ustar00rootroot00000000000000
  • Red
  • Green
  • Blue
omd-1.3.2/tests/cow/unordered-list-asterisk.md000066400000000000000000000000301425763206400213350ustar00rootroot00000000000000 * Red * Green * Blueomd-1.3.2/tests/cow/unordered-list-minus.html000066400000000000000000000001041425763206400212110ustar00rootroot00000000000000
  • Red
  • Green
  • Blue
omd-1.3.2/tests/cow/unordered-list-minus.md000066400000000000000000000000301425763206400206430ustar00rootroot00000000000000 - Red - Green - Blueomd-1.3.2/tests/cow/unordered-list-plus.html000066400000000000000000000001041425763206400210410ustar00rootroot00000000000000
  • Red
  • Green
  • Blue
omd-1.3.2/tests/cow/unordered-list-plus.md000066400000000000000000000000301425763206400204730ustar00rootroot00000000000000 + Red + Green + Blueomd-1.3.2/tests/cow/url-with-parenthesis.html000066400000000000000000000002001425763206400212130ustar00rootroot00000000000000

There's an episode of Star Trek: The Next Generation

omd-1.3.2/tests/cow/url-with-parenthesis.md000066400000000000000000000001511425763206400206540ustar00rootroot00000000000000 There's an [episode](http://en.memory-alpha.org/wiki/Darmok_(episode)) of Star Trek: The Next Generationomd-1.3.2/tests/dune000066400000000000000000000000521425763206400143220ustar00rootroot00000000000000(test (name test_spec) (libraries omd)) omd-1.3.2/tests/newlines.md000066400000000000000000000000301425763206400156060ustar00rootroot00000000000000Bonjour Hello Foo Bar omd-1.3.2/tests/test_spec.ml000066400000000000000000000131171425763206400157750ustar00rootroot00000000000000(* This file tests the conformity of the generated AST with Markdown. *) open Printf let success = ref 0 let failures = ref 0 let () = let report () = if !failures = 0 then printf "Congratulation, all %d specification tests passed!\n" !success else printf "%d test%s passed, %d test%s failed.\n" !success (if !success > 1 then "s" else "") !failures (if !failures > 1 then "s" else "") in at_exit report let test name md_string desired_md = try let md = Omd.of_string md_string in if md = desired_md then ( incr success; (* printf "%s: SUCCESS\n" name *) ) else ( incr failures; printf "%s: FAILURE\n" name; printf " input = %S\nexpected = %S\n result = %S\n" md_string (Omd_backend.sexpr_of_md desired_md) (Omd_backend.sexpr_of_md md) ) with e -> incr failures; printf "%s: EXCEPTION\n %s\n" name (Printexc.to_string e) let () = let open Omd in (* Paragraphs and Line Breaks ***********************************************************************) (* "A paragraph is simply one or more consecutive lines of text, separated by one or more blank lines." Note that the final newlines are not considered to be part of the paragraphs, just a delimiter. *) test "Paragraph, simple" "Paragraph1\nline2\n\nP2\n\n\nP3" [Paragraph [Text "Paragraph1"; NL; Text "line2"]; Paragraph [Text "P2"]; Paragraph [Text "P3"]]; (* A blank line is any line that looks like a blank line — a line containing nothing but spaces or tabs is considered blank. *) test "Paragraph, blank line" "P1\n \nP2\n\t\nP3\n" [Omd.Paragraph [Omd.Text "P1"]; Omd.Paragraph [Omd.Text "P2"]; Omd.Paragraph [Omd.Text "P3"]]; (* "When you do want to insert a
, you end a line with two or more spaces." *) test "Paragraph,
" "Paragraph1 \nline2\n\nParagraph2" [Paragraph [Text "Paragraph1"; Br; Text "line2"]; Paragraph [Text "Paragraph2"]]; (* Normal paragraphs should not be indented with spaces or tabs. *) (* Headers ***********************************************************************) test "header, ===" "Title\n==" [Omd.H1 [Omd.Text "Title"]]; test "header, ---" "Title\n---" [Omd.H2 [Omd.Text "Title"]]; test "header, #" "# Title" [Omd.H1 [Omd.Text "Title"]]; test "header, ##" "## Title" [Omd.H2 [Omd.Text "Title"]]; test "header, ###" "### Title" [Omd.H3 [Omd.Text "Title"]]; test "header, ####" "#### Title" [Omd.H4 [Omd.Text "Title"]]; test "header, #####" "##### Title" [Omd.H5 [Omd.Text "Title"]]; test "header, ######" "###### Title" [Omd.H6 [Omd.Text "Title"]]; test "header, too deep" "######## Title\n" [Omd.Paragraph[Omd.Text "######## Title"]]; test "header, # + space" "# Title " [Omd.H1 [Omd.Text "Title"]]; test "header, # #" "# Title ###" [Omd.H1 [Omd.Text "Title"]]; test "header, # #" "# Title # " [Omd.H1 [Omd.Text "Title"]]; test "header, ## + space" "## Title # " [Omd.H2 [Omd.Text "Title"]]; test "header, # + \\n" "# Title\n" [Omd.H1 [Omd.Text "Title"]]; test "header, # + space + \\n" "# Title \n" [Omd.H1 [Omd.Text "Title"]]; test "header, # + # + \\n" "# Title # \n" [Omd.H1 [Omd.Text "Title"]]; (* Blockquotes ***********************************************************************) test "blockquote, simple" "> quoted" [Blockquote [Paragraph [Text "quoted"]]]; test "blockquote, simple 2" "> quoted\n" [Blockquote [Paragraph [Text "quoted"]]]; test "blockquote, 2 pars" "> quoted\n>\n> quoted2" [Blockquote [Paragraph [Text "quoted"]; Paragraph [Text "quoted2"]]]; test "blockquote, 2 pars (blank line)" "> quoted\n\n> quoted2" [Blockquote [Paragraph [Text "quoted"]; Paragraph [Text "quoted2"]]]; test "blockquote + header" "> ## header\n" [Blockquote [H2 [Text "header"]]]; test "blockquote + header + par" "> ## header\nHello" [Blockquote [H2 [Text "header"]; Paragraph [Text "Hello"]]]; test "blockquote + header + par" "> ## header\n> Hello" [Blockquote [H2 [Text "header"]; Paragraph [Text "Hello"]]]; test "blockquote + list" "> 1. item1\n> 2. item2\n" [Blockquote [Ol [[Text "item1"]; [Text "item2"]]]]; test "blockquote + code (4 spaces)" "> code" [Blockquote [Code_block ("", "code")]]; test "blockquote + code (tab)" "> \tcode" [Blockquote [Code_block ("", "code")]]; test "blockquote + code ```" "> ```\n> code\n> ```" [Blockquote [Code_block ("", "code")]]; (* Lists ***********************************************************************) test "list, simple" "8. Red\n1. Green\n3. Blue" [Ol [[Text "Red"]; [Text "Green"]; [Text "Blue"]]]; test "list, simple2" "\n8. Red\n1. Green\n3. Blue" [Ol [[Text "Red"]; [Text "Green"]; [Text "Blue"]]]; test "list, par" "8. Red\n\n1. Green\n\n3. Blue" [Olp [[Paragraph[Text "Red"]]; [Paragraph[Text "Green"]]; [Paragraph[Text "Blue"]]]]; test "list, *" "* AA\n* VV" [Ul [[Text "AA"]; [Text "VV"]]]; test "list, 2 levels" "* AA\n\n* VV" [Ulp [[Paragraph [Text "AA"]]; [Paragraph [Text "VV"]]]]; test "list + code + space + header" "- A - B ``` code ``` # header" [Ulp [[Paragraph [Omd.Text "A"]]; [Paragraph [Omd.Text "B"]; Code_block ("", "code")]]; NL; NL; H1 [Text "header"]]; (* Code ***********************************************************************) test "code dashes" "```\n--\n--\n--\n```" [Omd.Code_block ("", "--\n--\n--")]