# Instaparse Change Log

## 1.4.7

### Enhancements

* `visualize` now supports `:output-file :buffered-image`, which returns a java.awt.image.BufferedImage object.

### Bugfixes

* Fixed problem where `visualize` with `:output-file` didn't work on rootless trees.

## 1.4.6

### Performance improvements

* Better performance for ABNF grammars in Clojurescript.

## 1.4.5

### Bugfixes

* Fixed regression in 1.4.4 involving parsers based off of URIs.
* defparser now supports the full range of relevant parser options.

## 1.4.4

### Enhancements

* Instaparse is now cross-platform compatible between Clojure and Clojurescript.

### Features

* defparser - builds parser at compile time.

## 1.4.3

### Bugfixes

* Fixed bug with insta/transform on a tree with a hidden root tag and strings at the top level of the tree.

## 1.4.2

### Bugfixes

* Fixed problem with counted repetitions in ABNF.

## 1.4.1

### Features

* New function `add-line-and-column-info-to-metadata` in the instaparse.core namespace.

### Enhancements

* Added new combinators for unicode character ranges, for better portability to Clojurescript.

### Bugfixes

* Improved compatibility with boot, which allows having multiple versions of Clojure on the classpath, by making a change to string-reader, which needs to be aware of what version of Clojure it is running due to a breaking change in Clojure 1.7.
* Fixed bug with the way failure messages were printed in certain cases.

## 1.4.0

### Bugfixes

* In 1.3.6, parsing of any CharSequence was introduced; however, the error messages for failed parses weren't printing properly. This has been fixed.
* 1.4.0 uses a more robust algorithm for handling nested negative lookaheads, in response to a bug report where the existing mechanism produced incorrect parses (in addition to the correct parse) for a very unusual case.

### Enhancements

* New support for tracing the steps the parser goes through. Call your parser with the optional flag `:trace true`. The first time you use this flag, it triggers a recompilation of the code with additional tracing and profiling steps. To restore the code to its non-instrumented form, call `(insta/disable-tracing!)`.

## 1.3.6

### Enhancements

* Modified for compatibility with Clojure 1.7.0-alpha6.
* Instaparse can now parse anything supporting the CharSequence interface, not just strings. Specifically, this allows instaparse to operate on StringBuilder objects.

## 1.3.5

### Bugfixes

* Fixed bug with `transform` on hiccup data structures with numbers or other atomic data as leaves.
* Fixed bug with character concatenation support in ABNF grammar.

### Enhancements

* Added support for Unicode characters to ABNF.
## 1.3.4

### Enhancements

* Modified for compatibility with Clojure 1.7.0-alpha2.

## 1.3.3

### Enhancements

Made two changes to make it possible to use instaparse on Google App Engine.

* Removed dependency on the javax.swing.text.Segment class.
* Added `:no-slurp true` keyword option to `insta/parser` to disable URI slurping behavior, since GAE does not support slurp.

## 1.3.2

### Bugfixes

* Regular expressions on empty strings weren't properly returning a failure.

## 1.3.1

### Enhancements

* Updated tests to use Clojure 1.6.0's final release.
* Added `:string-ci true` flag to `insta/parser`.

## 1.3.0

### Compatibility with Clojure 1.6

## 1.2.16

### Bugfixes

* Calling `empty` on a FlattenOnDemandVector now returns [].

## 1.2.15

### Enhancements

* :auto-whitespace can now take the keyword :standard or :comma to access one of the predefined whitespace parsers.

### Bugfixes

* Fixed newline problem visualizing parse trees on Linux.
* Fixed problem with visualizing rootless trees.

## 1.2.11

### Minor enhancements

* Further refinements to the way ordered choice interacts with epsilon parsers.

## 1.2.10

### Bugfixes

* Fixed bug introduced by 1.2.9 affecting ordered choice.

## 1.2.9

### Bugfixes

* Fixed bug where ordered choice was ignoring the epsilon parser.

## 1.2.8

### Bugfixes

* Fixed bug introduced by 1.2.7, affecting printing of grammars with regexes.

### Enhancements

* Parser printing format now includes <> hidden information and tags.

## 1.2.7

### Bugfixes

* Fixed bug when a regular expression contains the | character.

## 1.2.6

### Bugfixes

* Changed pre-condition assertion for the auto-whitespace option which was causing a problem with "lein jar".

## 1.2.5

### Bugfixes

* Improved handling of unusual characters in ABNF grammars.

## 1.2.4

### Bugfixes

* When parsing in :total mode with :enlive as the output format, changed the content of the failure node from a vector to a list to match the rest of the enlive output.

## 1.2.3

### Bugfixes

* Fixed problem when epsilon was the only thing in a nonterminal, e.g., "S = epsilon"

### Features

* Added experimental `:auto-whitespace` feature. See the [Experimental Features Document](docs/ExperimentalFeatures.md) for more details.

## 1.2.2

### Bugfixes

* Fixed reflection warning.

## 1.2.1

### Bugfixes

* I had accidentally left a dependency on tools.trace in the repeat.clj file, used while I was debugging that namespace. Removed it.

## 1.2.0

### New Features

* `span` function returns substring indexes into the parsed text for a portion of the parse tree.
* `visualize` function draws the parse tree, using rhizome and graphviz if installed.
* `:optimize :memory` flag that, for suitable parsers, will perform the parsing in discrete chunks, using less memory.
* New parsing flag to undo the effect of the <> hide notation.
    + `(my-parser text :unhide :tags)` - reveals tags, i.e., `<>` applied on the left-hand sides of rules.
    + `(my-parser text :unhide :content)` - reveals content hidden on the right-hand side of rules with `<>`
    + `(my-parser text :unhide :all)` - reveals both tags and content.

### Notable Performance Improvements

* Dramatic performance improvement (quadratic time reduced to linear) when repetition parsers (+ or *) operate on text whose parse tree contains a large number of repetitions.
* Performance improvement for regular expressions.

### Minor Enhancements

* Added more support to IncrementalVector for a wider variety of vector operations, including subvec, nth, and vec.
## 1.1.0

### Breaking Changes

* When you run a parser in "total" mode, the failure node is no longer tagged with `:failure`, but instead is tagged with `:instaparse/failure`.

### New Features

* Comments now supported in CFGs. Use (* and *) notation.
* Added `ebnf` combinator to the `instaparse/combinators` namespace. This new combinator converts string specifications to the combinator-built equivalent. See the combinator section of the updated tutorial for details.
* ABNF: can now create a parser from a specification using `:input-format :abnf` for ABNF parser syntax.
* New combinators related to ABNF:
    1. `abnf` -- converts ABNF string fragments to combinators.
    2. `string-ci` -- case-insensitive strings.
    3. `rep` -- between m and n repetitions.
* New core function related to ABNF: `set-default-input-format!` -- initially defaults to :ebnf

### Minor Enhancements

* Added comments to regexes used by the parser that processes the context-free grammar syntax, improving the readability of error messages if you have a faulty grammar specification.

### Bug Fixes

* Backslashes in front of a quotation mark were escaping the quotation mark, even if the backslash itself was escaped.
* Unescaped double-quote marks weren't properly handled, e.g., (parser "A = '\"'").
* Nullable Plus: ((parser "S = ('a'?)+") "") previously returned a failure, now returns [:S]
* Fixed problem with failure reporting that would occur if a parse failed on an input that ended with a newline character.

# Instaparse 1.4.7

*What if context-free grammars were as easy to use as regular expressions?*

## Features

Instaparse aims to be the simplest way to build parsers in Clojure.

+ Turns *standard EBNF or ABNF notation* for context-free grammars into an executable parser that takes a string as an input and produces a parse tree for that string.
+ *No Grammar Left Behind*: Works for *any* context-free grammar, including *left-recursive*, *right-recursive*, and *ambiguous* grammars.
+ Extends the power of context-free grammars with PEG-like syntax for lookahead and negative lookahead.
+ Supports both of Clojure's most popular tree formats (hiccup and enlive) as output targets.
+ Detailed reporting of parse errors.
+ Optionally produces a lazy sequence of all parses (especially useful for diagnosing and debugging ambiguous grammars).
+ "Total parsing" mode where leftover string is embedded in the parse tree.
+ Optional combinator library for building grammars programmatically.
+ Performant.

## Quickstart

Instaparse requires Clojure v1.5.1 or later, or ClojureScript v1.7.28 or later.

Add the following line to your leiningen dependencies:

    [instaparse "1.4.7"]

Require instaparse in your namespace header:

    (ns example.core
      (:require [instaparse.core :as insta]))

### Creating your first parser

Here's a typical example of a context-free grammar one might see in a textbook on automata and/or parsing. It is a common convention in many textbooks to use the capital letter `S` to indicate the starting rule, so for this example, we'll follow that convention:

    S = AB*
    AB = A B
    A = 'a'+
    B = 'b'+

This looks for alternating runs of 'a' followed by runs of 'b'. So for example "aaaaabbaaabbb" satisfies this grammar. On the other hand, "aaabbbbaa" does not (because the grammar specifies that each run of 'a' must be followed by a run of 'b').
With instaparse, turning this grammar into an executable parser is as simple as typing the grammar in:

    (def as-and-bs
      (insta/parser
        "S = AB*
         AB = A B
         A = 'a'+
         B = 'b'+"))

    => (as-and-bs "aaaaabbbaaaabb")
    [:S
     [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
     [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]

At this point, if you know EBNF notation for context-free grammars, you probably know enough to dive in and start playing around. However, instaparse is rich with features, so if you want to know the full scope of what it can do, read on...

## Tutorial

### Notation

Instaparse supports most of the common notations for context-free grammars. For example, a popular alternative to `*` is to surround the term with curly braces `{}`, and a popular alternative to `?` is to surround the term with square brackets `[]`. Rules can be specified with `=`, `:`, `:=`, or `::=`. Rules can optionally end with `;`. Instaparse is very flexible in terms of how you use whitespace (as in Clojure, `,` is treated as whitespace) and you can liberally use parentheses for grouping. Terminal strings can be enclosed in either single quotes or double quotes (however, since you are writing the grammar specification inside of a Clojure double-quoted string, any use of double-quotes would have to be escaped, therefore single-quotes are easier to read). Newlines are optional; you can put the entire grammar on one line if you desire. In fact, all these notations can be mixed up in the same specification if you want.

So here is an equally valid (but messier) way to write out the exact same grammar, just to illustrate the flexibility that you have:

    (def as-and-bs-alternative
      (insta/parser
        "S:={AB}  ;
         AB ::= (A, B)
         A : \"a\" + ;
         B ='b' + ;"))

Note that regardless of the notation you use in your specification, when you evaluate the parser at the REPL, the rules will be pretty-printed:

    => as-and-bs-alternative
    S = AB*
    AB = A B
    A = "a"+
    B = "b"+

Here's a quick guide to the syntax for defining context-free grammars:
| Category | Notations | Example |
|----------|-----------|---------|
| Rule | `:` `:=` `::=` `=` | `S = A` |
| End of rule | `;` `.` (optional) | `S = A;` |
| Alternation | `\|` | `A \| B` |
| Concatenation | whitespace or `,` | `A B` |
| Grouping | `()` | `(A \| B) C` |
| Optional | `?` `[]` | `A?` `[A]` |
| One or more | `+` | `A+` |
| Zero or more | `*` `{}` | `A*` `{A}` |
| String terminal | `""` `''` | `'a'` `"a"` |
| Regex terminal | `#""` `#''` | `#'a'` `#"a"` |
| Epsilon | `Epsilon` `epsilon` `EPSILON` `eps` `ε` `""` `''` | `S = 'a' S \| Epsilon` |
| Comment | `(* *)` | `(* This is a comment *)` |
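For instance, here's a small illustrative grammar (a sketch I've added, not from the original tutorial) that mixes the curly-brace, square-bracket, comma, and comment notations from the table above:

    (def notation-example
      (insta/parser
        "S = {AB} [';']   (* zero or more ABs, then an optional semicolon *)
         AB = 'a', 'b'"))

    => (notation-example "ababab;")
    [:S [:AB "a" "b"] [:AB "a" "b"] [:AB "a" "b"] ";"]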
As is the norm in EBNF notation, concatenation has a higher precedence than alternation, so in the absence of parentheses, something like `A B | C D` means `(A B) | (C D)`.

### Input from resource file

Parsers can also be built from a specification contained in a file, either locally or on the web. For example, I stored on github a file with a simple grammar to parse text containing a single 'a' surrounded optionally by whitespace. The specification in the file looks like this:

    S = #"\s*" "a" #"\s*"

Building the parser from the URI is easy:

    (insta/parser "https://gist.github.com/Engelberg/5283346/raw/77e0b1d0cd7388a7ddf43e307804861f49082eb6/SingleA")

This provides a convenient way to share parser specifications over the Internet.

You can also use a specification contained in a local resource in your classpath:

    (insta/parser (clojure.java.io/resource "myparser.bnf"))

### `defparser`

On ClojureScript, the `(def my-parser (insta/parser "..."))` use case has the following disadvantages:

- ClojureScript does not support `slurp`, so `parser` cannot automatically read from file paths / URLs.
- Having to parse a grammar string at runtime can impact the startup performance of an application or webpage.

To solve those problems, a macro `instaparse.core/defparser` is provided that, if given a string for a grammar specification, will parse that as a grammar up front and emit more performant code.

```clojure
;; Clojure
(:require [instaparse.core :as insta :refer [defparser]])
;; ClojureScript
(:require [instaparse.core :as insta :refer-macros [defparser]])

=> (time (def p (insta/parser "S = A B; A = 'a'+; B = 'b'+")))
"Elapsed time: 4.368179 msecs"
#'user/p
=> (time (defparser p "S = A B; A = 'a'+; B = 'b'+")) ; the meat of the work happens at macro-time
"Elapsed time: 0.091689 msecs"
#'user/p
=> (defparser p "https://gist.github.com/Engelberg/5283346/raw/77e0b1d0cd7388a7ddf43e307804861f49082eb6/SingleA") ; works even in cljs!
#'user/p
=> (defparser p [:S (c/plus (c/string "a"))]) ; still works, but won't do any extra magic behind the scenes
#'user/p
=> (defparser p "S = 1*'a'" :input-format :abnf :output-format :enlive) ; takes additional keyword arguments
#'user/p
```

`defparser` is primarily useful in Clojurescript, but works in both Clojure and Clojurescript for cross-platform compatibility.

### Escape characters

Putting your grammar in a separate resource file has an additional advantage -- it provides a very straightforward "what you see is what you get" view of the grammar. The only escape characters needed are the ordinary escape characters for strings and regular expressions (additionally, instaparse also supports `\'` inside single-quoted strings).

When you specify a grammar directly in your Clojure code as a double-quoted string, extra escape characters may be needed in the strings and regexes of your grammar:

1. All `"` string and regex delimiters must be turned into `\"` or replaced with a single-quote `'`.
2. All backslash characters in your strings and regexes `\` should be escaped and turned into `\\`. (In some cases you can get away with not escaping the backslash, but it is best practice to be consistent and always do it.)

For example, the above grammar could be written in Clojure as:

    (insta/parser "S = #'\\s*' 'a' #'\\s*'")

It is unfortunate that this extra level of escaping is necessary. Many programming languages provide some sort of facility for creating "raw strings" which are taken verbatim (e.g., Python's triple-quoted strings).
I don't understand why Clojure does not support raw strings, but it doesn't. Fortunately, for many grammars this is a non-issue, and if the escaping does get bad enough to affect readability, there is always the option of storing the grammar in a separate file.

### Output format

When building parsers, you can specify an output format of either :hiccup or :enlive. :hiccup is the default, but here is an example of the above parser with :enlive set as the output format:

    (def as-and-bs-enlive
      (insta/parser
        "S = AB*
         AB = A B
         A = 'a'+
         B = 'b'+"
        :output-format :enlive))

    => (as-and-bs-enlive "aaaaabbbaaaabb")
    {:tag :S,
     :content
     ({:tag :AB,
       :content
       ({:tag :A, :content ("a" "a" "a" "a" "a")}
        {:tag :B, :content ("b" "b" "b")})}
      {:tag :AB,
       :content
       ({:tag :A, :content ("a" "a" "a" "a")}
        {:tag :B, :content ("b" "b")})})}

I find the hiccup format to be pleasant and compact, especially when working with the parsed output in the REPL. The main advantage of the enlive format is that it allows you to use the very powerful enlive library to select and transform nodes in your tree.

If you want to alter instaparse's default output format:

    (insta/set-default-output-format! :enlive)
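Even without pulling in the enlive library itself, the uniform `{:tag ... :content ...}` shape is easy to work with using ordinary Clojure functions. Here's a small sketch (the `find-tags` helper is mine, not part of instaparse's API) that collects the content of every `:A` node:

    (defn find-tags [tree tag]
      (->> (tree-seq :content :content tree)   ; nodes with :content are branches
           (filter #(= tag (:tag %)))          ; keep nodes carrying the requested tag
           (map :content)))

    => (find-tags (as-and-bs-enlive "aaaaabbbaaaabb") :A)
    (("a" "a" "a" "a" "a") ("a" "a" "a" "a"))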
### Controlling the tree structure

The principles of instaparse's output trees:

- Every rule equals one level of nesting in the tree.
- Each level is automatically tagged with the name of the rule.

To better understand this, take a look at these two variations of the same parser we've been discussing:

    (def as-and-bs-variation1
      (insta/parser
        "S = AB*
         AB = 'a'+ 'b'+"))

    => (as-and-bs-variation1 "aaaaabbbaaaabb")
    [:S
     [:AB "a" "a" "a" "a" "a" "b" "b" "b"]
     [:AB "a" "a" "a" "a" "b" "b"]]

    (def as-and-bs-variation2
      (insta/parser
        "S = ('a'+ 'b'+)*"))

    => (as-and-bs-variation2 "aaaaabbbaaaabb")
    [:S "a" "a" "a" "a" "a" "b" "b" "b" "a" "a" "a" "a" "b" "b"]

#### Hiding content

For this next example, let's consider a parser that looks for a sequence of a's or b's surrounded by parens.

    (def paren-ab
      (insta/parser
        "paren-wrapped = '(' seq-of-A-or-B ')'
         seq-of-A-or-B = ('a' | 'b')*"))

    => (paren-ab "(aba)")
    [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a"] ")"]

It's very common in parsers to have elements that need to be present in the input and parsed, but we'd rather not have them appear in the output. In the above example, the parens are essential to the grammar yet the tree would be much easier to read and manipulate if we could hide those parens; once the string has been parsed, the parens themselves carry no additional semantic value.

In instaparse, you can use angle brackets `<>` to hide parsed elements, suppressing them from the tree output.

    (def paren-ab-hide-parens
      (insta/parser
        "paren-wrapped = <'('> seq-of-A-or-B <')'>
         seq-of-A-or-B = ('a' | 'b')*"))

    => (paren-ab-hide-parens "(aba)")
    [:paren-wrapped [:seq-of-A-or-B "a" "b" "a"]]

Voila! The parens "(" and ")" tokens have been hidden. Angle brackets are a powerful tool for hiding whitespace and other delimiters from the output.

#### Hiding tags

Continuing with the same example parser, let's say we decide that the :seq-of-A-or-B tag is also superfluous -- we'd rather not have that extra nesting level appear in the output tree.

We've already seen that one option is to simply lift the right-hand side of the seq-of-A-or-B rule into the paren-wrapped rule, as follows:

    (def paren-ab-manually-flattened
      (insta/parser
        "paren-wrapped = <'('> ('a'|'b')* <')'>"))

    => (paren-ab-manually-flattened "(aba)")
    [:paren-wrapped "a" "b" "a"]

But sometimes, it is ugly or impractical to do this. It would be nice to have a way to express the concept of "repeated sequence of a's and b's" as a separate rule, without necessarily introducing an additional level of nesting.

Again, the angle brackets come to the rescue. We simply use the angle brackets to hide the *name* of the rule. Since each name corresponds to a level of nesting, hiding the name means the parsed contents of that rule will appear in the output tree without the tag and its associated new level of nesting.

    (def paren-ab-hide-tag
      (insta/parser
        "paren-wrapped = <'('> seq-of-A-or-B <')'>
         <seq-of-A-or-B> = ('a' | 'b')*"))

    => (paren-ab-hide-tag "(aba)")
    [:paren-wrapped "a" "b" "a"]

You might wonder what would happen if we hid the root tag as well. Let's take a look:

    (def paren-ab-hide-both-tags
      (insta/parser
        "<paren-wrapped> = <'('> seq-of-A-or-B <')'>
         <seq-of-A-or-B> = ('a' | 'b')*"))

    => (paren-ab-hide-both-tags "(aba)")
    ("a" "b" "a")

With no root tag, the parser just returns a sequence of children. So in the above example where *all* the tags are hidden, you just get a sequence of parsed elements. Sometimes that's what you want, but in general, I recommend that you don't hide the root tag, ensuring the output is a well-formed tree.

#### Revealing hidden information

Sometimes, after setting up the parser to hide content and tags, you temporarily want to reveal the hidden information, perhaps for debugging purposes.

The optional keyword argument `:unhide :content` reveals the hidden content in the tree output.

    => (paren-ab-hide-both-tags "(aba)" :unhide :content)
    ("(" "a" "b" "a" ")")

The optional keyword argument `:unhide :tags` reveals the hidden tags in the tree output.

    => (paren-ab-hide-both-tags "(aba)" :unhide :tags)
    [:paren-wrapped [:seq-of-A-or-B "a" "b" "a"]]

The optional keyword argument `:unhide :all` reveals all hidden information.

    => (paren-ab-hide-both-tags "(aba)" :unhide :all)
    [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a"] ")"]

### No Grammar Left Behind

One of the things that really sets instaparse apart from other Clojure parser generators is that it can handle any context-free grammar. For example, some parsers only accept LL(1) grammars, others accept LALR grammars. Many of the libraries use a recursive-descent strategy that fails for left-recursive grammars. If you are willing to learn the esoteric restrictions posed by the library, it is usually possible to rework your grammar to fit that mold. But instaparse lets you write your grammar in whatever way is most natural.

#### Right recursion

No problem:

    => ((insta/parser "S = 'a' S | Epsilon") "aaaa")
    [:S "a" [:S "a" [:S "a" [:S "a" [:S]]]]]

Note the use of Epsilon, a common name for the "empty" parser that always succeeds without consuming any characters. You can also just use an empty string if you prefer.

#### Left recursion

No problem:

    => ((insta/parser "S = S 'a' | Epsilon") "aaaa")
    [:S [:S [:S [:S [:S] "a"] "a"] "a"] "a"]

As you can see, either of these recursive parsers will generate a parse tree that is deeply nested. Unfortunately, Clojure does not handle deeply-nested data structures very well. If you were to run the above parser on, say, a string of 20,000 a's, instaparse will happily try to generate the corresponding parse tree but then Clojure will stack overflow when it tries to hash the tree.

So, as is often advisable in Clojure, use recursion judiciously in a way that will keep your trees a manageable depth.
For the above parser, it is almost certainly better to just do:

    => ((insta/parser "S = 'a'*") "aaaa")
    [:S "a" "a" "a" "a"]

#### Infinite loops

If you specify an unterminated recursive grammar, instaparse will handle that gracefully as well and terminate with an error, rather than getting caught in an infinite loop:

    => ((insta/parser "S = S") "a")
    Parse error at line 1, column 1:
    a
    ^

### Ambiguous grammars

    (def ambiguous
      (insta/parser
        "S = A A
         A = 'a'*"))

This grammar is interesting because even though it specifies a repeated run of a's, there are many possible ways the grammar can chop it up. Our parser will faithfully return one of the possible parses:

    => (ambiguous "aaaaaa")
    [:S [:A "a"] [:A "a" "a" "a" "a" "a"]]

However, we can do better. First, I should point out that `(ambiguous "aaaaaa")` is really just shorthand for `(insta/parse ambiguous "aaaaaa")`. Parsers are not actually functions, but are records that implement the function interface as a shorthand for calling the insta/parse function.

`insta/parse` is the way you ask a parser to produce a single parse tree. But there is another library function `insta/parses` that asks the parser to produce a lazy sequence of all parse trees. Compare:

    => (insta/parse ambiguous "aaaaaa")
    [:S [:A "a"] [:A "a" "a" "a" "a" "a"]]

    => (insta/parses ambiguous "aaaaaa")
    ([:S [:A "a"] [:A "a" "a" "a" "a" "a"]]
     [:S [:A "a" "a" "a" "a" "a" "a"] [:A]]
     [:S [:A "a" "a"] [:A "a" "a" "a" "a"]]
     [:S [:A "a" "a" "a"] [:A "a" "a" "a"]]
     [:S [:A "a" "a" "a" "a"] [:A "a" "a"]]
     [:S [:A "a" "a" "a" "a" "a"] [:A "a"]]
     [:S [:A] [:A "a" "a" "a" "a" "a" "a"]])

You may wonder, why is this useful? Two reasons:

1. Sometimes it is difficult to remove ambiguity from a grammar, but the ambiguity doesn't really matter -- any parse tree will do. In these situations, instaparse's ability to work with ambiguous grammars can be quite handy.
2. Instaparse's ability to generate a sequence of all parses provides a powerful tool for debugging and thus *removing* ambiguity from an unintentionally ambiguous grammar.

It turns out that when designing a context-free grammar, it's all too easy to accidentally introduce some unintentional ambiguity. Other parser tools often report ambiguities as cryptic "shift-reduce" messages, if at all. It's rather empowering to see the precise parse that instaparse finds when multiple parses are possible.

I generally test my parsers using the `insta/parses` function so I can immediately spot any ambiguities I've inadvertently introduced. When I'm confident the parser is not ambiguous, I switch to `insta/parse` or, equivalently, just call the parser as if it were a function.

### Regular expressions: A word of warning

As you can see from the above example, instaparse flexibly interprets * and +, trying all possible numbers of repetitions in order to create a parse tree. It is easy to become spoiled by this, and then forget that regular expressions have different semantics. Instaparse's regular expressions are just Clojure/Java regular expressions, which behave in a greedy manner.

To better understand this point, contrast the above parser with this one:

    (def not-ambiguous
      (insta/parser
        "S = A A
         A = #'a*'"))

    => (insta/parses not-ambiguous "aaaaaa")
    ([:S [:A "aaaaaa"] [:A ""]])

In this parser, the * is *inside* the regular expression, which means that it follows greedy regular expression semantics. Therefore, the first A eats all the a's it can, leaving no a's for the second A.
For this reason, it is wise to use regular expressions judiciously, mainly to express the patterns of your tokens, and leave the overall task of parsing to instaparse. Regular expressions can often be tortured and abused into serving as a crude parser, but don't do it! There's no need; with instaparse, you now have an equally convenient but more expressive tool to bring to bear on parsing problems.

Here is an example that I think is a tasteful use of regular expressions to split a sentence on whitespace, categorizing the tokens as words or numbers:

    (def words-and-numbers
      (insta/parser
        "sentence = token (<whitespace> token)*
         <token> = word | number
         whitespace = #'\\s+'
         word = #'[a-zA-Z]+'
         number = #'[0-9]+'"))

    => (words-and-numbers "abc 123 def")
    [:sentence [:word "abc"] [:number "123"] [:word "def"]]

### Partial parses

By default, instaparse assumes you are looking for a parse tree that covers the entire input string. However, sometimes it may be useful to look at all the partial parses that satisfy the grammar while consuming some initial portion of the input string. For this purpose, both `insta/parse` and `insta/parses` take a keyword argument, `:partial`, that you simply set to true.

    (def repeated-a
      (insta/parser
        "S = 'a'+"))

    => (insta/parses repeated-a "aaaaaa")
    ([:S "a" "a" "a" "a" "a" "a"])
    => (insta/parses repeated-a "aaaaaa" :partial true)
    ([:S "a"] [:S "a" "a"] [:S "a" "a" "a"] [:S "a" "a" "a" "a"]
     [:S "a" "a" "a" "a" "a"] [:S "a" "a" "a" "a" "a" "a"])

Of course, using `:partial true` with `insta/parse` means that you'll only get the first parse result found.

    => (insta/parse repeated-a "aaaaaa" :partial true)
    [:S "a"]

### PEG extensions

PEGs are a popular alternative to context-free grammars. On the surface, PEGs look very similar to CFGs, but the various choice operators are meant to be interpreted in a strictly greedy, ordered way that removes any ambiguity from the grammar. Some view this lack of ambiguity as an advantage, but it does limit the expressiveness of PEGs relative to context-free grammars. Furthermore, PEGs are usually tightly coupled to a specific parsing strategy that forbids left-recursion, further limiting their utility.

To combat that lost expressiveness, PEGs adopted a few operators that actually allow PEGs to do some things that CFGs cannot express. Even though the underlying paradigm is different, I've swiped these juicy bits from PEGs and included them in instaparse, giving instaparse more expressive power than either traditional PEGs or traditional CFGs.

Here is a table of the PEG operators that have been adapted for use in instaparse; I'll explain them in more detail shortly.
| Category | Notations | Example |
|----------|-----------|---------|
| Lookahead | `&` | `&A` |
| Negative lookahead | `!` | `!A` |
| Ordered Choice | `/` | `A / B` |
#### Lookahead

The symbol for lookahead is `&`, and is generally used as part of a chain of concatenated parsers. Lookahead tests whether there are some number of characters that lie ahead in the text stream that satisfy the parser. It performs this test without actually "consuming" characters. Only if that lookahead test succeeds do the remaining parsers in the chain execute.

That's a mouthful, and hard to understand in the abstract, so let's look at a concrete example:

    (def lookahead-example
      (insta/parser
        "S = &'ab' ('a' | 'b')+"))

The `('a' | 'b')+` part should be familiar at this point, and you hopefully recognize this as a parser that ensures the text is a string entirely of a's and b's. The other part, `&'ab'`, is the lookahead. Notice how the `&` precedes the expression it is operating on. Before processing the `('a' | 'b')+`, it looks ahead to verify that the `'ab'` parser could hypothetically be satisfied by the upcoming characters. In other words, it will only accept strings that start off with the characters `ab`.

    => (lookahead-example "abaaaab")
    [:S "a" "b" "a" "a" "a" "a" "b"]
    => (lookahead-example "bbaaaab")
    Parse error at line 1, column 1:
    bbaaaab
    ^
    Expected: "ab"

If you write something like `&'a'+` with no parens, this will be interpreted as `&('a'+)`.

Here is my favorite example of lookahead, a parser that only succeeds on strings with a run of a's followed by a run of b's followed by a run of c's, where each of those runs must be the same length. If you've ever taken an automata course, you may remember that there is a very elegant proof that it is impossible to express this set of constraints with a pure context-free grammar. Well, with lookahead, it *is* possible:

    (def abc
      (insta/parser
        "S = &(A 'c') 'a'+ B
         A = 'a' A? 'b'
         <B> = 'b' B? 'c'"))

    => (abc "aaabbbccc")
    [:S "a" "a" "a" "b" "b" "b" "c" "c" "c"]

This example succeeds because there are three a's followed by three b's followed by three c's. Verifying that this parser fails for unequal runs and other mixes of letters is left as an exercise for the reader.

#### Negative lookahead

Negative lookahead uses the symbol `!`, and like `&`, it precedes the expression. It does exactly what you'd expect -- it performs a lookahead and confirms that the parser is *not* satisfied by the upcoming characters in the string.

    (def negative-lookahead-example
      (insta/parser
        "S = !'ab' ('a' | 'b')+"))

So this parser turns around the meaning of the previous example, accepting all strings of a's and b's that *don't* start off with `ab`.

    => (negative-lookahead-example "abaaaab")
    Parse error at line 1, column 1:
    abaaaab
    ^
    Expected: NOT "ab"
    => (negative-lookahead-example "bbaaaab")
    [:S "b" "b" "a" "a" "a" "a" "b"]

One issue with negative lookahead is that it introduces the possibility of paradoxes. Consider:

    S = !S 'a'

How should this parser behave on an input of "a"? If S succeeds, it should fail, and if it fails it should succeed.

PEGs simply don't allow this sort of grammar, but the whole spirit of instaparse is to flexibly allow recursive grammars, so I needed to find some way to handle it. Basically, I've taken steps to make sure that a paradoxical grammar won't cause instaparse to go into an infinite loop. It will terminate, but I make no promises about what the results will be. If you specify a paradoxical grammar, it's a garbage-in-garbage-out kind of situation (although to be clear, instaparse won't return complete garbage; it will make some sort of reasonable judgment about how to interpret it).
If you're curious about how instaparse behaves with the above paradoxical example, here it is:

    => ((insta/parser "S = !S 'a'") "a")
    [:S "a"]

Negative lookahead, when used properly, is an extremely powerful tool for removing ambiguity from your parser. To illustrate this, let's take a look at a very common parsing task, which involves tokenizing a string of characters into a combination of identifiers and reserved keywords. Our first attempt at this ends up ambiguous:

    (def ambiguous-tokenizer
      (insta/parser
        "sentence = token (<whitespace> token)*
         <token> = keyword | identifier
         whitespace = #'\\s+'
         identifier = #'[a-zA-Z]+'
         keyword = 'cond' | 'defn'"))

    => (insta/parses ambiguous-tokenizer "defn my cond")
    ([:sentence [:identifier "defn"] [:identifier "my"] [:identifier "cond"]]
     [:sentence [:keyword "defn"] [:identifier "my"] [:identifier "cond"]]
     [:sentence [:identifier "defn"] [:identifier "my"] [:keyword "cond"]]
     [:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]])

Each of our keywords not only fits the description of keyword, but also of identifier, so our parser doesn't know which way to parse those words. Instaparse makes no guarantee about what order it processes alternatives, and in this situation, we see that in fact, the combination we wanted was listed last among the possible parses. Negative lookahead provides an easy way to remove this ambiguity:

    (def unambiguous-tokenizer
      (insta/parser
        "sentence = token (<whitespace> token)*
         <token> = keyword | !keyword identifier
         whitespace = #'\\s+'
         identifier = #'[a-zA-Z]+'
         keyword = 'cond' | 'defn'"))

    => (insta/parses unambiguous-tokenizer "defn my cond")
    ([:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]])

#### Ordered choice

As I mentioned earlier, a PEG's interpretation of `+`, `*`, and `|` is subtly different from the way those symbols are interpreted in CFGs. `+` and `*` are interpreted greedily, just as they are in regular expressions. `|` proceeds in a rather strict order, trying the first alternative first, and only proceeding if that one fails. To remind users that these multiple choices are strictly ordered, PEGs commonly use the forward slash `/` rather than `|`.

Although the PEG paradigm of forced order is antithetical to instaparse's flexible parsing strategy, I decided to co-opt the `/` notation to express a preference of one alternative over another.

With that in mind, let's look back at the `ambiguous-tokenizer` example from the previous section. In that example, we found that our desired parse, in which the keywords were classified, ended up at the bottom of the heap:

    => (insta/parses ambiguous-tokenizer "defn my cond")
    ([:sentence [:identifier "defn"] [:identifier "my"] [:identifier "cond"]]
     [:sentence [:keyword "defn"] [:identifier "my"] [:identifier "cond"]]
     [:sentence [:identifier "defn"] [:identifier "my"] [:keyword "cond"]]
     [:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]])

We've already seen one way to remove the ambiguity by using negative lookahead. But now we have another tool in our toolbox, `/`, which will allow the ambiguity to remain, while bringing the desired parse result to the top of the list.
    (def preferential-tokenizer
      (insta/parser
        "sentence = token (<whitespace> token)*
         <token> = keyword / identifier
         whitespace = #'\\s+'
         identifier = #'[a-zA-Z]+'
         keyword = 'cond' | 'defn'"))

    => (insta/parses preferential-tokenizer "defn my cond")
    ([:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]]
     [:sentence [:identifier "defn"] [:identifier "my"] [:keyword "cond"]]
     [:sentence [:keyword "defn"] [:identifier "my"] [:identifier "cond"]]
     [:sentence [:identifier "defn"] [:identifier "my"] [:identifier "cond"]])

The ordered choice operator has its uses, but don't go overboard. There are two main reasons why it is generally better to use the regular unordered alternation operator.

1. When ordered choice interacts with a complex mix of recursion, other ordered choice operators, and indeterminate operators like `+` and `*`, it can quickly become difficult to reason about how the parsing will actually play out.
2. The next version of instaparse will support multithreading. In that version, every use of `|` will be an opportunity to exploit parallelism. On the contrary, uses of `/` will create a bottleneck where options have to be pursued in a specific order.

### Parse errors

`(insta/parse my-parser "parse this text")` will either return a parse tree or a failure object. The failure object will pretty-print at the REPL, showing you the furthest point it reached while parsing your text, and listing all the possible tokens that would have allowed it to proceed.

`(insta/parses my-parser "parse this text")` will return a sequence of all the parse trees, so in the event that no parse can be found, it will simply return an empty list. However, the failure object is still there, attached to the empty list as metadata.

`(insta/failure? result)` will detect both these scenarios and return true if the result is either a failure object, or an empty list with a failure object attached as metadata.

`(insta/get-failure result)` provides a unified way to extract the failure object in both these cases. If the result is a failure object, then it is directly returned, and if the result is an empty list with the failure attached as metadata, then the failure object is retrieved from the metadata.
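As a quick illustration (a sketch I've added, using the `as-and-bs` parser from the start of the tutorial), `insta/parses` signals failure with an empty sequence, while `insta/failure?` works uniformly on both kinds of result:

    ;; "aabbc" has a trailing 'c', so no complete parse exists
    => (insta/parses as-and-bs "aabbc")
    ()
    => (insta/failure? (insta/parses as-and-bs "aabbc"))
    true
    => (insta/failure? (as-and-bs "aabb"))
    false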
(repeated-a "aaaabaaa" :total true)) true => (insta/get-failure (repeated-a "aaaabaaa" :total true)) Parse error at line 1, column 5: aaaabaaa ^ Expected: "a" I find that the total parse mode is the most valuable diagnostic tool when the cause of the error is far away from the point where the parser actually fails. A typical example might be a grammar where you are looking for phrases delimited by quotes, and the text neglects to include a closing quote mark around some phrase in the middle of the text. The parser doesn't fail until it hits the end of the text without encountering a closing quote mark. In such a case, a quick look at the total parse tree will show you the context of the failure, making it easy to spot the location where the run-on phrase began. ### Parsing from another start rule Another valuable tool for interactive debugging is the ability to test out individual rules. To demonstrate this, let's look back at our very first parser: => as-and-bs S = AB* AB = A B A = "a"+ B = "b"+ As we've seen throughout this tutorial, by default, instaparse assumes that the very first rule is your "starting production", the rule from which parsing initially proceeds. But we can easily set other rules to be the starting production with the `:start` keyword argument. => (as-and-bs "aaa" :start :A) [:A "a" "a" "a"] => (as-and-bs "aab" :start :A) Parse error at line 1, column 3: aab ^ Expected: "a" => (as-and-bs "aabb" :start :AB) [:AB [:A "a" "a"] [:B "b" "b"]] => (as-and-bs "aabbaabb" :start :AB) Parse error at line 1, column 5: aabbaabb ^ Expected: "b" The `insta/parser` function, which builds the parser from the specification, also accepts the :start keyword to set the default start rule to something other than the first rule listed. #### Review of keyword arguments At this point, you've seen all the keyword arguments that an instaparse-generated parser accepts, `:start :rule-name`, `:partial true`, and `:total true`. All these keyword arguments can be freely mixed and work with both `insta/parse` and `insta/parses`. You've also seen both keyword arguments that can be used when building the parser from the specification: `:output-format (:enlive or :hiccup)` and `:start :rule-name` to set a different default start rule than the first rule. ### Transforming the tree A parser's job is to turn a string into some kind of tree structure. What you do with it from there is up to you. It is delightfully easy to manipulate trees in Clojure. There are wonderful tools available: enlive, zippers, match, and tree-seq. But even without those tools, most tree manipulations are straightforward to perform in Clojure with recursion. Since tree transformations are already so easy to perform in Clojure, there's not much point in building a sophisticated transform library into instaparse. Nevertheless, I did include one function, `insta/transform`, that addresses the most common transformation needs. `insta/transform` takes a map from tree tags to transform functions. A transform function is defined as a function which takes the children of the tree node as inputs and returns a replacement node. In other words, if you want to turn all nodes in your tree of the form `[:switch x y]` into `[:switch y x]`, then you'd call: (insta/transform {:switch (fn [x y] [:switch y x])} my-tree) Let's make this concrete with an example. So far, throughout the tutorial, we were able to adequately express the tokens of our languages with strings or regular expressions. 
### Transforming the tree

A parser's job is to turn a string into some kind of tree structure. What you do with it from there is up to you. It is delightfully easy to manipulate trees in Clojure. There are wonderful tools available: enlive, zippers, match, and tree-seq. But even without those tools, most tree manipulations are straightforward to perform in Clojure with recursion.

Since tree transformations are already so easy to perform in Clojure, there's not much point in building a sophisticated transform library into instaparse. Nevertheless, I did include one function, `insta/transform`, that addresses the most common transformation needs.

`insta/transform` takes a map from tree tags to transform functions. A transform function is defined as a function which takes the children of the tree node as inputs and returns a replacement node. In other words, if you want to turn all nodes in your tree of the form `[:switch x y]` into `[:switch y x]`, then you'd call:

    (insta/transform {:switch (fn [x y] [:switch y x])} my-tree)

Let's make this concrete with an example. So far, throughout the tutorial, we were able to adequately express the tokens of our languages with strings or regular expressions. But sometimes, regular expressions are not sufficient, and we want to bring the full power of context-free grammars to bear on the problem of processing the individual tokens. When we do that, we end up with a bunch of individual characters where we really want a string or a number.

To illustrate this, let's revisit the `words-and-numbers` example, but this time, we'll imagine that regular expressions aren't rich enough to specify the constraints on those tokens and we need our grammar to process the string one character at a time:

    (def words-and-numbers-one-character-at-a-time
      (insta/parser
        "sentence = token (<whitespace> token)*
         <token> = word | number
         whitespace = #'\\s+'
         word = letter+
         number = digit+
         <letter> = #'[a-zA-Z]'
         <digit> = #'[0-9]'"))

    => (words-and-numbers-one-character-at-a-time "abc 123 def")
    [:sentence [:word "a" "b" "c"] [:number "1" "2" "3"] [:word "d" "e" "f"]]

We'd really like to simplify these `:word` and `:number` terminals. So for `:word` nodes, we want to concatenate the strings with Clojure's built-in `str` function, and for `:number` nodes, we want to concatenate the strings and convert the result to a number. We can do this quite simply as follows:

    => (insta/transform
         {:word str,
          :number (comp clojure.edn/read-string str)}
         (words-and-numbers-one-character-at-a-time "abc 123 def"))
    [:sentence "abc" 123 "def"]

Or, if you're a fan of threading macros, try this version:

    => (->> (words-and-numbers-one-character-at-a-time "abc 123 def")
            (insta/transform
              {:word str,
               :number (comp clojure.edn/read-string str)}))

The `insta/transform` function auto-detects whether you are using enlive or hiccup trees, and processes accordingly.

`insta/transform` performs its transformations in a bottom-up manner, which means that taken to an extreme, `insta/transform` can be used not only to rearrange a tree, but to evaluate it. Including a grammar for infix arithmetic expressions has become nearly obligatory in parser tutorials, so I might as well use that in order to demonstrate evaluation. I've leveraged instaparse's principle of "one rule per node type" and the hide notation `<>` to get a nice clean unambiguous tree that includes only the relevant information for evaluation.

    (def arithmetic
      (insta/parser
        "expr = add-sub
         <add-sub> = mul-div | add | sub
         add = add-sub <'+'> mul-div
         sub = add-sub <'-'> mul-div
         <mul-div> = term | mul | div
         mul = mul-div <'*'> term
         div = mul-div <'/'> term
         <term> = number | <'('> add-sub <')'>
         number = #'[0-9]+'"))

    => (arithmetic "1-2/(3-4)+5*6")
    [:expr
     [:add
      [:sub
       [:number "1"]
       [:div [:number "2"] [:sub [:number "3"] [:number "4"]]]]
      [:mul [:number "5"] [:number "6"]]]]

With the tree in this shape, it's trivial to evaluate it:

    => (->> (arithmetic "1-2/(3-4)+5*6")
            (insta/transform
              {:add +, :sub -, :mul *, :div /,
               :number clojure.edn/read-string
               :expr identity}))
    33

`insta/transform` is designed to play nicely with all the possible outputs of `insta/parse` and `insta/parses`. So if the input is a sequence of parse trees, it will return a sequence of transformed parse trees. If the input is a Failure object, then the Failure object is passed through unchanged. This means you can safely chain a transform to your parser without special-casing failures.
To demonstrate this, let's look back at the `ambiguous` parser from earlier in the tutorial:

    (def ambiguous
      (insta/parser
        "S = A A
         A = 'a'*"))

    => (->> (insta/parses ambiguous "aaaaaa")
            (insta/transform {:A str}))
    ([:S "a" "aaaaa"] [:S "aaaaaa" ""] [:S "aa" "aaaa"] [:S "aaa" "aaa"]
     [:S "aaaa" "aa"] [:S "aaaaa" "a"] [:S "" "aaaaaa"])

    => (->> (ambiguous "aabaaa")
            (insta/transform {:A str}))
    Parse error at line 1, column 3:
    aabaaa
      ^
    Expected: "a"

### Understanding the tree

#### Character spans

The trees produced by instaparse are annotated with metadata so that for each subtree, you can easily recover the start and end index of the input text parsed by that subtree. The convenience function for extracting this metadata is `insta/span`. To demonstrate, let's revisit our first example.

    => (as-and-bs "aaaaabbbaaaabb")
    [:S
     [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
     [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]

    => (meta (as-and-bs "aaaaabbbaaaabb"))
    {:instaparse.gll/start-index 0, :instaparse.gll/end-index 14}

    => (insta/span (as-and-bs "aaaaabbbaaaabb"))
    [0 14]

    => (count "aaaaabbbaaaabb")
    14

As you can see, `insta/span` returns a pair containing the start index (inclusive) and end index (exclusive), the customary way to represent the start and end of a substring. So far, this isn't particularly interesting -- we already knew that the entire string was successfully parsed. But since `span` works on all the subtrees, this gives us a powerful tool for exploring the provenance of each portion of the tree. To demonstrate this, here's a quick helper function (not part of instaparse's API) that takes a hiccup tree and replaces all the tags with the character spans.

    (defn spans [t]
      (if (sequential? t)
        (cons (insta/span t) (map spans (next t)))
        t))

    => (spans (as-and-bs "aaaabbbaabbab"))
    ([0 13]
     ([0 7] ([0 4] "a" "a" "a" "a") ([4 7] "b" "b" "b"))
     ([7 11] ([7 9] "a" "a") ([9 11] "b" "b"))
     ([11 13] ([11 12] "a") ([12 13] "b")))

`insta/span` works on all the tree types produced by instaparse. Furthermore, when you use `insta/transform` to transform your parse tree, `insta/span` will work on the transformed tree as well -- the span metadata is preserved for every node in the transformed tree to which metadata can be attached. Keep in mind that although most types of Clojure data support metadata, primitives such as strings or numbers do not, so if you transform any of your nodes into such primitive data types, `insta/span` on those nodes will simply return `nil`.

##### Line and column information

Sometimes, when the input string contains newline characters, it is useful to have the span metadata in the form of line and column numbers. By default, instaparse doesn't do this, because generating line and column information requires a second pass over the input string and parse tree. However, the function `insta/add-line-and-column-info-to-metadata` performs this second pass, taking the input string and parse tree, and returning a parse tree with the additional metadata. Make sure to pass in the same input string from which the parse tree was derived!
    => (def multiline-text "This is line 1\nThis is line 2")

    => (words-and-numbers multiline-text)
    [:sentence [:word "This"] [:word "is"] [:word "line"] [:number "1"]
     [:word "This"] [:word "is"] [:word "line"] [:number "2"]]

    => (def parsed-multiline-text-with-line-and-column-metadata
         (insta/add-line-and-column-info-to-metadata
           multiline-text
           (words-and-numbers multiline-text)))

The additional information is in the metadata, so the tree itself is not visibly changed:

    => parsed-multiline-text-with-line-and-column-metadata
    [:sentence [:word "This"] [:word "is"] [:word "line"] [:number "1"]
     [:word "This"] [:word "is"] [:word "line"] [:number "2"]]

But now let's inspect the metadata for the overall parse tree.

    => (meta parsed-multiline-text-with-line-and-column-metadata)
    {:instaparse.gll/end-column 15, :instaparse.gll/end-line 2,
     :instaparse.gll/start-column 1, :instaparse.gll/start-line 1,
     :instaparse.gll/start-index 0, :instaparse.gll/end-index 29}

And let's take a look at the metadata for the word "is" on the second line of the text.

    => (meta (nth parsed-multiline-text-with-line-and-column-metadata 6))
    {:instaparse.gll/end-column 8, :instaparse.gll/end-line 2,
     :instaparse.gll/start-column 6, :instaparse.gll/start-line 2,
     :instaparse.gll/start-index 20, :instaparse.gll/end-index 22}

start-line and start-column point to the same character as start-index, and end-line and end-column point to the same character as end-index. So just like the regular span metadata, the line/column start point is inclusive and the end point is exclusive. However, line and column numbers are 1-based counts, rather than 0-based. So, for example, index number 0 of the string corresponds to line 1, column 1.

#### Visualizing the tree

Instaparse contains a function, `insta/visualize` *(Clojure only)*, that will give you a visual overview of the parse tree, showing the tags, the character spans, and the leaves of the tree.

    => (insta/visualize (as-and-bs "aaabbab"))

![Tree Image](images/vizexample1.png)

The visualize function, by default, pops open the tree in a new window. To actually save the tree image as a file for this tutorial, I used both of the optional keyword arguments supported by `insta/visualize`. First, the `:output-file` keyword argument supplies the destination where the image should be saved. Second, the keyword `:options` is used to supply an option map of additional drawing parameters. I lowered the resolution to 63 dpi so it wouldn't take up so much screen real estate. So my function call looked like:

    => (insta/visualize (as-and-bs "aaabbab")
                        :output-file "images/vizexample1.png"
                        :options {:dpi 63})

`insta/visualize` draws the tree using the [rhizome](https://github.com/ztellman/rhizome) library, which in turn uses [graphviz](http://www.graphviz.org). Unfortunately, Java, and by extension Clojure, has a bit of a weakness when it comes to libraries depending on other libraries. If you want to use two libraries that rely on two different versions of a third library, you're in for a headache.

In this instance, rhizome is a particularly fast-moving target. As of the time of this writing, rhizome 0.1.8 is the most current version, released just a few weeks after version 0.1.6. If I were to make instaparse depend on rhizome 0.1.8, then in a few weeks when 0.1.9 is released, it would become more difficult to use instaparse in projects which rely on the most recent version of rhizome.

For this reason, I've done something a bit unusual: rather than include rhizome directly in instaparse's dependencies, I've set things up so that `insta/visualize` will use whatever version of rhizome *you've* put in your project.clj dependencies (must be version 0.1.8 or greater).
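For example, a Leiningen project file along these lines (the project name and the Clojure version here are purely illustrative) is enough to make `insta/visualize` usable:

    ;; illustrative project.clj -- use whatever recent rhizome release is current
    (defproject my-parsing-project "0.1.0-SNAPSHOT"
      :dependencies [[org.clojure/clojure "1.8.0"]
                     [instaparse "1.4.7"]
                     [rhizome "0.1.8"]])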
On top of that, rhizome assumes that you have graphviz installed on your system. If rhizome is not in your dependencies, or graphviz is not installed, `insta/visualize` will throw an error with a message reminding you of the necessary dependencies. To find the most current version number for rhizome, and for links to graphviz installers, check out the [rhizome github site](https://github.com/ztellman/rhizome).

If you don't want to use `insta/visualize`, there is no need to add rhizome to your dependencies and no need to install graphviz. All the other instaparse functions will work just fine.

### Combinators

I truly believe that ordinary EBNF notation is the clearest, most concise way to express a context-free grammar. Nevertheless, there may be times when it is useful to build parsers with parser combinators. If you want to use instaparse in this way, you'll need to use the `instaparse.combinators` namespace. If you are not interested in the combinator interface, feel free to skip this section -- the combinators provide no additional power or expressiveness over the string representation.

Each construct you've seen from the string specification has a corresponding parser combinator. Most are straightforward, but the last few lines of the table will require some additional explanation.
| String syntax | Combinator | Mnemonic |
|---------------|------------|----------|
| `Epsilon` | `Epsilon` | Epsilon |
| `A \| B \| C` | `(alt A B C)` | Alternation |
| `A B C` | `(cat A B C)` | Concatenation |
| `A?` | `(opt A)` | Optional |
| `A+` | `(plus A)` | Plus |
| `A*` | `(star A)` | Star |
| `A / B / C` | `(ord A B C)` | Ordered Choice |
| `&A` | `(look A)` | Lookahead |
| `!A` | `(neg A)` | Negative lookahead |
| `<A>` | `(hide A)` | Hide |
| `"string"` | `(string "string")` | String |
| `#"regexp"` | `(regexp "regexp")` | Regular Expression |
| `A` (a non-terminal) | `(nt :non-terminal)` | Non-terminal |
| `<S> = ...` | `{:S (hide-tag ...)}` | Hide tag |
When using combinators, instead of building a string, your goal is to build a *grammar map*. So a spec that looks like this:

    S = ...
    A = ...
    B = ...

becomes

    {:S ... combinators describing right-hand-side of S rule ...
     :A ... combinators describing right-hand-side of A rule ...
     :B ... combinators describing right-hand-side of B rule ...}

You can also build it as a vector:

    [:S ... combinators describing right-hand-side of S rule ...
     :A ... combinators describing right-hand-side of A rule ...
     :B ... combinators describing right-hand-side of B rule ...]

The main difference is that if you use the map representation, you'll eventually need to specify the start rule, but if you use the vector, instaparse will assume the first rule is the start rule. Either way, I'm going to refer to the above structure as a *grammar map*.

Most of the combinators, if you consult the above table, are pretty obvious. Here are a few additional things to keep in mind, and then a concrete example will follow:

1. Literal strings must be wrapped in a call to the `string` combinator.
2. Regular expressions must be wrapped in a call to the `regexp` combinator.
3. Any reference on the right-hand side of a rule to a non-terminal (i.e., a name of another rule) must be wrapped in a call to the `nt` combinator.
4. Angle brackets on the right-hand side of a rule correspond to the `hide` combinator.
5. Even though the notation for hiding a rule name is to put angle brackets around the name (on the left-hand side), this is implemented by wrapping the `hide-tag` combinator around the entire *right-hand side* of the rule expressed as combinators.

Hopefully this will all be clarified with an example. Do you remember the parser that looks for equal numbers of a's followed by b's followed by c's?

    S = &(A 'c') 'a'+ B
    A = 'a' A? 'b'
    <B> = 'b' B? 'c'

Well, here's the corresponding grammar map:

    (use 'instaparse.combinators)

    (def abc-grammar-map
      {:S (cat (look (cat (nt :A) (string "c")))
               (plus (string "a"))
               (nt :B))
       :A (cat (string "a") (opt (nt :A)) (string "b"))
       :B (hide-tag (cat (string "b") (opt (nt :B)) (string "c")))})

Once you've built your grammar map, you turn it into an executable parser by calling `insta/parser`. As I mentioned before, if you use map notation, you'll need to specify the start rule.

    (insta/parser abc-grammar-map :start :S)

The result is a parser that is the same as the one built from the string specification. To my eye, the string is dramatically more readable, but if you need or want to use the combinator approach, it's there for you to utilize.

#### String to combinator conversion

Shortly after I published the first version of instaparse, I received a question, "String specifications can be combined with `clojure.string/join` and combinator grammar maps can be combined with `merge` --- is there any way to mix and match string and combinator grammar representations?" At the time, there wasn't, but now there is.

As of version 1.1, there is a new function `ebnf` in the `instaparse.combinators` namespace which *converts* EBNF strings into the same underlying structure that is built by the combinator library, thus allowing for further manipulation by combinators. (EBNF stands for Extended Backus-Naur Form, the technical name for the syntax used by instaparse and described in this tutorial.) For example,

    (ebnf "'a'* | 'b'+")

produces the same structure as if you had typed the combinator version

    (alt (star (string "a")) (plus (string "b")))

You can also pass entire rules to `ebnf` and you'll get back the corresponding grammar map:

    (ebnf "A = 'a'*; B = 'b'+")

produces

    {:A (star (string "a"))
     :B (plus (string "b"))}

This opens up the possibility of building a grammar from a mixture of combinators, and strings that have been converted to combinators. Here's a contrived example:

    (def combo-build-example
      (insta/parser
        (merge
          {:S (alt (nt :A) (nt :B))}
          (ebnf "A = 'a'*")
          {:B (ebnf "'b'+")})
        :start :S))

### ABNF

Instaparse's primary input format is based on EBNF syntax, but an alternative input format, ABNF, is available. Most users will not need the ABNF input format, but if you need to implement a parser whose specification was written in ABNF syntax, it is very easy to do. Please read [instaparse's ABNF documentation](https://github.com/Engelberg/instaparse/blob/master/docs/ABNF.md) for details.
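For a quick taste before you dive into that document (this snippet is mine, not taken from the ABNF guide), the repeated-'a' language looks like this in ABNF syntax, where `1*` means "one or more", and you select the input format with a keyword argument:

    (def abnf-repeated-a
      (insta/parser "S = 1*'a'" :input-format :abnf))

    => (abnf-repeated-a "aaa")
    [:S "a" "a" "a"]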
For example,

    (ebnf "'a'* | 'b'+")

produces the same structure as if you had typed the combinator version

    (alt (star (string "a")) (plus (string "b")))

You can also pass entire rules to `ebnf` and you'll get back the corresponding grammar map:

    (ebnf "A = 'a'*; B = 'b'+")

produces

    {:A (star (string "a"))
     :B (plus (string "b"))}

This opens up the possibility of building a grammar from a mixture of combinators, and strings that have been converted to combinators. Here's a contrived example:

    (def combo-build-example
      (insta/parser
        (merge
          {:S (alt (nt :A) (nt :B))}
          (ebnf "A = 'a'*")
          {:B (ebnf "'b'+")})
        :start :S))

### ABNF

Instaparse's primary input format is based on EBNF syntax, but an alternative input format, ABNF, is available. Most users will not need the ABNF input format, but if you need to implement a parser whose specification was written in ABNF syntax, it is very easy to do. Please read [instaparse's ABNF documentation](https://github.com/Engelberg/instaparse/blob/master/docs/ABNF.md) for details.

### String case sensitivity

One interesting difference between EBNF and ABNF grammars is that in EBNF, string terminals are case-sensitive whereas in ABNF, all string terminals are case-*in*sensitive. If you like ABNF's case-insensitive approach, but want to use Instaparse's somewhat richer EBNF syntax, there are a couple options available to you.

If you want *all* of the string terminals in your Instaparse EBNF grammar to be case-insensitive, the simplest solution is to use the `:string-ci true` keyword argument when calling `insta/parser` to make the strings case-insensitive:

    => ((insta/parser "S = 'a'+") "AaaAaa")
    Parse error at line 1, column 1:
    AaaAaa
    ^
    Expected:
    "a"

    => ((insta/parser "S = 'a'+" :string-ci true) "AaaAaa")
    [:S "a" "a" "a" "a" "a" "a"]

On the other hand, if you want to cherry-pick certain string tokens to be case-insensitive, simply convert your string tokens into case-insensitive regexes, for example, replacing the string `'select'` with `#'(?i)select'`.

### Serialization

You can serialize an instaparse parser with `print-dup`, and deserialize it with `read`. (You can't use `clojure.edn/read` because edn does not support regular expressions.)

Typically, it is more convenient to store and/or transmit the string specification used to generate the parser. The string specification allows the parser to be rebuilt with a different output format; `print-dup` captures the state of the parser after the output format has been "baked in". However, if you have built the parser with the combinators, rather than via a string spec, or if you are storing the parser inside of other Clojure data structures that need to be serialized, then `print-dup` may be your best option.

## Performance notes

Some of the parsing libraries out there were written as a learning exercise -- monadic parser combinators, for example, are a great way to develop an appreciation for monads. There's nothing wrong with taking the fruits of a learning exercise and making it available to the public, but there are enough Clojure parser libraries out there that it is getting to be hard to tell the difference between those that are "ready for primetime" and those that aren't. For example, some of the libraries rely heavily on nested continuations, a strategy that is almost certain to cause a stack overflow on moderately large inputs. Others rely heavily on memoization, but never bother to clear the cache between inputs, eventually exhausting all available memory if you use the parser repeatedly.
I'm not going to make any precise performance guarantees -- the flexible, general nature of instaparse means that it is possible to write grammars that behave poorly. Nevertheless, I want to convey that performance is something I have taken seriously. I spent countless hours profiling instaparse's behavior on strange grammars and large inputs, using that data to improve performance. Just as one example, I discovered that for a large class of grammars, the biggest bottleneck was Clojure's hashing strategy, so I implemented a wrapper around Clojure's vectors that uses an alternative hashing strategy, successfully reducing running time on many parsers from quadratic to linear. (A shout-out to Christophe Grand who provided me with valuable guidance on this particular improvement.)

I've also worked to remove "performance surprises". For example, both left-recursion and right-recursion have sufficiently similar performance that you really don't need to agonize over which one to use -- choose whichever style best fits the problem at hand. If you express your grammar in a natural way, odds are good that you'll find the performance of the generated parser to be satisfactory. An additional performance boost in the form of multithreading is slated for the next release.

One performance caveat: instaparse is fairly memory-hungry, relying on extensive caching of intermediate results to keep the computational costs reasonable. This is not unusual -- caching is commonplace in many modern parsers, trading off space for time -- but it's worth bearing in mind. Packrat/PEG parsers and many recursive descent parsers employ a similar memory-intensive strategy, but there are other alternatives out there if that kind of memory usage is unacceptable. As one would expect, instaparse parsers do not hold onto the memory cache once the parse is complete; that memory is made available for garbage collection.

The [performance notes document](https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md) contains a deeper discussion of performance and a few helpful hints for getting the best performance out of your parser.

## Reference

All the functionality you've seen in this tutorial is packed into an API of just 10 functions. Here are the doc strings:

    => (doc insta/parser)
    -------------------------
    instaparse.core/parser
    ([grammar-specification & {:as options}])
      Takes a string specification of a context-free grammar,
      or a URI for a text file containing such a specification,
      or a map of parser combinators and returns a parser for that grammar.

      Optional keyword arguments:
      :input-format :ebnf
      or
      :input-format :abnf

      :output-format :enlive
      or
      :output-format :hiccup

      :start :keyword (where :keyword is name of starting production rule)

      :string-ci true (treat all string literals as case insensitive)

      :auto-whitespace (:standard or :comma)
      or
      :auto-whitespace custom-whitespace-parser

      Clj only:
      :no-slurp true (disables use of slurp to auto-detect whether input is a URI.
      When using this option, input must be a grammar string or grammar map.
      Useful for platforms where slurp is slow or not available.)

    => (doc insta/parse)
    -------------------------
    instaparse.core/parse
    ([parser text & {:as options}])
      Use parser to parse the text. Returns first parse tree found
      that completely parses the text. If no parse tree is possible,
      returns a Failure object.

      Optional keyword arguments:
      :start :keyword     (where :keyword is name of starting production rule)
      :partial true       (parses that don't consume the whole string are okay)
      :total true         (if parse fails, embed failure node in tree)
      :unhide <:tags or :content or :all> (for this parse, disable hiding)
      :optimize :memory   (when possible, employ strategy to use less memory)

      Clj only:
      :trace true         (print diagnostic trace while parsing)

    => (doc insta/parses)
    -------------------------
    instaparse.core/parses
    ([parser text & {:as options}])
      Use parser to parse the text. Returns lazy seq of all parse trees
      that completely parse the text. If no parse tree is possible,
      returns () with a Failure object attached as metadata.

      Optional keyword arguments:
      :start :keyword     (where :keyword is name of starting production rule)
      :partial true       (parses that don't consume the whole string are okay)
      :total true         (if parse fails, embed failure node in tree)
      :unhide <:tags or :content or :all> (for this parse, disable hiding)

      Clj only:
      :trace true         (print diagnostic trace while parsing)

    => (doc insta/set-default-output-format!)
    -------------------------
    instaparse.core/set-default-output-format!
    ([type])
      Changes the default output format. Input should be :hiccup or :enlive

    => (doc insta/failure?)
    -------------------------
    instaparse.core/failure?
    ([result])
      Tests whether a parse result is a failure.

    => (doc insta/get-failure)
    -------------------------
    instaparse.core/get-failure
    ([result])
      Extracts failure object from failed parse result.

    => (doc insta/transform)
    -------------------------
    instaparse.core/transform
    ([transform-map parse-tree])
      Takes a transform map and a parse tree (or seq of parse-trees).
      A transform map is a mapping from tags to functions that take a node's
      contents and return a replacement for the node, i.e.,
      {:node-tag (fn [child1 child2 ...] node-replacement),
       :another-node-tag (fn [child1 child2 ...] node-replacement)}

    => (doc insta/span)
    -------------------------
    instaparse.core/span
    ([tree])
      Takes a subtree of the parse tree and returns a [start-index end-index] pair
      indicating the span of text parsed by this subtree.
      start-index is inclusive and end-index is exclusive, as is customary
      with substrings.
      Returns nil if no span metadata is attached.

    => (doc insta/add-line-and-column-info-to-metadata)
    -------------------------
    instaparse.core/add-line-and-column-info-to-metadata
    ([text parse-tree])
      Given a string `text` and a `parse-tree` for text, return parse tree
      with its metadata annotated with line and column info. The info can
      then be found in the metadata map under the keywords:

      :instaparse.gll/start-line, :instaparse.gll/start-column,
      :instaparse.gll/end-line, :instaparse.gll/end-column

      The start is inclusive, the end is exclusive. Lines and columns are 1-based.

    => (doc insta/visualize)
    -------------------------
    instaparse.core/visualize
    ([tree & {output-file :output-file, options :options}])
      Creates a graphviz visualization of the parse tree.
      Optional keyword arguments:
      :output-file output-file (will save the tree image to output-file)
      :options options (options passed along to rhizome)

      Important: This function will only work if you have added rhizome
      to your dependencies, and installed graphviz on your system.
      See https://github.com/ztellman/rhizome for more information.

## Experimental Features

See the [Experimental Features](docs/ExperimentalFeatures.md) page for a discussion of new features under active development, including memory optimization and automatic handling of whitespace.
## Communication

I try to be very responsive to issues posted to the github issues page. But if you have a general question, need some help troubleshooting a grammar, or have something interesting you've done in instaparse that you'd like to share, consider joining the [Instaparse Google Group](https://groups.google.com/d/forum/instaparse) and posting there.

## Special Thanks

My interest in this project began while watching a video of Matt Might's [*Parsing with Derivatives*](http://www.youtube.com/watch?v=ZzsK8Am6dKU) talk. That video convinced me that the world would be a better place if building parsers were as easy as working with regular expressions, and that the ability to handle arbitrary, possibly-ambiguous grammars was essential to that goal.

Matt Might has published a [paper](http://matt.might.net/papers/might2011derivatives.pdf) about a specific approach to achieving that goal, but I had difficulty getting his *Parsing with Derivatives* technique to work in a performant way.

I probably would have given up, but then Danny Yoo released the [Ragg parser generator](http://hashcollision.org/ragg/index.html) for the Racket language. The Ragg library was a huge inspiration -- a model for what I wanted instaparse to become. I asked Danny what technique he used, and he gave me more information about the algorithm he used. However, he told me that if he were to do it again from scratch, he'd probably choose to use a [GLL algorithm](http://ldta.info/2009/ldta2009proceedings.pdf) by Adrian Johnstone and Elizabeth Scott, and he pointed me to a fantastic article about it by Vegard Øye, [posted on Github with source code in Racket](https://github.com/epsil/gll).

That article had a link to a [paper](http://www.cs.uwm.edu/%7Edspiewak/papers/generalized-parser-combinators.pdf) and [Scala code](https://github.com/djspiewak/gll-combinators) by Daniel Spiewak, which was also extremely helpful.

Alex Engelberg coded the first version of instaparse, proving the capabilities of the GLL algorithm. He encouraged me to take his code and build and document a user-friendly API around it. He continues to be a main contributor on the project, most recently developing the ABNF front-end, bringing the Clojurescript port up to feature parity with the Clojure version, and working out the details of merging the two codebases.

I studied a number of other Clojure parser generators to help frame my ideas about what the API should look like. I communicated with Eric Normand ([squarepeg](https://github.com/ericnormand/squarepeg)) and Christophe Grand ([parsley](https://github.com/cgrand/parsley)), both of whom provided useful advice and encouraged me to pursue my vision.

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: [YourKit Java Profiler](http://www.yourkit.com/java/profiler/index.jsp) and [YourKit .NET Profiler](http://www.yourkit.com/.net/profiler/index.jsp).

# ABNF Input Format

ABNF is an alternative input format for instaparse grammar specifications.
ABNF does not provide any additional expressive power over instaparse's default EBNF-based syntax, so if you are new to instaparse and parsing, you do not need to read this document -- stick with the syntax described in [the tutorial](https://github.com/Engelberg/instaparse/blob/master/README.md).

ABNF's main virtue is that it is precisely specified and commonly used in protocol specifications. If you use such protocols, instaparse's ABNF input format is a simple way to turn the ABNF specification into an executable parser. However, unless you are working with such specifications, you do not need the ABNF input format.

## EBNF vs ABNF

### EBNF

The most common notation for expressing context-free grammars is [Backus-Naur Form](http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), or BNF for short. BNF, however, is a little too simplistic. People wanted more convenient notation for expressing repetitions, so [EBNF](http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form), or *Extended* Backus-Naur Form, was developed. There is a hodge-podge of various syntax extensions that all fall under the umbrella of EBNF. For example, one standard specifies that repetitions should be specified with `{}`, but regular expression operators such as `+`, `*`, and `?` are far more popular.

When creating the primary input format for instaparse, I based the syntax off of EBNF. I consulted various standards I found on the internet, and filtered it through my own experience of what I've seen in various textbooks and specs over the years. I included the official repetition operators as well as the ones derived from regular expressions. I also incorporated PEG-like syntax extensions. What I ended up with was a slightly tweaked version of EBNF, making it relatively easy to turn any EBNF-specified grammar into an executable parser. However, with multiple competing standards and actively-used variations, there's no guarantee that an EBNF grammar that you find will perfectly align with instaparse's syntax. You may need to make a few tweaks to get it to work.

### ABNF

From what I can tell, the purpose of [ABNF](http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form), or *Augmented* Backus-Naur Form, was to create a grammar syntax that would have a single, well-defined, formal standard, so that all ABNF grammars would look exactly the same. For this reason, ABNF seems to be a more popular grammar syntax in the world of specifications and protocols. For example, if you want to know the formal definition of what constitutes a valid URI, there's an ABNF grammar for that.

After instaparse's initial release, I received a couple requests to support ABNF as an alternative input format. Since ABNF is so precisely defined, in theory, any ABNF grammar should work without modification. In practice, I've found that many ABNF specifications have one or two small typos; nevertheless, applying instaparse to ABNF is mostly a trivial copy-paste exercise. I included whatever further extensions and extra instaparse goodies I could safely include, but omitted any extension that would conflict with the ABNF standard and jeopardize the ability to use ABNF grammar specifications without modification.

Aside from just wanting to adhere to the ABNF specification, I can think of a few niceties that ABNF provides over EBNF:

1. ABNF has a convenient syntax for specifying bounded repetitions, for example, something like "between 3 and 5 repetitions of the letter a".
2. Convenient syntax for expressing characters and ranges of characters.
3. ABNF comes with a "standard library" of a dozen or so common token rules.

## Usage

To get a feeling for what ABNF syntax looks like, first check out this [ABNF specification for phone URIs](https://raw.github.com/Engelberg/instaparse/master/test/instaparse/phone_uri.txt). I copied and pasted it directly from the formal spec -- found one typo which I fixed.

    (def phone-uri-parser
      (insta/parser
        "https://raw.github.com/Engelberg/instaparse/master/test/instaparse/phone_uri.txt"
        :input-format :abnf))

    => (phone-uri-parser "tel:+1-201-555-0123")
    [:telephone-uri
     "tel:"
     [:telephone-subscriber
      [:global-number
       [:global-number-digits
        "+"
        [:DIGIT "1"]
        [:phonedigit [:visual-separator "-"]]
        [:phonedigit [:DIGIT "2"]]
        [:phonedigit [:DIGIT "0"]]
        [:phonedigit [:DIGIT "1"]]
        [:phonedigit [:visual-separator "-"]]
        [:phonedigit [:DIGIT "5"]]
        [:phonedigit [:DIGIT "5"]]
        [:phonedigit [:DIGIT "5"]]
        [:phonedigit [:visual-separator "-"]]
        [:phonedigit [:DIGIT "0"]]
        [:phonedigit [:DIGIT "1"]]
        [:phonedigit [:DIGIT "2"]]
        [:phonedigit [:DIGIT "3"]]]]]]

The usage, as you can see, is almost identical to the way you build parsers using the `insta/parser` constructor. The only difference is the additional keyword argument `:input-format :abnf`.

If you find yourself working with a whole series of ABNF parser specifications, you may find it more convenient to call

    (insta/set-default-input-format! :abnf)

to alter the default input format. Changing the default makes it unnecessary to specify `:input-format :abnf` with each call to the parser constructor. Here is the doc string:

    => (doc insta/set-default-input-format!)
    -------------------------
    instaparse.core/set-default-input-format!
    ([type])
      Changes the default input format. Input should be :abnf or :ebnf

## ABNF Syntax Guide
| Category | Notations | Example | Notes |
| --- | --- | --- | --- |
| Rule | `=` `=/` | `S = A` | `=/` is usually used to extend an already-defined rule |
| Alternation | `/` | `A / B` | Despite the use of `/`, this is unordered choice |
| Concatenation | whitespace | `A B` | |
| Grouping | `()` | `(A / B) C` | |
| Bounded Repetition | `*` | `3*5 A` | In ABNF, repetition precedes the element |
| Optional | `*1` | `*1 A` | |
| One or more | `1*` | `1* A` | |
| Zero or more | `*` | `*A` | |
| String terminal | `""` `''` | `'a' "a"` | Single-quoted strings are an instaparse extension |
| Regex terminal | `#""` `#''` | `#'a' #"a"` | Regexes are an instaparse extension |
| Character terminal | `%d %b %x` | `%x30-37` | |
| Comment | `;` | `; comment to the end of the line` | |
| Lookahead | `&` | `&A` | Lookahead is an instaparse extension |
| Negative lookahead | `!` | `!A` | Negative lookahead is an instaparse extension |
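
As a quick illustration of a couple of these rows, here is a small sketch (the grammar, parser name, and expected result are mine, invented for illustration): bounded repetition combines with a character-range terminal as you would expect.

    (def octal-chunk
      (insta/parser
        "S = 3*5 OCTAL
         OCTAL = %x30-37"
        :input-format :abnf))

    => (octal-chunk "0377")
    [:S [:OCTAL "0"] [:OCTAL "3"] [:OCTAL "7"] [:OCTAL "7"]]
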
Some important things to be aware of:

+ According to the ABNF standard, all strings are *case-insensitive*.
+ ABNF strings do not support any kind of escape characters. Use ABNF's character notation to specify unusual characters.
+ In ABNF, there is one repetition operator, `*`, and it *precedes* the thing that it is operating on. So, for example, `3*5` means "between 3 and 5 repetitions". The first number defaults to 0 and the second defaults to infinity, so you can omit one or both numbers to get effects comparable to EBNF's `+`, `*`, and `?`. `4*4` could just be written as `4`.
+ Use `;` for comments to the end of the line. The ABNF specification has rigid definitions about where comments can be, but in instaparse the rules for comment placement are a bit more flexible and intuitive.
+ ABNF uses `/` for the ordinary alternative operator with no order implied.
+ ABNF allows the restatement of a rule name to specify multiple alternatives. The custom is to use `=/` in definitions that are adding alternatives, for example `S = 'a' / 'b'` could be written as:
    S = 'a'
    S =/ 'b'

## Extensions

Instaparse extends ABNF by allowing single-quoted strings and both double-quoted and single-quoted regular expressions. The PEG extensions of lookahead `&` and negative lookahead `!` are permitted, but the PEG extension of ordered choice could not be included because of the syntactic conflict with ABNF's usage of `/` for unordered alternatives.

Instaparse is somewhat more flexible with whitespace than the ABNF specification dictates, but somewhat less flexible than you might expect from the EBNF input format. For example, in instaparse's EBNF mode, `(A B)C` would be just fine, but ABNF insists on at least one space to indicate concatenation, so you'd have to write `(A B) C`. I relaxed whitespace restrictions when I could do so without radically deviating from the specification.

### Angle brackets

The ABNF input format supports instaparse's angle bracket notation, where angle brackets can be used to hide certain parts of the grammar from the resulting tree structure.

Including instaparse's angle bracket notation was a bit of a tough decision because technically angle brackets are reserved for special use in ABNF grammars. However, in ABNF notation, angle brackets are meant to be used for prose descriptions of some concept that can't be mechanically specified in the grammar. For example:

    P = <a prose description of the rule>

I realized that such constructs can't be mechanically handled anyway, so I might as well co-opt the angle bracket notation, as I did with the EBNF syntax, for the very handy purpose of hiding.

This means that when you paste in an ABNF specification, it is always wise to do a quick scan to make sure that no angle brackets were used. They are rarely used, but one [notably strange use of angle brackets](http://w3-org.9356.n7.nabble.com/ipath-empty-ABNF-rule-td192464.html) occurs in the URI specification, which uses `0<pchar>` to designate the empty string. So be aware of these sorts of possibilities, but you're unlikely to run into them.

## The standard rules

The ABNF specification states that the following rules are always available for use in ABNF grammars:
| Name | Explanation |
| --- | --- |
| ALPHA | Alphabetic character |
| BIT | 0 or 1 |
| CHAR | ASCII character |
| CR | `\r` |
| CRLF | `\r\n` |
| CTL | control character |
| DIGIT | 0-9 |
| DQUOTE | `"` |
| HEXDIG | Hexadecimal digit: 0-9 or A-F |
| HTAB | `\t` |
| LF | `\n` |
| LWSP | A specific mixture of whitespace and CRLF (see note below) |
| OCTET | 8-bit character |
| SP | the space character |
| VCHAR | visible character |
| WSP | space or tab |
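
For example, a rule can refer to these names without defining them itself (a small sketch; the rule name, input, and expected tree are mine, invented for illustration):

    (def identifier-parser
      (insta/parser "identifier = ALPHA *(ALPHA / DIGIT)" :input-format :abnf))

    => (identifier-parser "x42")
    [:identifier [:ALPHA "x"] [:DIGIT "4"] [:DIGIT "2"]]
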
LWSP is particularly quirky, defined to be either a space or tab character, or an alternating sequence of carriage-return-linefeed and a single space or tab character. It's very specific, presumably relevant to some particular protocol, but not generally useful, and I don't recommend using it.

## Combinators

The `instaparse.combinators` namespace contains a few combinators that are not documented in the main tutorial, but are listed here because they are only relevant to ABNF grammars.
| String syntax | Combinator | Functionality |
| --- | --- | --- |
| `"abc"` (as used in ABNF) | `(string-ci "abc")` | string, case-insensitive |
| `3*5` (as used in ABNF) | `(rep 3 5 parser)` | repetition |
| `%d97` (as used in ABNF) | `(unicode-char 97)` | unicode code point |
| `%d97-122` (as used in ABNF) | `(unicode-char 97 122)` | unicode range |
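
To see how these fit into the grammar-map style described in the main tutorial, here is a small sketch (the rule name, parser name, and expected result are mine, invented for illustration): the ABNF fragment `S = 3*5 %d97-122` could be built directly from combinators as

    (use 'instaparse.combinators)

    (def low-letters
      (insta/parser {:S (rep 3 5 (unicode-char 97 122))} :start :S))

    => (low-letters "abcd")
    [:S "a" "b" "c" "d"]
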
Finally, just as there exists an `ebnf` function in the combinators namespace that turns EBNF fragments into combinator-built data structures, there exists an `abnf` function which does the same for ABNF fragments. This means it is entirely possible to take fragments of EBNF syntax along with fragments of ABNF syntax, and convert all the pieces, merging them into a grammar map along with other pieces built from combinators. I don't expect that many people will need this ability to mix and match, but it's there if you need it.

## Case Sensitivity

I've already mentioned that in ABNF syntax, strings are *case-insensitive*, meaning that the string terminal "abc" in an ABNF grammar also matches "aBc", "AbC", etc. Many ABNF grammar specifications leverage this case insensitivity, for example, the spec for hexadecimal digits includes the strings "A", "B", "C", "D", "E", and "F", and this is intended to match the lowercase letters as well.

A lesser-known quirk of ABNF syntax is that, in theory, non-terminal rule names are also case-insensitive. So for example, in the ABNF rule `S = 'a' s`, the lowercase `s` is actually referring back to the uppercase `S`. Although the specification of ABNF syntax allows for this possibility, as best as I can determine, this "feature" simply isn't used. It would be confusing and bad form to refer to a non-terminal in different places of your grammar with a different mixture of cases. Therefore, by default in instaparse, ABNF non-terminals are, in fact, case-sensitive. This makes it easier for ABNF grammars to play nicely with EBNF grammars, grammar maps, and instaparse's transform function, all of which are case-sensitive.

If you find yourself working with an ABNF grammar that uses an inconsistent mix of lowercase and uppercase letters to refer to the same non-terminal rules, you have two options available to you. The first possibility, of course, is to simply go through and fix the inconsistencies. The second option is to bind the dynamic variable `instaparse.abnf/*case-insensitive*` to true while building the parser from the ABNF grammar. Under the hood, this works by *converting all non-terminals to uppercase*. This means that in the resulting parse tree, all the rule names will be uppercase, so plan your tree traversals and transformations accordingly.

As an example, let's revisit the usage example from above:

    (def phone-uri-parser
      (binding [instaparse.abnf/*case-insensitive* true]
        (insta/parser
          "https://raw.github.com/Engelberg/instaparse/master/test/instaparse/phone_uri.txt"
          :input-format :abnf)))

    => (phone-uri-parser "tel:+1-201-555-0123")
    [:TELEPHONE-URI
     "tel:"
     [:TELEPHONE-SUBSCRIBER
      [:GLOBAL-NUMBER
       [:GLOBAL-NUMBER-DIGITS
        "+"
        [:DIGIT "1"]
        [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]]
        [:PHONEDIGIT [:DIGIT "2"]]
        [:PHONEDIGIT [:DIGIT "0"]]
        [:PHONEDIGIT [:DIGIT "1"]]
        [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]]
        [:PHONEDIGIT [:DIGIT "5"]]
        [:PHONEDIGIT [:DIGIT "5"]]
        [:PHONEDIGIT [:DIGIT "5"]]
        [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]]
        [:PHONEDIGIT [:DIGIT "0"]]
        [:PHONEDIGIT [:DIGIT "1"]]
        [:PHONEDIGIT [:DIGIT "2"]]
        [:PHONEDIGIT [:DIGIT "3"]]]]]]

The `*case-insensitive*` dynamic variable is also obeyed by the `abnf` combinator.

# Instaparse Experimental Features

This document provides an explanation of some of the things I'm experimenting with in instaparse. Please try the new features and let me know what you think.
## Optimizing memory

I've added a new, experimental `:optimize :memory` flag that can conserve memory usage for certain classes of grammars. I discussed the motivation for this in the [Performance document](Performance.md). The idea is to make it more practical to use instaparse in situations where you need to parse files containing a large number of independent chunks. Usage looks like this:

    (def my-parser (insta/parser my-grammar))
    (my-parser text :optimize :memory)

It works for grammars where the top-level production is of the form

    start = chunk+

or

    start = header chunk+

I don't mean that it literally needs to use the words `start` or `header` or `chunk`. What I mean is that the optimizer looks for top-level productions that finish off with some sort of repeating structure. To be properly optimized, you want to ensure that the `chunk` rule is written with no ambiguity about where a chunk begins and ends.

Behind the scenes, here's what the optimization algorithm is doing: After successfully parsing a `chunk`, the parser *forgets* all the backtracking information and continues parsing the remaining text totally fresh looking for the next chunk, with no sense of history about what has come before. As long as it keeps finding one chunk after another, it can get through a very large file with far less memory usage than the standard algorithm.

The downside of this approach is that if the parser hits a spot that doesn't match the repeating chunk rule, there's no way for it to know for sure that this is a fatal failure. It is entirely possible that there is some other interpretation of an earlier chunk that would make the whole input parseable. The standard instaparse approach is to backtrack and look for alternative interpretations before declaring a failure. However, without that backtracking history, there's no way to do that. So when you use the `:optimize :memory` flag and your parser hits an error using the "parse one chunk at a time and forget the past" strategy, it *restarts the entire parse process* with the original strategy.

I'm not entirely sure this was the right design decision, and would welcome feedback on this point. Here are the tradeoffs:

Advantage of the current approach: With this *fall back to the original strategy if the optimizer doesn't work* approach, it should be totally safe to try the optimizer, even if you don't know for sure up front whether the optimizer will work. With the `:optimize :memory` flag, the output will always be exactly the same as if you hadn't used the flag. (A metadata annotation, however, will let you know whether the parse was successfully completed entirely with the optimization strategy.) I like the safety of this approach, and how it is amenable to the attitude of "Let's try this optimization flag out and see if it helps."

Disadvantage of the current approach: If you're operating on a block of input text so large that the memory optimization is a *necessity*, then if you have a flaw in your text, you're in trouble -- the parsing restarts with the original strategy and if the flaw is fairly late in your file, you could exhaust your memory.

An alternative design would be to say that if you've enabled the `:optimize :memory` flag, and it hits an apparent flaw in the input, then it's immediately reported as a failure, without any attempt to try the more sophisticated strategy and see whether backtracking might help the situation.
This would be good for people willing to expend the effort to ensure the grammar conforms to the optimizer's constraints and has no ambiguity in the chunk definition. It would then be correct to report a failure right away if one is encountered by the optimization strategy -- no need to fall back to the original strategy because there's no ambiguity and no alternative interpretation. However, if the flag behaved in this way, then it is possible that if the grammar weren't well-suited for the optimizer, the `:optimize :memory` flag might return a failure in some instances where the regular strategy would return success. In some sense, this would give the programmer maximum control: the programmer can *choose* to rerun the input without the `:optimize :memory` flag or can accept the failure at face value if confident in the grammar's suitability for the optimization strategy.

So I'm torn: right now the optimizer falls back to the regular strategy because I like that it is dead simple to use, it's safe to try without a deep understanding of what is going on, and it will always give correct output. But I recognize that having the optimizer simply report the failure gives the programmer greatest control over whether to restart with the regular strategy or not. What do you think is the better design choice?

## Auto Whitespace

I have received several requests for instaparse to support the parsing of streams of tokens, rather than just strings. There appear to be two main motivations for this request:

1. For some grammars, explicitly specifying all the places where whitespace can go is a pain.
2. For parsing indentation-sensitive languages, it is useful to have a pre-processing pass that identifies `indent` and `dedent` tokens.

I'm still thinking about developing a token-processing version of instaparse. But if I can find a way to address the underlying needs while maintaining the "token-free" simplicity of instaparse, that would be even better. This new experimental "auto whitespace" feature addresses the first issue, simplifying the specification of grammars where you pretty much want to allow optional whitespace between all your tokens.

Here's how to use the new feature: First, you want to develop a parser that consumes whitespace. The simplest, most common way to do this would be:

    (def whitespace
      (insta/parser
        "whitespace = #'\\s+'"))

Let's test it out:

    => (whitespace " ")
    [:whitespace " "]
    => (whitespace " \t \n \t ")
    [:whitespace " \t \n \t "]

Important: Your whitespace parser should *not* accept the empty string.

    => (whitespace "")
    Parse error at line 1, column 1:
    nil
    ^
    Expected:
    #"^\s+" (followed by end-of-string)

Good, this is what we want.

Now, we can define a parser similar to the `words-and-numbers` parser from the tutorial, but this time we'll use the auto-whitespace feature.

    (def words-and-numbers-auto-whitespace
      (insta/parser
        "sentence = token+
         <token> = word | number
         word = #'[a-zA-Z]+'
         number = #'[0-9]+'"
        :auto-whitespace whitespace))

Notice the use of the `:auto-whitespace` keyword, and how we call it with the whitespace parser we developed earlier.

    => (words-and-numbers-auto-whitespace " abc 123   45 de ")
    [:sentence [:word "abc"] [:number "123"] [:number "45"] [:word "de"]]

Behind the scenes, here's what's going on: the whitespace parsing rule(s) are merged into the new parser, and an optional version of the starting production for the whitespace rule is liberally inserted before all tokens and at the end. In this case, that means `<whitespace>` is inserted all over the place.
You can see the insertion points by viewing the parser:

    => words-and-numbers-auto-whitespace
    sentence = token+ whitespace?
    whitespace = #"\s+"
    token = word | number
    word = whitespace? #"[a-zA-Z]+"
    number = whitespace? #"[0-9]+"

You can also see that the whitespace is in fact getting parsed, and is just being hidden:

    => (words-and-numbers-auto-whitespace " abc 123   45 de " :unhide :content)
    [:sentence " " [:word "abc"] " " [:number "123"] "   " [:number "45"] " " [:word "de"] " "]

Because the whitespace parser rules are merged into the new parser, don't create any rules in your parser with the same names as those in the whitespace parser. If you do, one of the rules will get clobbered and you'll run into problems. (TODO: Report an error if a user tries to do this)

Note that it makes no difference whether the `:output-format` of the whitespace parser is :enlive or :hiccup. The rules and the starting production for the whitespace parser are all that matter.

Because the :auto-whitespace feature allows you to specify your notion of whitespace, you have the total flexibility to define this however you want. For example, let's say I want to allow not only whitespace, but `(* comments *)` between any tokens. Again, we start by developing a corresponding parser:

    (def whitespace-or-comments-v1
      (insta/parser
        "ws-or-comment = #'\\s+' | comment
         comment = '(*' inside-comment* '*)'
         inside-comment =  !( '*)' | '(*' ) #'.' | comment"))

Does it eat whitespace?

    => (whitespace-or-comments-v1 " ")
    [:ws-or-comment " "]

Check. Does it handle a comment?

    => (whitespace-or-comments-v1 "(* comment *)")

Check. Can it handle nested comments?

    => (whitespace-or-comments-v1 "(* (* comment *) *)")

And we mustn't forget -- make sure it *doesn't* parse the empty string:

    => (whitespace-or-comments-v1 "")

However, there's a problem here. The auto-whitespace feature inserts optional `?` versions of the whitespace parser everywhere, *not* repeating versions. It's up to us to make sure that the whitespace parser consumes the *full extent* of any whitespace that could appear between tokens. In other words, if we want to allow multiple comments in a row, we need to spell that out:

    (def whitespace-or-comments-v2
      (insta/parser
        "ws-or-comments = #'\\s+' | comments
         comments = comment+
         comment = '(*' inside-comment* '*)'
         inside-comment =  !( '*)' | '(*' ) #'.' | comment"))

    => (whitespace-or-comments-v2 "(* comment1 *)(* (* nested comment *) *)")

There's still one more issue, though. Right now, our parser specifies complete empty whitespace, or a series of comments. But if we want to intermingle whitespace and comments, it won't work:

    => (whitespace-or-comments-v2 " (* comment1 *) (* comment2 *) ")
    Parse error at line 1, column 1:
     (* comment1 *) (* comment2 *)
    ^
    Expected one of:
    #"^\s+" (followed by end-of-string)
    "(*"

I could go through and manually insert optional whitespace, but wouldn't it be deliciously meta to use the auto-whitespace feature with our previous, simple whitespace parser to define our whitespace-or-comments parser?

    (def whitespace-or-comments
      (insta/parser
        "ws-or-comments = #'\\s+' | comments
         comments = comment+
         comment = '(*' inside-comment* '*)'
         inside-comment =  !( '*)' | '(*' ) #'.' | comment"
        :auto-whitespace whitespace))

Now it works:

    => (whitespace-or-comments " (* comment1 *) (* comment2 *) ")

Just out of curiosity, let's see where the `<whitespace>` got inserted:

    => whitespace-or-comments
    ws-or-comments = (whitespace? #"\s+" | comments) whitespace?
    whitespace = #"\s+"
    comments = comment+
"(*" inside-comment* whitespace? "*)" inside-comment = !(whitespace? "*)" | whitespace? "(*") whitespace? #"." | comment Note that the auto-insertion process inserted `whitespace?` right before the `"*)"`, but this isn't particularly useful, because all whitespace before `*)` would already be eaten by the `inside-comment` rule. If you were inserting the optional whitespace by hand, you'd probably realize it was unnecessary there. However, when you let the system automatically insert it everywhere, some of the insertions might be gratuitous. But that's okay, having the extra optional whitespace inserted there doesn't really hurt us either. Now that we have thoroughly tested our whitespace-or-comments parser, we can use it to enrich our words-and-numbers parser: (def words-and-numbers-auto-whitespace-and-comments (insta/parser "sentence = token+ = word | number word = #'[a-zA-Z]+' number = #'[0-9]+'" :auto-whitespace whitespace-or-comments)) => (words-and-numbers-auto-whitespace-and-comments " abc 123 (* 456 *) (* (* 7*) 89 *) def ") [:sentence [:word "abc"] [:number "123"] [:word "def"]] => words-and-numbers-auto-whitespace-and-comments sentence = token+ ws-or-comments? inside-comment = !(whitespace? "*)" | whitespace? "(*") whitespace? #"." | comment comment = whitespace? "(*" inside-comment* whitespace? "*)" comments = comment+ ws-or-comments = (whitespace? #"\s+" | comments) whitespace? whitespace = #"\s+" token = word | number word = ws-or-comments? #"[a-zA-Z]+" number = ws-or-comments? #"[0-9]+" Note that this feature is only useful in grammars where all the strings and regexes are, conceptually, the "tokens" of your language. Occasionally, you'll see situations where grammars specify tokens through rules that build up the tokens character-by-character, for example: month = ('M'|'m') 'arch' If you try to use the auto-whitespace feature with a grammar like this, it will end up allowing space between the "m" and the "arch", which isn't what you want. The key is to try to express such tokens using a single regular expression: month = #'[Mm]arch' ### Predefined whitespace parsers There's no doubt that the following whitespace rule is by far the most common: whitespace = #"\s+" So for this common case, there's no need to create a separate whitespace parser. You can access this predefined whitespace parser with the option: :auto-whitespace :standard At this time, one other predefined whitespace parser is available, for Clojure-like parsing tasks where the comma is also treated as whitespace. The rule that will be added to your grammar is: whitespace = #"[,\s]+" and you can access it with the option: :auto-whitespace :comma Let me know what you think of the auto-whitespace feature. Is it sufficiently simple and useful to belong in the instaparse library?instaparse-1.4.7/docs/Performance.md000066400000000000000000000415701311220471200174070ustar00rootroot00000000000000# Instaparse Performance Notes In the instaparse tutorial, I make the claim that instaparse is performant without really defining what I mean. I explained that I've spent a lot of time on optimization, without really specifying what I'm tring to optimize. In this document, I'd like to [elaborate on these points](https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md#specific-performance-goals), and talk a bit about how I view [instaparse's role](https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md#the-role-of-instaparse) in the parser ecosystem. 
Finally, I'll provide [specific tips on how to get good performance from instaparse parsers](https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md#performance-tips).

## A bit of history

For decades, parsing has been considered a "solved problem" because there are well-known algorithms that can parse a stream of text blazingly fast, in a single linear pass, using minimal memory. The catch is that these algorithms only apply to certain types of context-free grammars -- these classes of easily-parsed grammars go by names like LL(1) and LALR(1), acronyms describing the parsing technique that applies.

The good news is that most context-free grammars can, with some effort, be converted into the kind of format required by parsing algorithms. Furthermore, if you are knowledgeable about parsing algorithms and are the one constructing the language / data format to be parsed, you can intentionally constrain the syntax to ensure that it can easily be parsed.

If you can do that, great! If there's already a parser written for the kind of data you're working with, even better! However, the programming world is awash with ad hoc config files and data files that don't use an existing standard like XML or JSON. Sometimes you find yourself needing to work with something that's a little too complex to tease apart just with regular expressions, yet hard to justify the time and energy it would take to study up on LL, LALR, etc., and learn how to parse the data within the constraints of tools using those parsing algorithms.

## The role of instaparse

That's where instaparse comes in. Instaparse can handle arbitrary context-free grammars written using standard notation, so it's easy to apply it, even for a quickie one-time parsing task.

Shortly after the release of instaparse, there were a couple great testimonial blog posts about instaparse. [This blog post by Brandon Harvey](http://walkwithoutrhythm.net/blog/2013/05/16/instaparse-parsing-with-clojure-is-the-new-black/) especially made my day, because it perfectly captured what I had hoped to achieve with instaparse.

In his blog post, Brandon describes some cave data that he wanted to parse. Ideally, he wanted to figure out how to get "from a big fat unwieldy string to a nice, regular tree-shaped data structure in 20 minutes or less." The cave data is clearly structured and looks kind of like JSON, but it isn't quite JSON.

First, he tried using another Clojure parsing library (a rather excellent library provided you're working with a grammar that fits its constraints), but couldn't figure out how to express his grammar in a way that worked. He got bogged down with a bunch of shift/reduce conflicts and other errors that he didn't know how to interpret without understanding the underlying machinery. Using instaparse, he expressed the grammar in the way that seemed most natural, and it worked.

This brings me to a point I'd like to make before discussing performance: *Instaparse aims to be more flexible than traditional parser libraries --- more tolerant of grammars that are ambiguous, require backtracking, or use a mixture of left and right recursion.* To accomplish this, instaparse uses a fundamentally different algorithm than those found in traditional parser libraries, which achieve their speeds and performance guarantees by restricting lookahead and limiting backtracking.
## Specific performance goals

With that disclaimer in mind, here are the specifics of what I strive for:

+ For typical, real-world grammars, I want the running time to be linear with respect to the size of the input. In other words, if you double the size of your text, it should take about twice as long to parse. (Of course, I'm using Clojure data structures, so in practice, the running time is more like O(n * log32 n), but that's pretty close to linear.)
+ If your grammar is unambiguous and LL(1), the parser should be competitive with parsers generated by tools that *only* accept unambiguous LL(1) grammars (i.e., within some reasonable constant factor).
+ If you have a reasonable grammar, even one that isn't expressed in "just the right way", it should still have solid performance.
+ Performance should degrade gracefully as you incorporate more ambiguity and heavy backtracking into the grammar.

Roughly speaking, the goal is for instaparse to be performant in the same sense that Clojure is performant. Clojure is not quite as fast as languages like Java or C++ and consumes considerably more memory, but we use it because it offers greater expressivity and flexibility with enough speed to be useful for a wide range of tasks.

## Specific optimizations

There were a lot of algorithmic coding decisions that I made by benchmarking multiple alternatives and data structures. I won't go into them all here. My aim in this section is to give you a sense for how I go about optimizing and what sorts of things I focus on.

Here is the gist of my optimization process: I take a grammar, try it on increasingly large inputs, and track the running-time growth. If the growth is quadratic (or worse), I profile and investigate to try to track down the offending code and rework it into linear behavior. My goal is to ensure that as many grammars as possible have linear growth.

As I mentioned in the tutorial, one of the first things I noticed in my profiling was how critical hashing was. This is a great example of how an algorithm that seems like it *should* be linear can go awry without careful attention to implementation details. We all know that inserting something into a hash map is essentially constant time, so we take that for granted in our analysis. As long as the algorithm only performs O(n) insertions/lookups in the hash table, it should have linear performance, right? Well, if the thing you are inserting into the hash table takes O(n) time to compute the hash, you're in big trouble! So the first big accomplishment of my optimization efforts was to reduce the hashing time to constant for all the information cached by instaparse.

Version 1.2 of instaparse sports two new equally significant performance improvements: First, I discovered that on long texts with long repeating sequences, linear-time concatenation of the internal partial tree results was a huge bottleneck, leading to overall quadratic behavior. So in 1.2, I converted over to using a custom data structure with O(1) concatenation. RRB-trees would be another data structure that could potentially solve my concatenation problem, so this is something I intend to look at after the Clojure implementation of RRB-trees matures.

The other major performance improvement in 1.2 compensated for an unfortunate change that Oracle made in Java 1.7 to the String class, changing Strings so that the substring operation is O(n) rather than O(1), copying the substring into a freshly allocated string.
Instaparse handles regular expressions by testing the regular expression against a substring of the input text that skips past the part of the text already parsed. This strategy, which creates rather large substrings frequently, needed to be modified in light of Java 1.7's poor substring behavior.

With these version 1.2 modifications in place, I'm now getting linear-time behavior for all the parsers in my test suite that aren't explicitly designed to demonstrate huge amounts of ambiguity. This is exactly where I want instaparse to be.

## Memory

When talking about performance, the other big discussion point is, of course, memory consumption. As I mentioned in the tutorial, instaparse does use a lot of memory. There's really no way around this; it all comes back to my earlier point that instaparse aims to gracefully handle arbitrary levels of ambiguity and backtracking, which means that the entire text needs to reside in memory and lots of intermediate results need to be cached.

Instaparse's own syntax for context-free grammars is parsed by an instaparse parser, and is a great example of the practical value of backtracking. Consider the following grammar. The actual semantics of the grammar is not important here, just think about the syntax of the grammar specification and consider how instaparse's `parser` function needs to parse the grammar string as a series of rules:

    (insta/parser
      "A = B B
       B = 'b'")

You might expect instaparse to impose a requirement that each line of the grammar be clearly terminated by an end-of-line character, such as `;` or a newline, but in fact, instaparse's CFG parser has no problem if you write out the grammar all mushed together on one line:

    (insta/parser "A = B B B = 'b'")

Working from left-to-right, when it processes the third `B`, it is entirely possible that what it has seen so far should be interpreted as the rule:

    A = B B B

But when it encounters the `=`, it realizes that the only sensible interpretation is for the third `B` to be the beginning of a new rule, and instaparse sorts it all out.

Taken to an extreme, consider the parser defined by the following grammar:

    S = 'ab'+ | 'a' 'ba'+

If you use this parser to parse a long string of "abababab...aba", there's no way to determine when looking at the first 'a' which way to interpret it. The parser can try one path, perhaps assuming that it is part of the `'ab'+` rule, but it won't know until it gets to the very end of the string that it has chosen incorrectly, and has to back up and try another path. Looking at this example, it should be clear that there's no way to parse the input string in a single linear pass with bounded memory.

For this reason, I haven't put as much effort into optimizing memory usage -- a lot of data needs to be retained throughout the parsing process, and there simply is less scope for improvement, I think. Certainly Java 1.7's substring behavior was causing massive memory churn, so the changes I made in instaparse 1.2 will also benefit the memory side of the performance equation. But other than that, I haven't found any big wins for optimizing memory consumption.

In theory, I can imagine that there might be a way to intelligently figure out which cached data can be safely discarded, but in the context of left-recursion this is an extremely hard problem to solve. Chalk this up as a future research problem, but one that is not likely to bear fruit in the short-term. I have made one step in this direction which I will detail further in the section below about performance tips.
## Performance Tips

Occasionally, I receive a question about whether there's a *best* way to write instaparse grammars for maximum performance. I've tried very hard to make it so that instaparse's performance isn't ultra-sensitive to the exact way you word the grammar. My hope is that most people will find these performance tips to be completely unnecessary. However, for those that are interested, here are some recommendations:

1. Instaparse's algorithm is in the family of LL parsing algorithms. So if you know how to easily write your grammar as an LL grammar, that's probably going to yield the best possible performance. If not, don't worry about it.

2. If your token is a string, use a string literal, not a regular expression. For example, prefer `'apple'` to `#'apple'`.

3. When the greedy behavior of regular expressions is what you want, prefer using `*` and `+` *inside* the regular expression rather than outside. This comes up very commonly in processing whitespace. In most applications, once you hit whitespace, you want to eat up all the whitespace to get to the next token. So you'll get better performance with `#'\\s*'` than with `#'\\s'*`. In my parsers, I routinely have a rule for optional whitespace that looks like `ows = #'\\s*'` and then I sprinkle `<ows>` liberally in my other rules wherever I want to potentially allow whitespace.

4. Related to the previous point, prefer using regular expressions to define tokens in their entirety rather than using instaparse to build up the tokens by analyzing the string character by character. For example, if an identifier in your language is a letter followed by a series of letters or digits, you'll be better off with the rule

        Identifier = #'[a-zA-Z][a-zA-Z0-9]*'

    rather than

        Identifier = Letter Digit*
        Letter = #'[a-zA-Z]'
        Digit = #'[a-zA-Z0-9]'

5. Remove as much ambiguity from your grammar as you can. Instaparse works with ambiguous grammars, but dealing with that ambiguity can take a toll on performance. Use the `insta/parses` function on a variety of sample inputs in order to troubleshoot your grammar and discover ways in which your inputs might have multiple interpretations.

6. Even if `insta/parses` returns a single answer, think about whether you've created a lot of *internal ambiguity*, i.e., situations where the parser won't be able to work out the interpretation of the text until it has gotten much further along. One way to analyze this is to test the various rules in your grammar using `insta/parses` with the `:partial true` flag to get a feel for how many scenarios it has to consider before it can be sure it has found the whole chunk of text defined by that rule.

7. Watch out for ambiguity in your hidden content. One time I was working with a grammar that I was convinced was unambiguous -- `insta/parses` always returned a single answer. However, it turned out that the definition of whitespace was highly ambiguous. I didn't realize it because the whitespace was hidden. To help diagnose these sorts of problems, try running `insta/parses` with the `:unhide :all` flag.

8. Prefer Java 1.7. I've received one report where instaparse, running on Java 1.6, was running out of memory on a large input, whereas the exact same grammar on the same input ran perfectly fine on Java 1.7.

9. Prefer using * and + over recursion to describe simple repetition. For example, the rule:

        A = 'a'+

    can be internally optimized in ways that

        A = 'a' A | 'a'

    cannot.

10. Feed instaparse smaller chunks of text. The reality is that most large parsing tasks involve a series of individual data records that could potentially be parsed independently of one another. As has been discussed earlier in this document, if you feed instaparse the entire block of text, instaparse has to assume the worst -- that it might encounter some sort of failure that causes it to go back and reinterpret all the text it has processed so far. Consider preprocessing the text, chopping it into strings representing the individual data records, and passing the smaller strings into instaparse in order to limit the scope of what possibilities it needs to consider and how much history it needs to track. For example, I saw one grammar where each line of text represented a new record, and the grammar looked like:

        document = line+
        line = ...

    Instead of applying this grammar to the entire document at once, why not build a parser where `line` is the top-level starting rule, and then map this parser over a `line-seq` of the text?

    I've added a new, experimental `:optimize :memory` flag that attempts to automate this kind of preprocessing, chopping the text into smaller independent chunks in order to use less memory. This only works on grammars that describe these sorts of repeated data records (possibly with a header at the beginning of the file). If instaparse can't find the pattern or runs into any sort of failure, it will fall back to its usual parsing strategy in order to make sure it has considered all possibilities. Using this flag will likely slow down your parser, but if your data lends itself to this alternative strategy, you'll use much less memory. I consider the `:optimize :memory` flag to be an *alpha* feature, subject to change. If you try it and find it useful, or try it on something where you'd expect it to help and it doesn't, please send me your feedback.

11. As of version 1.2, the enlive output format is slightly faster than hiccup. This may change in the future, so I don't recommend that you base your choice of output format on this slight differential. However, if you're trying to eke out the best possible performance, you might find it useful to experiment with both output formats to see whether one performs better for you than the other.

12. As of version 1.4, instaparse has a way to print a trace of the parser's execution process, as well as some profiling information which can be useful to determine whether your parser behaves linearly with respect to the size of the input. [Read about the new tracing feature here.](https://github.com/Engelberg/instaparse/blob/master/docs/Tracing.md)

# Tracing

Instaparse 1.4.0 and up (in Clojure only) features the ability to look at a trace of what the parser is doing. As an example, let's take a look at the as-and-bs parser from the tutorial.

```
=> (as-and-bs "aaabb")
[:S [:AB [:A "a" "a" "a"] [:B "b" "b"]]]
```

Now let's look at a trace. We do this by calling the parser with the optional keyword argument `:trace true`. `insta/parse` and `insta/parses` both can take this optional argument.

```
=> (as-and-bs "aaabb" :trace true)
```

One of my design goals for the tracing feature was that if you don't use it, you shouldn't pay a performance penalty. So by default, the parsing code is not instrumented for tracing.
The very first time you call a parser with `:trace true`, you may notice a slight pause as instaparse recompiles itself to support tracing.

The trace prints to standard out, and looks like this:

```
Initiating full parse: S at index 0 (aaabb)
Initiating full parse: AB* at index 0 (aaabb)
Initiating parse: AB at index 0 (aaabb)
Initiating parse: A B at index 0 (aaabb)
Initiating parse: A at index 0 (aaabb)
Initiating parse: "a"+ at index 0 (aaabb)
Initiating parse: "a" at index 0 (aaabb)
Result for "a" at index 0 (aaabb) => "a"
Result for "a"+ at index 0 (aaabb) => ("a")
Result for A at index 0 (aaabb) => [:A "a"]
Initiating parse: B at index 1 (aabb)
Initiating parse: "b"+ at index 1 (aabb)
Initiating parse: "b" at index 1 (aabb)
No result for "b" at index 1 (aabb)
Initiating parse: "a" at index 1 (aabb)
Result for "a" at index 1 (aabb) => "a"
Result for "a"+ at index 0 (aaabb) => ("a" "a")
Result for A at index 0 (aaabb) => [:A "a" "a"]
Initiating parse: B at index 2 (abb)
Initiating parse: "b"+ at index 2 (abb)
Initiating parse: "b" at index 2 (abb)
No result for "b" at index 2 (abb)
Initiating parse: "a" at index 2 (abb)
Result for "a" at index 2 (abb) => "a"
Result for "a"+ at index 0 (aaabb) => ("a" "a" "a")
Result for A at index 0 (aaabb) => [:A "a" "a" "a"]
Initiating parse: B at index 3 (bb)
Initiating parse: "b"+ at index 3 (bb)
Initiating parse: "b" at index 3 (bb)
Result for "b" at index 3 (bb) => "b"
Result for "b"+ at index 3 (bb) => ("b")
Result for B at index 3 (bb) => [:B "b"]
Result for A B at index 0 (aaabb) => ([:A "a" "a" "a"] [:B "b"])
Result for AB at index 0 (aaabb) => [:AB [:A "a" "a" "a"] [:B "b"]]
Initiating parse: AB at index 4 (b)
Initiating parse: A B at index 4 (b)
Initiating parse: A at index 4 (b)
Initiating parse: "a"+ at index 4 (b)
Initiating parse: "a" at index 4 (b)
No result for "a" at index 4 (b)
Initiating parse: "b" at index 4 (b)
Result for "b" at index 4 (b) => "b"
Result for "b"+ at index 3 (bb) => ("b" "b")
Result for B at index 3 (bb) => [:B "b" "b"]
Result for A B at index 0 (aaabb) => ([:A "a" "a" "a"] [:B "b" "b"])
Result for AB at index 0 (aaabb) => [:AB [:A "a" "a" "a"] [:B "b" "b"]]
Result for AB* at index 0 (aaabb) => ([:AB [:A "a" "a" "a"] [:B "b" "b"]])
Result for S at index 0 (aaabb) => [:S [:AB [:A "a" "a" "a"] [:B "b" "b"]]]
Successful parse.
Profile: {:push-message 21, :push-result 21, :push-listener 24, :push-stack 26, :push-full-listener 2, :create-node 26}
[:S [:AB [:A "a" "a" "a"] [:B "b" "b"]]]
```

Let me explain what some of these lines mean.

```
Initiating full parse: S at index 0 (aaabb)
```

A "full parse" means that it only succeeds if it consumes the entire string. Usually, we're looking to completely parse an entire string, and that's what "full parse" reflects. It is important to understand that the word "initiating" does not necessarily mean that it is starting to work on that parse sub-problem right away. It just means that we're putting it on a stack of sub-problems to try to solve.

Notice the `(aaabb)` in parens. This is giving us the next several characters from this point in the string, which makes it a little easier to see at a glance where we are in the string (although, of course the index number can always be used to figure it out precisely).

```
Initiating full parse: AB* at index 0 (aaabb)
Initiating parse: AB at index 0 (aaabb)
```

Note that AB* needs to be a full parse to be satisfied, but that kicks off another subproblem, which is to look for a parse of AB (not necessarily a full parse) at index 0.
```
Initiating parse: A at index 0 (aaabb)
Initiating parse: "a"+ at index 0 (aaabb)
Initiating parse: "a" at index 0 (aaabb)
Result for "a" at index 0 (aaabb) => "a"
Result for "a"+ at index 0 (aaabb) => ("a")
Result for A at index 0 (aaabb) => [:A "a"]
```

Note that after initiating a bunch of parse subtasks, we start to see some results. Again, the content in the parentheses is a look ahead at the next several characters in the string, just to get our bearings. The information after the `=>` is the parse result that was found. Typically, the parse results are found in reverse order from the order in which the subtasks are initiated, because when initiated, the subtasks are put on a stack.

```
No result for "b" at index 1 (aabb)
```

The tracing mechanism reports when tokens (i.e., strings or regular expressions) are sought but not found. In general, the tracing mechanism does not report when subtasks involving non-terminals fail (because internally, instaparse does not transmit failure messages between subtasks).

```
Result for S at index 0 (aaabb) => [:S [:AB [:A "a" "a" "a"] [:B "b" "b"]]]
Successful parse.
```

At the end, we see the final parse, followed by some profiling data:

```
Profile: {:push-message 21, :push-result 21, :push-listener 24, :push-stack 26, :push-full-listener 2, :create-node 26}
```

The details of the profiling data don't matter that much, other than to know that it's a measure of how much work instaparse had to do to come up with the result. Repeating the trace with an input of `"aaaaaabbbb"`, we get the profiling results:

```
Profile: {:push-message 40, :push-result 40, :push-listener 48, :push-stack 50, :push-full-listener 2, :create-node 50}
```

The key here is that we doubled the length of the input string, and this doubled the amount of work that instaparse needed to do. That's good; it means that this parser behaves linearly with respect to its input size.

Even though the code is instrumented with tracing functionality, you still need to explicitly request the trace each time. If you don't request the trace, it won't display:

```
=> (as-and-bs "aaabb")
[:S [:AB [:A "a" "a" "a"] [:B "b" "b"]]]
```

Now let's look at an example with negative lookahead.
Here is the parser:

```
=> negative-lookahead-example
S = !"ab" ("a" | "b")+
=> (negative-lookahead-example "aabb")
[:S "a" "a" "b" "b"]
```

Let's run it with the trace:

```
=> (negative-lookahead-example "aabb" :trace true)
Initiating full parse: S at index 0 (aabb)
Initiating full parse: !"ab" ("a" | "b")+ at index 0 (aabb)
Initiating parse: !"ab" at index 0 (aabb)
Initiating parse: "ab" at index 0 (aabb)
No result for "ab" at index 0 (aabb)
Exhausted results for "ab" at index 0 (aabb)
Negation satisfied: !"ab" at index 0 (aabb)
Initiating full parse: ("a" | "b")+ at index 0 (aabb)
Initiating parse: "a" | "b" at index 0 (aabb)
Initiating parse: "b" at index 0 (aabb)
No result for "b" at index 0 (aabb)
Initiating parse: "a" at index 0 (aabb)
Result for "a" at index 0 (aabb) => "a"
Result for "a" | "b" at index 0 (aabb) => "a"
Initiating parse: "a" | "b" at index 1 (abb)
Initiating parse: "b" at index 1 (abb)
No result for "b" at index 1 (abb)
Initiating parse: "a" at index 1 (abb)
Result for "a" at index 1 (abb) => "a"
Result for "a" | "b" at index 1 (abb) => "a"
Initiating parse: "a" | "b" at index 2 (bb)
Initiating parse: "b" at index 2 (bb)
Result for "b" at index 2 (bb) => "b"
Result for "a" | "b" at index 2 (bb) => "b"
Initiating parse: "a" | "b" at index 3 (b)
Initiating parse: "b" at index 3 (b)
Result for "b" at index 3 (b) => "b"
Result for "a" | "b" at index 3 (b) => "b"
Result for ("a" | "b")+ at index 0 (aabb) => ("a" "a" "b" "b")
Result for !"ab" ("a" | "b")+ at index 0 (aabb) => ("a" "a" "b" "b")
Result for S at index 0 (aabb) => [:S "a" "a" "b" "b"]
Successful parse.
Profile: {:push-message 12, :push-result 12, :push-listener 14, :push-stack 17, :push-full-listener 3, :create-node 17}
[:S "a" "a" "b" "b"]
```

The interesting thing with negative lookahead (or ordered choice) is the following lines:

```
Initiating parse: !"ab" at index 0 (aabb)
Initiating parse: "ab" at index 0 (aabb)
No result for "ab" at index 0 (aabb)
Exhausted results for "ab" at index 0 (aabb)
Negation satisfied: !"ab" at index 0 (aabb)
```

To do negative lookahead, the parser sets up a subtask to try to parse the very thing we want to avoid. If the parser runs out of work to do, then the trace tells us that the negation was in fact satisfied.

When you are done tracing, you probably will want to recompile the code without all the tracing and profiling instrumentation. You can either restart the REPL or just type:

```
=> (insta/disable-tracing!)
nil
```

instaparse-1.4.7/images/000077500000000000000000000000001311220471200151325ustar00rootroot00000000000000
instaparse-1.4.7/images/vizexample1.png000066400000000000000000000412421311220471200201100ustar00rootroot00000000000000
[binary PNG data for vizexample1.png (an example parse-tree visualization image) omitted]
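As a rough, hypothetical sketch of how such a parse-tree image can be produced: `insta/visualize` (which requires the optional rhizome and graphviz dependencies mentioned in the changelog) can render a parse tree to a file. The input string and output path below are illustrative assumptions, not necessarily how this particular image was generated.

```clojure
;; Hypothetical example: render a parse tree and save it as a PNG.
;; Assumes a parser like the as-and-bs sketch shown in docs/Tracing.md above.
(require '[instaparse.core :as insta])

(def as-and-bs
  (insta/parser
    "S = AB*
     AB = A B
     A = 'a'+
     B = 'b'+"))

;; Draw the tree for a sample input and write it to a file.
(insta/visualize (as-and-bs "aaabbab")
                 :output-file "images/vizexample1.png")
```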
instaparse-1.4.7/project.clj000066400000000000000000000051611311220471200160300ustar00rootroot00000000000000(defproject instaparse "1.4.7" :description "Instaparse: No grammar left behind" :url "https://github.com/Engelberg/instaparse" :license {:name "Eclipse Public License" :url "http://www.eclipse.org/legal/epl-v10.html"} :dependencies [[org.clojure/clojure "1.8.0"]] :profiles {:dev {:dependencies [[org.clojure/clojurescript "1.8.40"] [org.clojure/tools.trace "0.7.5"] [criterium "0.3.1"] [rhizome "0.2.5"]]} :1.5 {:dependencies [[org.clojure/clojure "1.5.1"]]} :1.6 {:dependencies [[org.clojure/clojure "1.6.0"]]} :1.7 {:dependencies [[org.clojure/clojure "1.7.0"] [org.clojure/clojurescript "1.7.28"]]} :1.8 {:dependencies [[org.clojure/clojure "1.8.0"] [org.clojure/clojurescript "1.8.34"]]} :1.9 {:dependencies [[org.clojure/clojure
"1.9.0-alpha16"] [org.clojure/clojurescript "1.9.562"]]}} :aliases {"test-all" ["with-profile" "+1.5:+1.6:+1.7:+1.8:+1.9" "test"] "test-cljs" ["cljsbuild" "test" "unit-tests"] "test-cljs-all" ["with-profile" "+1.7:+1.8:+1.9" "do" "clean," "test-cljs"]} :test-paths ["test/" "target/generated/test/clj"] :source-paths ["src/" "target/generated/src/clj"] :cljsee {:builds [{:source-paths ["src/"] :output-path "target/generated/src/clj" :rules :clj} {:source-paths ["test/"] :output-path "target/generated/test/clj" :rules :clj}]} :plugins [[lein-cljsbuild "1.1.5"] [cljsee "0.1.0"]] ;:hooks [leiningen.cljsbuild] :target-path "target" :scm {:name "git" :url "https://github.com/Engelberg/instaparse"} :prep-tasks [["cljsee" "once"]] :cljsbuild {:builds [{:id "none" :source-paths ["src/"] :compiler {:output-to "target/js/none.js" :optimizations :none :pretty-print true}} {:id "test" :source-paths ["src/" "test/" "runner/cljs"] :compiler {:output-to "target/js/advanced-test.js" :optimizations :advanced :target :nodejs :pretty-print false}}] :test-commands {"unit-tests" ["node" "target/js/advanced-test.js"]}}) instaparse-1.4.7/runner/000077500000000000000000000000001311220471200151765ustar00rootroot00000000000000instaparse-1.4.7/runner/cljs/000077500000000000000000000000001311220471200161315ustar00rootroot00000000000000instaparse-1.4.7/runner/cljs/runner/000077500000000000000000000000001311220471200174425ustar00rootroot00000000000000instaparse-1.4.7/runner/cljs/runner/runner.cljs000066400000000000000000000016371311220471200216370ustar00rootroot00000000000000(ns instaparse.runner.runner (:require [cljs.nodejs :as nodejs] [instaparse.abnf-test] [instaparse.auto-flatten-seq-test] [instaparse.core-test] [instaparse.defparser-test] [instaparse.grammars] [instaparse.repeat-test] [instaparse.specs] [cljs.test :as test :refer-macros [run-tests]])) (nodejs/enable-util-print!) (defmethod cljs.test/report [:cljs.test/default :end-run-tests] [m] (if (test/successful? m) (println "Tests succeeded!") (do (println "Tests failed.") ((aget js/process "exit") 1)))) (defn -main [] (run-tests 'instaparse.abnf-test 'instaparse.auto-flatten-seq-test 'instaparse.core-test 'instaparse.defparser-test 'instaparse.grammars 'instaparse.repeat-test 'instaparse.specs)) (set! *main-cli-fn* -main) instaparse-1.4.7/src/000077500000000000000000000000001311220471200144545ustar00rootroot00000000000000instaparse-1.4.7/src/instaparse/000077500000000000000000000000001311220471200166255ustar00rootroot00000000000000instaparse-1.4.7/src/instaparse/abnf.cljc000066400000000000000000000235471311220471200204030ustar00rootroot00000000000000(ns instaparse.abnf "This is the context free grammar that recognizes ABNF notation." (:refer-clojure :exclude [cat]) (:require [instaparse.transform :as t] [instaparse.cfg :as cfg] [instaparse.gll :as gll] [instaparse.reduction :as red] [instaparse.util :refer [throw-runtime-exception]] [instaparse.combinators-source :refer [Epsilon opt plus star rep alt ord cat string-ci string string-ci regexp nt look neg hide hide-tag unicode-char]] #?(:cljs [goog.string.format]) [clojure.walk :as walk]) #?(:cljs (:require-macros [instaparse.abnf :refer [precompile-cljs-grammar]]))) (def ^:dynamic *case-insensitive* "This is normally set to false, in which case the non-terminals are treated as case-sensitive, which is NOT the norm for ABNF grammars. 
If you really want case-insensitivity, bind this to true, in which case all non-terminals will be converted to upper-case internally (which you'll have to keep in mind when transforming)." false) (def abnf-core {:ALPHA (regexp "[a-zA-Z]") :BIT (regexp "[01]") :CHAR (regexp "[\\u0001-\\u007F]") :CR (string "\u000D") :CRLF (string "\u000D\u000A") :CTL (regexp "[\\u0000-\\u001F|\\u007F]") :DIGIT (regexp "[0-9]") :DQUOTE (string "\u0022") :HEXDIG (regexp "[0-9a-fA-F]") :HTAB (string "\u0009") :LF (string "\u000A") :LWSP (alt (alt (string "\u0020") (string "\u0009")) ;WSP (star (cat (string "\u000D\u000A") ;CRLF (alt (string "\u0020") (string "\u0009"))))) ;WSP :OCTET (regexp "[\\u0000-\\u00FF]") :SP (string "\u0020") :VCHAR (regexp "[\\u0021-\\u007E]") :WSP (alt (string "\u0020") ;SP (string "\u0009"))}) ;HTAB (def abnf-grammar-common " = (rule | hide-tag-rule)+; rule = rulename-left alternation ; hide-tag-rule = hide-tag alternation ; rulename-left = rulename; rulename-right = rulename; = <'<' opt-whitespace> rulename-left '>; defined-as = ('=' | '=/') ; alternation = concatenation ( concatenation)*; concatenation = repetition ( repetition)*; repetition = [repeat] element; repeat = NUM | (NUM? '*' NUM?); = rulename-right | group | hide | option | char-val | num-val | look | neg | regexp; look = <'&' opt-whitespace> element; neg = <'!' opt-whitespace> element; = <'(' opt-whitespace> alternation ; option = <'[' opt-whitespace> alternation ; hide = <'<' opt-whitespace> alternation '>; char-val = <'\\u0022'> #'[\\u0020-\\u0021\\u0023-\\u007E]'* <'\\u0022'> (* double-quoted strings *) | <'\\u0027'> #'[\\u0020-\\u0026\u0028-\u007E]'* <'\\u0027'>; (* single-quoted strings *) = <'%'> (bin-val | dec-val | hex-val); bin-val = <'b'> bin-char [ (<'.'> bin-char)+ | ('-' bin-char) ]; bin-char = ('0' | '1')+; dec-val = <'d'> dec-char [ (<'.'> dec-char)+ | ('-' dec-char) ]; dec-char = DIGIT+; hex-val = <'x'> hex-char [ (<'.'> hex-char)+ | ('-' hex-char) ]; hex-char = HEXDIG+; NUM = DIGIT+; = #'[0-9]'; = #'[0-9a-fA-F]'; (* extra entrypoint to be used by the abnf combinator *) = rulelist | alternation; ") (def abnf-grammar-clj-only " = #'[a-zA-Z][-a-zA-Z0-9]*(?x) #identifier'; opt-whitespace = #'\\s*(?:;.*?(?:\\u000D?\\u000A\\s*|$))*(?x) # optional whitespace or comments'; whitespace = #'\\s+(?:;.*?\\u000D?\\u000A\\s*)*(?x) # whitespace or comments'; regexp = #\"#'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'(?x) #Single-quoted regexp\" | #\"#\\\"[^\\\"\\\\]*(?:\\\\.[^\\\"\\\\]*)*\\\"(?x) #Double-quoted regexp\" ") (def abnf-grammar-cljs-only " = #'[a-zA-Z][-a-zA-Z0-9]*'; opt-whitespace = #'\\s*(?:;.*?(?:\\u000D?\\u000A\\s*|$))*'; whitespace = #'\\s+(?:;.*?\\u000D?\\u000A\\s*)*'; regexp = #\"#'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'\" | #\"#\\\"[^\\\"\\\\]*(?:\\\\.[^\\\"\\\\]*)*\\\"\" ") #?(:clj (defmacro precompile-cljs-grammar [] (let [combinators (red/apply-standard-reductions :hiccup (cfg/ebnf (str abnf-grammar-common abnf-grammar-cljs-only)))] (walk/postwalk (fn [form] (cond ;; Lists cannot be evaluated verbatim (seq? 
form) (list* 'list form) ;; Regexp terminals are handled differently in cljs (= :regexp (:tag form)) `(merge (regexp ~(str (:regexp form))) ~(dissoc form :tag :regexp)) :else form)) combinators)))) #?(:clj (def abnf-parser (red/apply-standard-reductions :hiccup (cfg/ebnf (str abnf-grammar-common abnf-grammar-clj-only)))) :cljs (def abnf-parser (precompile-cljs-grammar))) (defn get-char-combinator [& nums] (cond (= "-" (second nums)) (let [[lo _ hi] nums] (unicode-char lo hi)) :else (apply cat (for [n nums] (unicode-char n))))) (defn project "Restricts map to certain keys" [m ks] (into {} (for [k ks :when (contains? m k)] [k (m k)]))) (defn merge-core "Merges abnf-core map in with parsed grammar map" [grammar-map] (merge (project abnf-core (distinct (mapcat cfg/seq-nt (vals grammar-map)))) grammar-map)) (defn hide-tag? "Tests whether parser was constructed with hide-tag" [p] (= (:red p) red/raw-non-terminal-reduction)) (defn alt-preserving-hide-tag [p1 p2] (let [hide-tag-p1? (hide-tag? p1) hide-tag-p2? (hide-tag? p2)] (cond (and hide-tag-p1? hide-tag-p2?) (hide-tag (alt (dissoc p1 :red) (dissoc p2 :red))) hide-tag-p1? (hide-tag (alt (dissoc p1 :red) p2)) hide-tag-p2? (hide-tag (alt p1 (dissoc p2 :red))) :else (alt p1 p2)))) #?(:clj (defn parse-int ([string] (Integer/parseInt string)) ([string radix] (Integer/parseInt string radix))) :cljs (def parse-int js/parseInt)) (def abnf-transformer { :rule hash-map :hide-tag-rule (fn [tag rule] {tag (hide-tag rule)}) :rulename-left #(if *case-insensitive* (keyword (clojure.string/upper-case (apply str %&))) (keyword (apply str %&))) :rulename-right #(if *case-insensitive* (nt (keyword (clojure.string/upper-case (apply str %&)))) (nt (keyword (apply str %&)))) ; since rulenames are case insensitive, convert it to upper case internally to be consistent :alternation alt :concatenation cat :repeat (fn [& items] (case (count items) 1 (cond (= (first items) "*") {} ; * :else {:low (first items), :high (first items)}) ; x 2 (cond (= (first items) "*") {:high (second items)} ; *x :else {:low (first items)}) ; x* 3 {:low (first items), :high (nth items 2)})) ; x*y :repetition (fn ([repeat element] (cond (empty? repeat) (star element) (= (count repeat) 2) (rep (:low repeat) (:high repeat) element) (= (:low repeat) 1) (plus element) (= (:high repeat) 1) (opt element) :else (rep (or (:low repeat) 0) (or (:high repeat) #?(:clj Double/POSITIVE_INFINITY :cljs js/Infinity)) element))) ([element] element)) :option opt :hide hide :look look :neg neg :regexp (comp regexp cfg/process-regexp) :char-val (fn [& cs] ; case insensitive string (string-ci (apply str cs))) :bin-char (fn [& cs] (parse-int (apply str cs) 2)) :dec-char (fn [& cs] (parse-int (apply str cs))) :hex-char (fn [& cs] (parse-int (apply str cs) 16)) :bin-val get-char-combinator :dec-val get-char-combinator :hex-val get-char-combinator :NUM #(parse-int (apply str %&))}) (defn rules->grammar-map [rules] (merge-core (apply merge-with alt-preserving-hide-tag rules))) (defn abnf "Takes an ABNF grammar specification string and returns the combinator version. If you give it the right-hand side of a rule, it will return the combinator equivalent. If you give it a series of rules, it will give you back a grammar map. Useful for combining with other combinators." [spec] (let [tree (gll/parse abnf-parser :rules-or-parser spec false)] (cond (instance? 
instaparse.gll.Failure tree) (throw-runtime-exception "Error parsing grammar specification:\n" (with-out-str (println tree))) (= :alternation (ffirst tree)) (t/transform abnf-transformer (first tree)) :else (rules->grammar-map (t/transform abnf-transformer tree))))) (defn build-parser [spec output-format] (let [rule-tree (gll/parse abnf-parser :rulelist spec false)] (if (instance? instaparse.gll.Failure rule-tree) (throw-runtime-exception "Error parsing grammar specification:\n" (with-out-str (println rule-tree))) (let [rules (t/transform abnf-transformer rule-tree) grammar-map (rules->grammar-map rules) start-production (first (first (first rules)))] {:grammar (cfg/check-grammar (red/apply-standard-reductions output-format grammar-map)) :start-production start-production :output-format output-format})))) instaparse-1.4.7/src/instaparse/auto_flatten_seq.cljc000066400000000000000000000366351311220471200230340ustar00rootroot00000000000000(ns instaparse.auto-flatten-seq #?(:clj (:import clojure.lang.PersistentVector)) #?(:clj (:require [clojure.core.protocols :refer [IKVReduce]]))) (def ^:const threshold 32) (defprotocol ConjFlat (conj-flat [self obj]) (cached? [self])) ; Need a backwards compatible version of mix-collection-hash #?(:clj (defmacro compile-if [test then else] (if (eval test) then else))) #?(:clj (defmacro mix-collection-hash-bc [x y] ;; backwards-compatible `(compile-if (resolve 'clojure.core/mix-collection-hash) (mix-collection-hash ~x ~y) ~x))) (declare EMPTY hash-cat afs? true-count) #?(:clj (defmacro hash-conj [premix-hash-v item] `(unchecked-add-int (unchecked-multiply-int 31 ~premix-hash-v) (hash ~item))) :cljs (defn ^number hash-conj "Returns the hash code, consistent with =, for an external ordered collection implementing Iterable. See http://clojure.org/data_structures#hash for full algorithms." [unmixed-hash item] (+ (imul 31 unmixed-hash) (hash item)))) #?(:clj (defn- expt [base pow] (if (zero? pow) 1 (loop [n (int pow), y (int 1), z (int base)] (let [t (even? n), n (quot n 2)] (cond t (recur n y (unchecked-multiply-int z z)) (zero? n) (unchecked-multiply-int z y) :else (recur n (unchecked-multiply-int z y) (unchecked-multiply-int z z))))))) :cljs (defn- expt [base pow] (if (zero? pow) 1 (loop [n (int pow), y (int 1), z (int base)] (let [t (even? n), n (quot n 2)] (cond t (recur n y (imul z z)) (zero? n) (imul z y) :else (recur n (imul z y) (imul z z)))))))) (defn delve [v index] (loop [v (get-in v index) index index] (if (afs? v) (recur (get v 0) (conj index 0)) index))) (defn advance [v index] (cond (= (count index) 1) (when (< (peek index) (dec (true-count v))) (delve v [(inc (peek index))])) (< (peek index) (dec (true-count (get-in v (pop index))))) (delve v (conj (pop index) (inc (peek index)))) :else (recur v (pop index)))) (defn flat-seq ([v] (if (pos? (count v)) (flat-seq v (delve v [0])) nil)) ([v index] (lazy-seq (cons (get-in v index) (when-let [next-index (advance v index)] (flat-seq v next-index)))))) #?(:clj (deftype AutoFlattenSeq [^PersistentVector v ^int premix-hashcode ^int hashcode ^int cnt ^boolean dirty ^:unsynchronized-mutable ^clojure.lang.ISeq cached-seq] Object (toString [self] (.toString (seq self))) (hashCode [self] hashcode) (equals [self other] (and (instance? 
AutoFlattenSeq other) (== hashcode (.hashcode ^AutoFlattenSeq other)) (== cnt (.cnt ^AutoFlattenSeq other)) (= dirty (.dirty ^AutoFlattenSeq other)) (= v (.v ^AutoFlattenSeq other)))) clojure.lang.IHashEq (hasheq [self] hashcode) java.util.Collection (iterator [self] (if-let [^java.util.Collection s (seq self)] (.iterator s) (let [^java.util.Collection e ()] (.iterator e)))) (size [self] cnt) (toArray [self] (let [^java.util.Collection s (seq self)] (.toArray s))) clojure.lang.Sequential clojure.lang.ISeq (equiv [self other] (and (== hashcode (hash other)) (== cnt (count other)) (or (== cnt 0) (= (seq self) other)))) (empty [self] (with-meta EMPTY (meta self))) (first [self] (first (seq self))) (next [self] (next (seq self))) (more [self] (rest (seq self))) (cons [self obj] (cons obj self)) ConjFlat (conj-flat [self obj] (cond (nil? obj) self (afs? obj) (cond (zero? cnt) obj (<= (count obj) threshold) (let [phc (hash-cat self obj) new-cnt (+ cnt (count obj))] (AutoFlattenSeq. (into v obj) phc (mix-collection-hash-bc phc new-cnt) new-cnt (or dirty (.dirty ^AutoFlattenSeq obj)) nil)) :else (let [phc (hash-cat self obj) new-cnt (+ cnt (count obj))] (AutoFlattenSeq. (conj v obj) phc (mix-collection-hash-bc phc new-cnt) new-cnt true nil))) :else (let [phc (hash-conj premix-hashcode obj) new-cnt (inc cnt)] (AutoFlattenSeq. (conj v obj) phc (mix-collection-hash-bc phc new-cnt) new-cnt dirty nil)))) (cached? [self] cached-seq) clojure.lang.Counted (count [self] cnt) clojure.lang.ILookup (valAt [self key] (.valAt v key)) (valAt [self key not-found] (.valAt v key not-found)) clojure.lang.IObj (withMeta [self metamap] (AutoFlattenSeq. (with-meta v metamap) premix-hashcode hashcode cnt dirty nil)) clojure.lang.IMeta (meta [self] (meta v)) clojure.lang.Seqable (seq [self] (if cached-seq cached-seq (do (set! cached-seq (if dirty (flat-seq v) (seq v))) cached-seq)))) :cljs (deftype AutoFlattenSeq [^PersistentVector v ^number premix-hashcode ^number hashcode ^number cnt ^boolean dirty ^:unsynchronized-mutable ^ISeq cached-seq] Object (toString [self] (pr-str* (seq self))) IHash (-hash [self] hashcode) ISequential ISeq (-first [self] (first (seq self))) (-rest [self] (rest (seq self))) IEquiv (-equiv [self other] (and ;(instance? AutoFlattenSeq other) (= hashcode (hash other)) (= cnt (count other)) (or (= cnt 0) (= (seq self) other)))) ICollection (-conj [self o] (cons o self)) IEmptyableCollection (-empty [self] (with-meta EMPTY (meta self))) INext (-next [self] (next (seq self))) ConjFlat (conj-flat [self obj] (cond (nil? obj) self (afs? obj) (cond (zero? cnt) obj (<= (count obj) threshold) (let [phc (hash-cat self obj) new-cnt (+ cnt (count obj))] (AutoFlattenSeq. (into v obj) phc (mix-collection-hash phc new-cnt) new-cnt (or dirty (.-dirty ^AutoFlattenSeq obj)) nil)) :else (let [phc (hash-cat self obj) new-cnt (+ cnt (count obj))] (AutoFlattenSeq. (conj v obj) phc (mix-collection-hash phc new-cnt) new-cnt true nil))) :else (let [phc (hash-conj premix-hashcode obj) new-cnt (inc cnt)] (AutoFlattenSeq. (conj v obj) phc (mix-collection-hash phc new-cnt) new-cnt dirty nil)))) (cached? [self] cached-seq) ICounted (-count [self] cnt) ILookup (-lookup [self key] (-lookup v key)) (-lookup [self key not-found] (-lookup v key not-found)) IWithMeta (-with-meta [self metamap] (AutoFlattenSeq. (with-meta v metamap) premix-hashcode hashcode cnt dirty nil)) IMeta (-meta [self] (meta v)) ISeqable (-seq [self] (if cached-seq cached-seq (do (set! 
cached-seq (if dirty (flat-seq v) (seq v))) cached-seq))))) #?(:clj (defn- hash-cat ^long [^AutoFlattenSeq v1 ^AutoFlattenSeq v2] (let [c (count v2) e (int (expt 31 c))] (unchecked-add-int (unchecked-multiply-int e (.premix-hashcode v1)) (unchecked-subtract-int (.premix-hashcode v2) e)))) :cljs (defn- hash-cat ^number [^AutoFlattenSeq v1 ^AutoFlattenSeq v2] (let [c (count v2) e (int (expt 31 c))] (+ (imul e (.-premix-hashcode v1)) (- (.-premix-hashcode v2) e))))) #?(:clj (defn hash-ordered-coll-without-mix ^long [v] (compile-if (resolve 'clojure.core/mix-collection-hash) (let [thirty-one (int 31) cnt (count v)] (loop [acc (int 1) i (int 0)] (if (< i cnt) (recur (unchecked-add-int (unchecked-multiply-int thirty-one acc) (hash (v i))) (inc i)) acc))) (hash v))) :cljs (defn ^number hash-ordered-coll-without-mix "Returns the partially calculated hash code, still requires a call to mix-collection-hash" ([coll] (hash-ordered-coll-without-mix 1 coll)) ([existing-unmixed-hash coll] (loop [unmixed-hash existing-unmixed-hash coll (seq coll)] (if-not (nil? coll) (recur (bit-or (+ (imul 31 unmixed-hash) (hash (first coll))) 0) (next coll)) unmixed-hash))))) #?(:cljs (extend-protocol IPrintWithWriter instaparse.auto-flatten-seq/AutoFlattenSeq (-pr-writer [afs writer opts] (-pr-writer (seq afs) writer opts)))) (defn auto-flatten-seq [v] (let [v (vec v)] (AutoFlattenSeq. v (hash-ordered-coll-without-mix v) (hash v) (count v) false nil))) (def EMPTY (auto-flatten-seq [])) (defn afs? [s] (instance? AutoFlattenSeq s)) (defn true-count [v] (if (afs? v) (count (.-v ^AutoFlattenSeq v)) (count v))) ;; For hiccup format, we need to be able to convert the seq to a vector. (defn flat-vec-helper [acc v] (if-let [s (seq v)] (let [fst (first v)] (if (afs? fst) (recur (flat-vec-helper acc fst) (next v)) (recur (conj! acc fst) (next v)))) acc)) (defn flat-vec "Turns deep vector (like the vector inside of FlattenOnDemandVector) into a flat vec" [v] (persistent! (flat-vec-helper (transient []) v))) (defprotocol GetVec (^PersistentVector get-vec [self])) #?(:clj (deftype FlattenOnDemandVector [v ; ref containing PersistentVector or nil ^int hashcode ^int cnt flat] ; ref containing PersistentVector or nil GetVec (get-vec [self] (when (not @flat) (dosync (when (not @flat) (ref-set flat (with-meta (flat-vec @v) (meta @v))) (ref-set v nil)))) ; clear out v so it can be garbage collected @flat) Object (toString [self] (.toString (get-vec self))) (hashCode [self] hashcode) (equals [self other] (and (instance? FlattenOnDemandVector other) (== hashcode (.hashcode ^FlattenOnDemandVector other)) (== cnt (.cnt ^FlattenOnDemandVector other)) (= v (.v ^FlattenOnDemandVector other)) (= flat (.flat ^FlattenOnDemandVector other)))) clojure.lang.IHashEq (hasheq [self] hashcode) java.util.Collection (iterator [self] (.iterator (get-vec self))) (size [self] cnt) (toArray [self] (.toArray (get-vec self))) clojure.lang.IPersistentCollection (equiv [self other] (or (and (== hashcode (hash other)) (== cnt (count other)) (= (get-vec self) other)))) (empty [self] (with-meta [] (meta self))) clojure.lang.Counted (count [self] cnt) clojure.lang.IPersistentVector (assoc [self i val] (assoc (get-vec self) i val)) (assocN [self i val] (.assocN (get-vec self) i val)) (length [self] cnt) (cons [self obj] (conj (get-vec self) obj)) clojure.lang.IObj (withMeta [self metamap] (if @flat (FlattenOnDemandVector. (ref @v) hashcode cnt (ref (with-meta @flat metamap))) (FlattenOnDemandVector. 
(ref (with-meta @v metamap)) hashcode cnt (ref @flat)))) clojure.lang.IMeta (meta [self] (if @flat (meta @flat) (meta @v))) clojure.lang.Seqable (seq [self] (seq (get-vec self))) clojure.lang.ILookup (valAt [self key] (.valAt (get-vec self) key)) (valAt [self key not-found] (.valAt (get-vec self) key not-found)) clojure.lang.Indexed (nth [self i] (.nth (get-vec self) i)) (nth [self i not-found] (.nth (get-vec self) i not-found)) clojure.lang.IFn (invoke [self arg] (.invoke (get-vec self) arg)) (applyTo [self arglist] (.applyTo (get-vec self) arglist)) clojure.lang.Reversible (rseq [self] (if (pos? cnt) (rseq (get-vec self)) nil)) clojure.lang.IPersistentStack (peek [self] (peek (get-vec self))) (pop [self] (pop (get-vec self))) clojure.lang.Associative (containsKey [self k] (.containsKey (get-vec self) k)) (entryAt [self k] (.entryAt (get-vec self) k)) IKVReduce (kv-reduce [self f init] (.kvreduce (get-vec self) f init)) java.lang.Comparable (compareTo [self that] (.compareTo (get-vec self) that)) java.util.List (get [self i] (nth (get-vec self) i)) (indexOf [self o] (.indexOf (get-vec self) o)) (lastIndexOf [self o] (.lastIndexOf (get-vec self) o)) (listIterator [self] (.listIterator (get-vec self) 0)) (listIterator [self i] (.listIterator (get-vec self) i)) (subList [self a z] (.subList (get-vec self) a z)) ) :cljs (deftype FlattenOnDemandVector [v ; atom containing PersistentVector or nil ^number hashcode ^number cnt flat] ; atom containing PersistentVector or nil GetVec (get-vec [self] (when (not @flat) (swap! flat (fn [_] (with-meta (flat-vec @v) (meta @v)))) (swap! v (fn [_] nil))) ; clear out v so it can be garbage collected @flat) Object (toString [self] (pr-str* (get-vec self))) IHash (-hash [self] hashcode) IEquiv (-equiv [self other] (or (and (= hashcode (hash other)) (= cnt (count other)) (= (get-vec self) other)))) IEmptyableCollection (-empty [self] (with-meta [] (meta self))) ICounted (-count [self] cnt) IVector (-assoc-n [self i val] (-assoc-n (get-vec self) i val)) ICollection (-conj [self obj] (conj (get-vec self) obj)) IWithMeta (-with-meta [self metamap] (if @flat (FlattenOnDemandVector. (atom @v) hashcode cnt (atom (with-meta @flat metamap))) (FlattenOnDemandVector. (atom (with-meta @v metamap)) hashcode cnt (atom @flat)))) IMeta (-meta [self] (if @flat (meta @flat) (meta @v))) ISequential ISeqable (-seq [self] (seq (get-vec self))) ILookup (-lookup [self key] (-lookup (get-vec self) key)) (-lookup [self key not-found] (-lookup (get-vec self) key not-found)) IIndexed (-nth [self i] (-nth (get-vec self) i)) (-nth [self i not-found] (-nth (get-vec self) i not-found)) IFn (-invoke [self arg] (-invoke (get-vec self) arg)) (-invoke [self arg not-found] (-invoke (get-vec self) arg not-found)) IReversible (-rseq [self] (if (pos? cnt) (rseq (get-vec self)) nil)) IStack (-peek [self] (-peek (get-vec self))) (-pop [self] (-pop (get-vec self))) IAssociative (-assoc [self i val] (assoc (get-vec self) i val)) (-contains-key? [self k] (-contains-key? (get-vec self) k)) IKVReduce (-kv-reduce [self f init] (-kv-reduce (get-vec self) f init)) IComparable (-compare [self that] (-compare (get-vec self) that)) )) #?(:cljs (extend-protocol IPrintWithWriter instaparse.auto-flatten-seq/FlattenOnDemandVector (-pr-writer [v writer opts] (-pr-writer (get-vec v) writer opts)))) (defn convert-afs-to-vec [^AutoFlattenSeq afs] (cond (.-dirty afs) (if (cached? afs) (vec (seq afs)) #?(:clj (FlattenOnDemandVector. (ref (.-v afs)) (.-hashcode afs) (.-cnt afs) (ref nil)) :cljs (FlattenOnDemandVector. 
(atom (.-v afs)) (.-hashcode afs) (.-cnt afs) (atom nil)))) :else (.-v afs))) instaparse-1.4.7/src/instaparse/cfg.cljc000066400000000000000000000277261311220471200202370ustar00rootroot00000000000000(ns instaparse.cfg "This is the context free grammar that recognizes context free grammars." (:refer-clojure :exclude [cat]) (:require [instaparse.combinators-source :refer [Epsilon opt plus star rep alt ord cat string-ci string string-ci regexp nt look neg hide hide-tag]] [instaparse.reduction :refer [apply-standard-reductions]] [instaparse.gll :refer [parse]] [instaparse.util :refer [throw-illegal-argument-exception throw-runtime-exception]] [clojure.string :as str] #?(:cljs [cljs.reader :as reader]))) (def ^:dynamic *case-insensitive-literals* "When true all string literal terminals in built grammar will be treated as case insensitive" false) (defn regex-doc "Adds a comment to a Clojure regex, or no-op in ClojureScript" [pattern-str comment] #?(:clj (re-pattern (str pattern-str "(?x) #" comment)) :cljs (re-pattern pattern-str))) (def single-quoted-string (regex-doc #"'[^'\\]*(?:\\.[^'\\]*)*'" "Single-quoted string")) (def single-quoted-regexp (regex-doc #"#'[^'\\]*(?:\\.[^'\\]*)*'" "Single-quoted regexp")) (def double-quoted-string (regex-doc #"\"[^\"\\]*(?:\\.[^\"\\]*)*\"" "Double-quoted string")) (def double-quoted-regexp (regex-doc #"#\"[^\"\\]*(?:\\.[^\"\\]*)*\"" "Double-quoted regexp")) (def inside-comment #?(:clj #"(?s)(?:(?!(?:\(\*|\*\))).)*(?x) #Comment text" :cljs #"(?:(?!(?:\(\*|\*\)))[\s\S])*")) (def ws (regex-doc "[,\\s]*" "optional whitespace")) (def opt-whitespace (hide (nt :opt-whitespace))) (def cfg (apply-standard-reductions :hiccup ; use the hiccup output format {:rules (hide-tag (cat opt-whitespace (plus (nt :rule)))) :comment (cat (string "(*") (nt :inside-comment) (string "*)")) :inside-comment (cat (regexp inside-comment) (star (cat (nt :comment) (regexp inside-comment)))) :opt-whitespace (cat (regexp ws) (star (cat (nt :comment) (regexp ws)))) :rule-separator (alt (string ":") (string ":=") (string "::=") (string "=")) :rule (cat (alt (nt :nt) (nt :hide-nt)) opt-whitespace (hide (nt :rule-separator)) opt-whitespace (nt :alt-or-ord) (hide (alt (nt :opt-whitespace) (cat (nt :opt-whitespace) (alt (string ";") (string ".")) (nt :opt-whitespace))))) :nt (cat (neg (nt :epsilon)) (regexp (regex-doc "[^, \\r\\t\\n<>(){}\\[\\]+*?:=|'\"#&!;./]+" "Non-terminal"))) :hide-nt (cat (hide (string "<")) opt-whitespace (nt :nt) opt-whitespace (hide (string ">"))) :alt-or-ord (hide-tag (alt (nt :alt) (nt :ord))) :alt (cat (nt :cat) (star (cat opt-whitespace (hide (string "|")) opt-whitespace (nt :cat)))) :ord (cat (nt :cat) (plus (cat opt-whitespace (hide (string "/")) opt-whitespace (nt :cat)))) :paren (cat (hide (string "(")) opt-whitespace (nt :alt-or-ord) opt-whitespace (hide (string ")"))) :hide (cat (hide (string "<")) opt-whitespace (nt :alt-or-ord) opt-whitespace (hide (string ">"))) :cat (plus (cat opt-whitespace (alt (nt :factor) (nt :look) (nt :neg)) opt-whitespace)) :string (alt (regexp single-quoted-string) (regexp double-quoted-string)) :regexp (alt (regexp single-quoted-regexp) (regexp double-quoted-regexp)) :opt (alt (cat (hide (string "[")) opt-whitespace (nt :alt-or-ord) opt-whitespace (hide (string "]"))) (cat (nt :factor) opt-whitespace (hide (string "?")))) :star (alt (cat (hide (string "{")) opt-whitespace (nt :alt-or-ord) opt-whitespace (hide (string "}"))) (cat (nt :factor) opt-whitespace (hide (string "*")))) :plus (cat (nt :factor) opt-whitespace (hide (string 
"+"))) :look (cat (hide (string "&")) opt-whitespace (nt :factor)) :neg (cat (hide (string "!")) opt-whitespace (nt :factor)) :epsilon (alt (string "Epsilon") (string "epsilon") (string "EPSILON") (string "eps") (string "\u03b5")) :factor (hide-tag (alt (nt :nt) (nt :string) (nt :regexp) (nt :opt) (nt :star) (nt :plus) (nt :paren) (nt :hide) (nt :epsilon))) ;; extra entrypoint to be used by the ebnf combinator :rules-or-parser (hide-tag (alt (nt :rules) (nt :alt-or-ord)))})) ; Internally, we're converting the grammar into a hiccup parse tree ; Here's how you extract the relevant information (def tag first) (def contents next) (def content fnext) ;;;; Helper functions for reading strings and regexes (defn escape "Converts escaped single-quotes to unescaped, and unescaped double-quotes to escaped" [s] (loop [sq (seq s), v []] (if-let [c (first sq)] (case c \\ (if-let [c2 (second sq)] (if (= c2 \') (recur (drop 2 sq) (conj v c2)) (recur (drop 2 sq) (conj v c c2))) (throw-runtime-exception "Encountered backslash character at end of string: " s)) \" (recur (next sq) (conj v \\ \")) (recur (next sq) (conj v c))) (apply str v)))) ;(defn safe-read-string [s] ; (binding [*read-eval* false] ; (read-string s))) #?(:clj (defn wrap-reader [reader] (let [{major :major minor :minor} *clojure-version*] (if (and (<= major 1) (<= minor 6)) reader (fn [r s] (reader r s {} (java.util.LinkedList.))))))) #?(:clj (let [string-reader (wrap-reader (clojure.lang.LispReader$StringReader.))] (defn safe-read-string "Expects a double-quote at the end of the string" [s] (with-in-str s (string-reader *in* nil)))) :cljs (defn safe-read-string [s] (reader/read-string* (reader/push-back-reader s) nil))) ; I think re-pattern is sufficient, but here's how to do it without. ;(let [regexp-reader (clojure.lang.LispReader$RegexReader.)] ; (defn safe-read-regexp ; "Expects a double-quote at the end of the string" ; [s] ; (with-in-str s (regexp-reader *in* nil)))) (defn process-string "Converts single quoted string to double-quoted" [s] (let [stripped (subs s 1 (dec (count s))) remove-escaped-single-quotes (escape stripped) final-string (safe-read-string (str remove-escaped-single-quotes \"))] final-string)) (defn process-regexp "Converts single quoted regexp to double-quoted" [s] ;(println (with-out-str (pr s))) (let [stripped (subs s 2 (dec (count s))) remove-escaped-single-quotes (escape stripped) final-string (re-pattern remove-escaped-single-quotes)] ; (safe-read-regexp (str remove-escaped-single-quotes \"))] final-string)) ;;; Now we need to convert the grammar's parse tree into combinators (defn build-rule "Convert one parsed rule from the grammar into combinators" [tree] (case (tag tree) :rule (let [[nt alt-or-ord] (contents tree)] (if (= (tag nt) :hide-nt) [(keyword (content (content nt))) (hide-tag (build-rule alt-or-ord))] [(keyword (content nt)) (build-rule alt-or-ord)])) :nt (nt (keyword (content tree))) :alt (apply alt (map build-rule (contents tree))) :ord (apply ord (map build-rule (contents tree))) :paren (recur (content tree)) :hide (hide (build-rule (content tree))) :cat (apply cat (map build-rule (contents tree))) :string ((if *case-insensitive-literals* string-ci string) (process-string (content tree))) :regexp (regexp (process-regexp (content tree))) :opt (opt (build-rule (content tree))) :star (star (build-rule (content tree))) :plus (plus (build-rule (content tree))) :look (look (build-rule (content tree))) :neg (neg (build-rule (content tree))) :epsilon Epsilon)) (defn seq-nt "Returns a sequence of all 
non-terminals in a parser built from combinators." [parser] (case (:tag parser) :nt [(:keyword parser)] (:string :string-ci :char :regexp :epsilon) [] (:opt :plus :star :look :neg :rep) (recur (:parser parser)) (:alt :cat) (mapcat seq-nt (:parsers parser)) :ord (mapcat seq-nt [(:parser1 parser) (:parser2 parser)]))) (defn check-grammar "Throw error if grammar uses any invalid non-terminals in its productions" [grammar-map] (let [valid-nts (set (keys grammar-map))] (doseq [nt (distinct (mapcat seq-nt (vals grammar-map)))] (when-not (valid-nts nt) (throw-runtime-exception (subs (str nt) 1) " occurs on the right-hand side of your grammar, but not on the left")))) grammar-map) (defn build-parser [spec output-format] (let [rules (parse cfg :rules spec false)] (if (instance? instaparse.gll.Failure rules) (throw-runtime-exception "Error parsing grammar specification:\n" (with-out-str (println rules))) (let [productions (map build-rule rules) start-production (first (first productions))] {:grammar (check-grammar (apply-standard-reductions output-format (into {} productions))) :start-production start-production :output-format output-format})))) (defn build-parser-from-combinators [grammar-map output-format start-production] (if (nil? start-production) (throw-illegal-argument-exception "When you build a parser from a map of parser combinators, you must provide a start production using the :start keyword argument.") {:grammar (check-grammar (apply-standard-reductions output-format grammar-map)) :start-production start-production :output-format output-format})) (defn ebnf "Takes an EBNF grammar specification string and returns the combinator version. If you give it the right-hand side of a rule, it will return the combinator equivalent. If you give it a series of rules, it will give you back a grammar map. Useful for combining with other combinators." [spec] (let [rules (parse cfg :rules-or-parser spec false)] (cond (instance? instaparse.gll.Failure rules) (throw-runtime-exception "Error parsing grammar specification:\n" (with-out-str (println rules))) (= :rule (ffirst rules)) (into {} (map build-rule rules)) :else (build-rule (first rules))))) instaparse-1.4.7/src/instaparse/combinators.cljc000066400000000000000000000016721311220471200220100ustar00rootroot00000000000000(ns instaparse.combinators "The combinator public API for instaparse" (:refer-clojure :exclude [cat]) #?(:clj (:use instaparse.macros) :cljs (:require-macros [instaparse.macros :refer [defclone]])) (:require [instaparse.combinators-source :as c] [instaparse.cfg :as cfg] [instaparse.abnf :as abnf])) ;; The actual source is in combinators-source. ;; This was necessary to avoid a cyclical dependency in the namespaces. (defclone Epsilon c/Epsilon) (defclone opt c/opt) (defclone plus c/plus) (defclone star c/star) (defclone rep c/rep) (defclone alt c/alt) (defclone ord c/ord) (defclone cat c/cat) (defclone string c/string) (defclone string-ci c/string-ci) (defclone unicode-char c/unicode-char) (defclone regexp c/regexp) (defclone nt c/nt) (defclone look c/look) (defclone neg c/neg) (defclone hide c/hide) (defclone hide-tag c/hide-tag) (defclone ebnf cfg/ebnf) (defclone abnf abnf/abnf) instaparse-1.4.7/src/instaparse/combinators_source.cljc000066400000000000000000000146771311220471200234010ustar00rootroot00000000000000(ns instaparse.combinators-source "This is the underlying implementation of the various combinators." (:refer-clojure :exclude [cat]) (:require [instaparse.reduction :refer [singleton? 
red raw-non-terminal-reduction reduction-types]] [instaparse.util :refer [throw-illegal-argument-exception]])) ;; Ways to build parsers (def Epsilon {:tag :epsilon}) (defn opt "Optional, i.e., parser?" [parser] (if (= parser Epsilon) Epsilon {:tag :opt :parser parser})) (defn plus "One or more, i.e., parser+" [parser] (if (= parser Epsilon) Epsilon {:tag :plus :parser parser})) (defn star "Zero or more, i.e., parser*" [parser] (if (= parser Epsilon) Epsilon {:tag :star :parser parser})) (defn rep "Between m and n repetitions" [m n parser] {:pre [(<= m n)]} (if (= parser Epsilon) Epsilon {:tag :rep :parser parser :min m :max n})) (defn alt "Alternation, i.e., parser1 | parser2 | parser3 | ..." [& parsers] (cond (every? (partial = Epsilon) parsers) Epsilon (singleton? parsers) (first parsers) :else {:tag :alt :parsers parsers})) (defn- ord2 [parser1 parser2] {:tag :ord :parser1 parser1 :parser2 parser2}) (defn ord "Ordered choice, i.e., parser1 / parser2" ([] Epsilon) ([parser1 & parsers] (let [parsers (if (= parser1 Epsilon) (remove #{Epsilon} parsers) parsers)] (if (seq parsers) (ord2 parser1 (apply ord parsers)) parser1)))) (defn cat "Concatenation, i.e., parser1 parser2 ..." [& parsers] (if (every? (partial = Epsilon) parsers) Epsilon (let [parsers (remove #{Epsilon} parsers)] (if (singleton? parsers) (first parsers) ; apply vector reduction {:tag :cat :parsers parsers})))) (defn string "Create a string terminal out of s" [s] (if (= s "") Epsilon {:tag :string :string s})) (defn string-ci "Create a case-insensitive string terminal out of s" [s] (if (= s "") Epsilon {:tag :string-ci :string s})) (defn unicode-char "Matches a Unicode code point or a range of code points" ([code-point] (unicode-char code-point code-point)) ([lo hi] (assert (<= lo hi) "Character range minimum must be less than or equal the maximum") {:tag :char :lo lo :hi hi})) #?(:cljs (defn- add-beginning-constraint "JavaScript regexes have no .lookingAt method, so in cljs we just add a '^' character to the front of the regex." [r] (if (regexp? r) (re-pattern (str "^" (.-source r))) r))) (defn regexp "Create a regexp terminal out of regular expression r" [r] (if (= r "") Epsilon {:tag :regexp :regexp (-> (re-pattern r) #?(:cljs add-beginning-constraint))})) (defn nt "Refers to a non-terminal defined by the grammar map" [s] {:tag :nt :keyword s}) (defn look "Lookahead, i.e., &parser" [parser] {:tag :look :parser parser}) (defn neg "Negative lookahead, i.e., !parser" [parser] {:tag :neg :parser parser}) (defn hide "Hide the result of parser, i.e., " [parser] (assoc parser :hide true)) (defn hide-tag "Hide the tag associated with this rule. Wrap this combinator around the entire right-hand side." [parser] (red parser raw-non-terminal-reduction)) ; Ways to alter a parser with hidden information, unhiding that information (defn hidden-tag? 
"Tests whether parser was created with hide-tag combinator" [parser] (= (:red parser) raw-non-terminal-reduction)) (defn unhide-content "Recursively undoes the effect of hide on one parser" [parser] (let [parser (if (:hide parser) (dissoc parser :hide) parser)] (cond (:parser parser) (assoc parser :parser (unhide-content (:parser parser))) (:parsers parser) (assoc parser :parsers (map unhide-content (:parsers parser))) (= (:tag parser) :ord) (assoc parser :parser1 (unhide-content (:parser1 parser)) :parser2 (unhide-content (:parser2 parser))) :else parser))) (defn unhide-all-content "Recursively undoes the effect of hide on all parsers in the grammar" [grammar] (into {} (for [[k v] grammar] [k (unhide-content v)]))) (defn unhide-tags "Recursively undoes the effect of hide-tag" [reduction-type grammar] (if-let [reduction (reduction-types reduction-type)] (into {} (for [[k v] grammar] [k (assoc v :red (reduction k))])) (throw-illegal-argument-exception "Invalid output format " reduction-type ". Use :enlive or :hiccup."))) (defn unhide-all "Recursively undoes the effect of both hide and hide-tag" [reduction-type grammar] (if-let [reduction (reduction-types reduction-type)] (into {} (for [[k v] grammar] [k (assoc (unhide-content v) :red (reduction k))])) (throw-illegal-argument-exception "Invalid output format " reduction-type ". Use :enlive or :hiccup."))) ;; New beta feature: automatically add whitespace (defn auto-whitespace-parser [parser ws-parser] (case (:tag parser) (:nt :epsilon) parser (:opt :plus :star :rep :look :neg) (update-in parser [:parser] auto-whitespace-parser ws-parser) (:alt :cat) (assoc parser :parsers (map #(auto-whitespace-parser % ws-parser) (:parsers parser))) :ord (assoc parser :parser1 (auto-whitespace-parser (:parser1 parser) ws-parser) :parser2 (auto-whitespace-parser (:parser2 parser) ws-parser)) (:string :string-ci :regexp) ; If the string/regexp has a reduction associated with it, ; we need to "lift" that reduction out to the (cat whitespace string) ; parser that is being created. (if (:red parser) (assoc (cat ws-parser (dissoc parser :red)) :red (:red parser)) (cat ws-parser parser)))) (defn auto-whitespace [grammar start grammar-ws start-ws] (let [ws-parser (hide (opt (nt start-ws))) grammar-ws (assoc grammar-ws start-ws (hide-tag (grammar-ws start-ws))) modified-grammar (into {} (for [[nt parser] grammar] [nt (auto-whitespace-parser parser ws-parser)])) final-grammar (assoc modified-grammar start (assoc (cat (dissoc (modified-grammar start) :red) ws-parser) :red (:red (modified-grammar start))))] (merge final-grammar grammar-ws))) instaparse-1.4.7/src/instaparse/core.cljc000066400000000000000000000362531311220471200204230ustar00rootroot00000000000000(ns instaparse.core (#?(:clj :require :cljs :require-macros) [instaparse.macros :refer [defclone set-global-var!]]) (:require [clojure.walk :as walk] [instaparse.gll :as gll] [instaparse.cfg :as cfg] [instaparse.failure :as fail] [instaparse.print :as print] [instaparse.reduction :as red] [instaparse.transform :as t] [instaparse.abnf :as abnf] [instaparse.repeat :as repeat] [instaparse.combinators-source :as c] [instaparse.line-col :as lc] [instaparse.viz :as viz] [instaparse.util :refer [throw-illegal-argument-exception]])) (def ^:dynamic *default-output-format* :hiccup) (defn set-default-output-format! "Changes the default output format. Input should be :hiccup or :enlive" [type] {:pre [(#{:hiccup :enlive} type)]} (set-global-var! 
*default-output-format* type)) (def ^:dynamic *default-input-format* :ebnf) (defn set-default-input-format! "Changes the default input format. Input should be :abnf or :ebnf" [type] {:pre [(#{:abnf :ebnf} type)]} (set-global-var! *default-input-format* type)) (declare failure? standard-whitespace-parsers enable-tracing!) (defn- unhide-parser [parser unhide] (case unhide nil parser :content (assoc parser :grammar (c/unhide-all-content (:grammar parser))) :tags (assoc parser :grammar (c/unhide-tags (:output-format parser) (:grammar parser))) :all (assoc parser :grammar (c/unhide-all (:output-format parser) (:grammar parser))))) (defn parse "Use parser to parse the text. Returns first parse tree found that completely parses the text. If no parse tree is possible, returns a Failure object. Optional keyword arguments: :start :keyword (where :keyword is name of starting production rule) :partial true (parses that don't consume the whole string are okay) :total true (if parse fails, embed failure node in tree) :unhide <:tags or :content or :all> (for this parse, disable hiding) :optimize :memory (when possible, employ strategy to use less memory) Clj only: :trace true (print diagnostic trace while parsing)" [parser text &{:as options}] {:pre [(contains? #{:tags :content :all nil} (get options :unhide)) (contains? #{:memory nil} (get options :optimize))]} (let [start-production (get options :start (:start-production parser)), partial? (get options :partial false) optimize? (get options :optimize false) unhide (get options :unhide) trace? (get options :trace false) #?@(:clj [_ (when (and trace? (not gll/TRACE)) (enable-tracing!))]) parser (unhide-parser parser unhide)] (->> (cond (:total options) (gll/parse-total (:grammar parser) start-production text partial? (red/node-builders (:output-format parser))) (and optimize? (not partial?)) (let [result (repeat/try-repeating-parse-strategy parser text start-production)] (if (failure? result) (gll/parse (:grammar parser) start-production text partial?) result)) :else (gll/parse (:grammar parser) start-production text partial?)) #?(:clj (gll/bind-trace trace?))))) (defn parses "Use parser to parse the text. Returns lazy seq of all parse trees that completely parse the text. If no parse tree is possible, returns () with a Failure object attached as metadata. Optional keyword arguments: :start :keyword (where :keyword is name of starting production rule) :partial true (parses that don't consume the whole string are okay) :total true (if parse fails, embed failure node in tree) :unhide <:tags or :content or :all> (for this parse, disable hiding) Clj only: :trace true (print diagnostic trace while parsing)" [parser text &{:as options}] {:pre [(contains? #{:tags :content :all nil} (get options :unhide))]} (let [start-production (get options :start (:start-production parser)), partial? (get options :partial false) unhide (get options :unhide) trace? (get options :trace false) #?@(:clj [_ (when (and trace? (not gll/TRACE)) (enable-tracing!))]) parser (unhide-parser parser unhide)] (->> (cond (:total options) (gll/parses-total (:grammar parser) start-production text partial? 
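;; The keyword options documented above, in use (hypothetical sketch):
;;   (def p (insta/parser "S = 'a'+"))
;;   (insta/parse p "aaa")                    ;=> [:S "a" "a" "a"]
;;   (insta/parse p "aab")                    ;=> a Failure, annotated with line/column info
;;   (insta/parse p "aab" :total true)        ;=> [:S "a" "a" [:instaparse/failure "b"]]
;;   (insta/parse p "aab" :partial true)      ;=> a parse of a prefix of the string
;;   (insta/parse p "aaa" :optimize :memory)  ; chunked strategy when applicable, else a normal parse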
(red/node-builders (:output-format parser))) :else (gll/parses (:grammar parser) start-production text partial?)) #?(:clj (gll/bind-trace trace?))))) (defrecord Parser [grammar start-production output-format] #?@(:clj [clojure.lang.IFn (invoke [parser text] (parse parser text)) (invoke [parser text key1 val1] (parse parser text key1 val1)) (invoke [parser text key1 val1 key2 val2] (parse parser text key1 val1 key2 val2)) (invoke [parser text key1 val1 key2 val2 key3 val3] (parse parser text key1 val1 key2 val2 key3 val3)) (applyTo [parser args] (apply parse parser args))] :cljs [IFn (-invoke [parser text] (parse parser text)) (-invoke [parser text key1 val1] (parse parser text key1 val1)) (-invoke [parser text key1 val1 key2 val2] (parse parser text key1 val1 key2 val2)) (-invoke [parser text key1 val1 key2 val2 key3 val3] (parse parser text key1 val1 key2 val2 key3 val3)) (-invoke [parser text a b c d e f g h] (parse parser text a b c d e f g h)) (-invoke [parser text a b c d e f g h i j] (parse parser text a b c d e f g h i j)) (-invoke [parser text a b c d e f g h i j k l] (parse parser text a b c d e f g h i j k l)) (-invoke [parser text a b c d e f g h i j k l m n] (parse parser text a b c d e f g h i j k l m n)) (-invoke [parser text a b c d e f g h i j k l m n o p] (parse parser text a b c d e f g h i j k l m n o p)) (-invoke [parser text a b c d e f g h i j k l m n o p q r] (parse parser text a b c d e f g h i j k l m n o p)) (-invoke [parser text a b c d e f g h i j k l m n o p q r s more] (apply parse parser text a b c d e f g h i j k l m n o p q r s more))])) #?(:clj (defmethod clojure.core/print-method Parser [x writer] (binding [*out* writer] (println (print/Parser->str x)))) :cljs (extend-protocol IPrintWithWriter instaparse.core/Parser (-pr-writer [parser writer _] (-write writer (print/Parser->str parser))))) (defn parser "Takes a string specification of a context-free grammar, or a URI for a text file containing such a specification (Clj only), or a map of parser combinators and returns a parser for that grammar. Optional keyword arguments: :input-format :ebnf or :input-format :abnf :output-format :enlive or :output-format :hiccup :start :keyword (where :keyword is name of starting production rule) :string-ci true (treat all string literals as case insensitive) :auto-whitespace (:standard or :comma) or :auto-whitespace custom-whitespace-parser Clj only: :no-slurp true (disables use of slurp to auto-detect whether input is a URI. When using this option, input must be a grammar string or grammar map. Useful for platforms where slurp is slow or not available.)" [grammar-specification &{:as options}] {:pre [(contains? #{:abnf :ebnf nil} (get options :input-format)) (contains? #{:enlive :hiccup nil} (get options :output-format)) (let [ws-parser (get options :auto-whitespace)] (or (nil? ws-parser) (contains? standard-whitespace-parsers ws-parser) (and (map? ws-parser) (contains? ws-parser :grammar) (contains? ws-parser :start-production))))]} (let [input-format (get options :input-format *default-input-format*) build-parser (case input-format :abnf abnf/build-parser :ebnf (if (get options :string-ci) (fn [spec output-format] (binding [cfg/*case-insensitive-literals* true] (cfg/build-parser spec output-format))) cfg/build-parser)) output-format (get options :output-format *default-output-format*) start (get options :start nil) built-parser (cond (string? 
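;; `parses` exposes ambiguity, and the `parser` options listed above select grammar
;; flavor and literal handling (hypothetical sketch):
;;   (def amb (insta/parser "S = A A \n A = 'a'*"))
;;   (count (insta/parses amb "aa"))   ;=> 3   (splits "" + "aa", "a" + "a", "aa" + "")
;;   (insta/parses amb "ab")           ;=> ()  with the Failure attached as metadata
;;   (insta/parser "S = 'select'" :string-ci true)         ; case-insensitive literals
;;   (insta/parser "S = \"a\" \"b\"" :input-format :abnf)  ; ABNF grammar syntax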
grammar-specification) (let [parser #?(:clj (if (get options :no-slurp) ;; if :no-slurp is set to true, string is a grammar spec (build-parser grammar-specification output-format) ;; otherwise, grammar-specification might be a URI, ;; let's slurp to see (try (let [spec (slurp grammar-specification)] (build-parser spec output-format)) (catch java.io.FileNotFoundException e (build-parser grammar-specification output-format)))) :cljs (build-parser grammar-specification output-format))] (if start (map->Parser (assoc parser :start-production start)) (map->Parser parser))) (map? grammar-specification) (let [parser (cfg/build-parser-from-combinators grammar-specification output-format start)] (map->Parser parser)) (vector? grammar-specification) (let [start (if start start (grammar-specification 0)) parser (cfg/build-parser-from-combinators (apply hash-map grammar-specification) output-format start)] (map->Parser parser)) :else #?(:clj (let [spec (slurp grammar-specification) parser (build-parser spec output-format)] (map->Parser parser)) :cljs (throw-illegal-argument-exception "Expected string, map, or vector as grammar specification, got " (pr-str grammar-specification))))] (let [auto-whitespace (get options :auto-whitespace) ; auto-whitespace is keyword, parser, or nil whitespace-parser (if (keyword? auto-whitespace) (get standard-whitespace-parsers auto-whitespace) auto-whitespace)] (if-let [{ws-grammar :grammar ws-start :start-production} whitespace-parser] (assoc built-parser :grammar (c/auto-whitespace (:grammar built-parser) (:start-production built-parser) ws-grammar ws-start)) built-parser)))) #?(:clj (defmacro defparser "Takes a string specification of a context-free grammar, or a string URI for a text file containing such a specification, or a map/vector of parser combinators, and sets a variable to a parser for that grammar. String specifications are processed at macro-time, not runtime, so this is an appealing alternative to (def _ (parser \"...\")) for ClojureScript users. Optional keyword arguments unique to `defparser`: - :instaparse.abnf/case-insensitive true" [name grammar & {:as opts}] ;; For each of the macro-time opts, ensure that they are the data ;; types we expect, not more complex quoted expressions. {:pre [(or (nil? (:input-format opts)) (keyword? (:input-format opts))) (or (nil? (:output-format opts)) (keyword? (:output-format opts))) (contains? #{true false nil} (:string-ci opts)) (contains? #{true false nil} (:no-slurp opts))]} (if (string? grammar) `(def ~name (map->Parser ~(binding [abnf/*case-insensitive* (:instaparse.abnf/case-insensitive opts false)] (let [macro-time-opts (select-keys opts [:input-format :output-format :string-ci :no-slurp]) runtime-opts (dissoc opts :start) macro-time-parser (apply parser grammar (apply concat macro-time-opts)) pre-processed-grammar (:grammar macro-time-parser) grammar-producing-code (->> pre-processed-grammar (walk/postwalk (fn [form] (cond ;; Lists cannot be evaluated verbatim (seq? form) (list* 'list form) ;; Regexp terminals are handled differently in cljs (= :regexp (:tag form)) `(merge (c/regexp ~(str (:regexp form))) ~(dissoc form :tag :regexp)) :else form)))) start-production (or (:start opts) (:start-production macro-time-parser))] `(parser ~grammar-producing-code :start ~start-production ~@(apply concat runtime-opts)))))) `(def ~name (parser ~grammar ~@(apply concat opts)))))) (defn failure? "Tests whether a parse result is a failure." [result] (or (instance? gll/failure-type result) (instance? 
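;; `defparser` moves the grammar analysis to macro-expansion time, which matters most in
;; ClojureScript (hypothetical sketch):
;;   (insta/defparser my-parser "S = 'a'+")
;;   (my-parser "aaa")   ;=> [:S "a" "a" "a"]
;;   ;; behaves like (def my-parser (insta/parser "S = 'a'+")), minus the runtime grammar parsing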
gll/failure-type (meta result)))) (defn get-failure "Extracts failure object from failed parse result." [result] (cond (instance? gll/failure-type result) result (instance? gll/failure-type (meta result)) (meta result) :else nil)) (def ^:private standard-whitespace-parsers {:standard (parser "whitespace = #'\\s+'") :comma (parser "whitespace = #'[,\\s]+'")}) #?(:clj (defn enable-tracing! "Recompiles instaparse with tracing enabled. This is called implicitly the first time you invoke a parser with `:trace true` so usually you will not need to call this directly." [] (alter-var-root #'gll/TRACE (constantly true)) (alter-var-root #'gll/PROFILE (constantly true)) (require 'instaparse.gll :reload))) #?(:clj (defn disable-tracing! "Recompiles instaparse with tracing disabled. Call this to restore regular performance characteristics, eliminating the small performance hit imposed by tracing." [] (alter-var-root #'gll/TRACE (constantly false)) (alter-var-root #'gll/PROFILE (constantly false)) (require 'instaparse.gll :reload))) (defclone transform t/transform) (defclone add-line-and-column-info-to-metadata lc/add-line-col-spans) (defclone span viz/span) #?(:clj (defclone visualize viz/tree-viz)) instaparse-1.4.7/src/instaparse/failure.cljc000066400000000000000000000052131311220471200211120ustar00rootroot00000000000000(ns instaparse.failure "Facilities for printing and manipulating error messages." #?(:clj (:import java.io.BufferedReader java.io.StringReader)) (:require [instaparse.print :as print])) (defn index->line-column "Takes an index into text, and determines the line and column info" [index text] (loop [line 1, col 1, counter 0] (cond (= index counter) {:line line :column col} (= \newline (get text counter)) (recur (inc line) 1 (inc counter)) :else (recur line (inc col) (inc counter))))) #?(:clj (defn get-line "Returns nth line of text, 1-based" [n text] (try (nth (line-seq (BufferedReader. (StringReader. (str text)))) (dec n)) (catch Exception e ""))) :cljs (defn get-line [n text] (loop [chars (seq (clojure.string/replace text "\r\n" "\n")) n n] (cond (empty? chars) "" (= n 1) (apply str (take-while (complement #{\newline}) chars)) (= \newline (first chars)) (recur (next chars) (dec n)) :else (recur (next chars) n))))) (defn marker "Creates string with caret at nth position, 1-based" [n] (when (integer? n) (if (<= n 1) "^" (apply str (concat (repeat (dec n) \space) [\^]))))) (defn augment-failure "Adds text, line, and column info to failure object." [failure text] (let [lc (index->line-column (:index failure) text)] (merge failure lc {:text (get-line (:line lc) text)}))) (defn print-reason "Provides special case for printing negative lookahead reasons" [r] (cond (:NOT r) (do (print "NOT ") (print (:NOT r))), (:char-range r) (print (print/char-range->str r)) (instance? #?(:clj java.util.regex.Pattern :cljs js/RegExp) r) (print (print/regexp->str r)) :else (pr r))) (defn pprint-failure "Takes an augmented failure object and prints the error message" [{:keys [line column text reason]}] (println (str "Parse error at line " line ", column " column ":")) (println text) (println (marker column)) (let [full-reasons (distinct (map :expecting (filter :full reason))) partial-reasons (distinct (map :expecting (filter (complement :full) reason))) total (+ (count full-reasons) (count partial-reasons))] (cond (zero? 
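;; How failures look from the caller's side, using the helpers above (hypothetical sketch):
;;   (def p (insta/parser "S = 'a' 'b' 'c'"))
;;   (def result (p "abd"))
;;   (insta/failure? result)      ;=> true
;;   (insta/get-failure result)   ;=> Failure with :index, :reason, plus the :line, :column
;;                                ;   and :text fields added by augment-failure
;;   (instaparse.failure/index->line-column 2 "abd")   ;=> {:line 1, :column 3}
;;   (println result)   ; pprint-failure output, along the lines of:
;;   ;; Parse error at line 1, column 3:
;;   ;; abd
;;   ;;   ^
;;   ;; Expected:
;;   ;; "c" (followed by end-of-string)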
total) nil (= 1 total) (println "Expected:") :else (println "Expected one of:")) (doseq [r full-reasons] (print-reason r) (println " (followed by end-of-string)")) (doseq [r partial-reasons] (print-reason r) (println)))) instaparse-1.4.7/src/instaparse/gll.cljc000066400000000000000000001207721311220471200202510ustar00rootroot00000000000000(ns instaparse.gll "The heart of the parsing mechanism. Contains the trampoline structure, the parsing dispatch function, the nodes where listeners are stored, the different types of listeners, and the loop for executing the various listeners and parse commands that are on the stack." (:require ;; Incremental vector provides a more performant hashing strategy ;; for this use-case for vectors ;; We use the auto flatten version [instaparse.auto-flatten-seq :as afs] ;; failure contains the augment-failure function, which is called to ;; add enough information to the failure object for pretty printing [instaparse.failure :as fail] ;; reduction contains code relating to reductions and flattening. [instaparse.reduction :as red] ;; Two of the public combinators are needed. [instaparse.combinators-source :refer [Epsilon nt]] ;; Need a way to convert parsers into strings for printing and error messages. [instaparse.print :as print] ;; Unicode utilities for char-range #?(:cljs [goog.i18n.uChar :as u])) #?(:cljs (:use-macros [instaparse.gll :only [log profile dprintln dpprint success attach-diagnostic-meta trace-or-false]]))) ;; As of Java 7, strings no longer have fast substring operation, ;; so we use Segments instead, which implement the CharSequence ;; interface with a fast subSequence operation. Fortunately, ;; Java regular expressions work on anything that adheres ;; to the CharSequence interface. There is a built-in class ;; javax.swing.text.Segment which does the trick, but ;; this class is not available on Google App Engine. So ;; to support the use of instaparse on Google App Engine, ;; we simply create our own Segment type. #?(:clj (deftype Segment [^CharSequence s ^int offset ^int count] CharSequence (length [this] count) (subSequence [this start end] (Segment. s (+ offset start) (- end start))) (charAt [this index] (.charAt s (+ offset index))) (toString [this] (.toString (doto (StringBuilder. count) (.append s offset (+ offset count))))))) ;;;;; SETUP DIAGNOSTIC MACROS AND VARS #?(:clj (do (defonce PRINT false) (defmacro dprintln [& body] (when PRINT `(println ~@body))) (defmacro dpprint [& body] (when PRINT `(clojure.pprint/pprint ~@body))) (defonce PROFILE false) (defmacro profile [& body] (when PROFILE `(do ~@body))) ;; By default TRACE is set to false, and all these macros are used ;; throughout the code to ensure there is absolutely no performance ;; penalty from the tracing code. Everything related to tracing ;; is compiled away. ;; ;; We recompile this file with TRACE set to true to activate the ;; tracing code. ;; ;; bind-trace is the one exception where we can't completely compile ;; the new code away, because it is used in instaparse.core, which won't be ;; recompiled. Still, binding is a relatively slow operation, so by testing ;; whether TRACE is true inside the expansion, we can at least avoid ;; the performance hit of binding every time. (defonce TRACE false) (def ^:dynamic *trace* false) (defmacro log [tramp & body] (when TRACE `(when (:trace? ~tramp) (println ~@body)))) (defmacro attach-diagnostic-meta [f metadata] (if TRACE `(with-meta ~f ~metadata) f)) (defmacro bind-trace [trace? body] `(if TRACE (binding [*trace* ~trace?] 
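;; This TRACE/PROFILE machinery is what the :trace option toggles (Clojure only;
;; hypothetical sketch):
;;   (def p (insta/parser "S = 'a'+"))
;;   (p "aa" :trace true)   ; first use recompiles instaparse.gll with TRACE on, then prints
;;                          ; steps such as "Initiating full parse: S at index 0 (aa)"
;;   (insta/disable-tracing!)   ; recompile once more to drop the instrumentation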
~body) ~body)) (defmacro trace-or-false [] (if TRACE '*trace* false)) )) ; In diagnostic messages, how many characters ahead do we want to show. (def ^:dynamic *diagnostic-char-lookahead* 10) (declare sub-sequence string-context) #?(:clj (defn string-context [^CharSequence text index] (let [end (+ index *diagnostic-char-lookahead*), length (.length text)] (if (< length end) (str (sub-sequence text index)) (str (sub-sequence text index end) "..."))))) (profile (def stats (atom {}))) (profile (defn add! [call] (swap! stats update-in [call] (fnil inc 0)))) (profile (defn clear! [] (reset! stats {}))) ;; Now we can get down to parsing (defn get-parser [grammar p] (get grammar p p)) (declare alt-parse cat-parse string-parse epsilon-parse non-terminal-parse opt-parse plus-parse star-parse regexp-parse lookahead-parse rep-parse negative-lookahead-parse ordered-alt-parse string-case-insensitive-parse char-range-parse) (defn -parse [parser index tramp] (log tramp (format "Initiating parse: %s at index %d (%s)" (print/combinators->str parser) index (string-context (:text tramp) index))) (case (:tag parser) :nt (non-terminal-parse parser index tramp) :alt (alt-parse parser index tramp) :cat (cat-parse parser index tramp) :string (string-parse parser index tramp) :string-ci (string-case-insensitive-parse parser index tramp) :char (char-range-parse parser index tramp) :epsilon (epsilon-parse parser index tramp) :opt (opt-parse parser index tramp) :plus (plus-parse parser index tramp) :rep (rep-parse parser index tramp) :star (star-parse parser index tramp) :regexp (regexp-parse parser index tramp) :look (lookahead-parse parser index tramp) :neg (negative-lookahead-parse parser index tramp) :ord (ordered-alt-parse parser index tramp))) (declare alt-full-parse cat-full-parse string-full-parse epsilon-full-parse non-terminal-full-parse opt-full-parse plus-full-parse star-full-parse rep-full-parse regexp-full-parse lookahead-full-parse ordered-alt-full-parse string-case-insensitive-full-parse char-range-full-parse) (defn -full-parse [parser index tramp] (log tramp (format "Initiating full parse: %s at index %d (%s)" (print/combinators->str parser) index (string-context (:text tramp) index))) (case (:tag parser) :nt (non-terminal-full-parse parser index tramp) :alt (alt-full-parse parser index tramp) :cat (cat-full-parse parser index tramp) :string (string-full-parse parser index tramp) :string-ci (string-case-insensitive-full-parse parser index tramp) :char (char-range-full-parse parser index tramp) :epsilon (epsilon-full-parse parser index tramp) :opt (opt-full-parse parser index tramp) :plus (plus-full-parse parser index tramp) :rep (rep-full-parse parser index tramp) :star (star-full-parse parser index tramp) :regexp (regexp-full-parse parser index tramp) :look (lookahead-full-parse parser index tramp) :neg (negative-lookahead-parse parser index tramp) :ord (ordered-alt-full-parse parser index tramp))) (defrecord Failure [index reason]) #?(:clj (defmethod clojure.core/print-method Failure [x writer] (binding [*out* writer] (fail/pprint-failure x))) :cljs (extend-protocol IPrintWithWriter instaparse.gll/Failure (-pr-writer [fail writer _] (-write writer (with-out-str (fail/pprint-failure fail)))))) ; This is a trick to make sure we can recognize the type of ; a Failure record after this namespace is recompiled, ; but the core namespace is not recompiled ; which is what happens when tracing is enabled. (def failure-type (type (Failure. 
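;; The parsers dispatched on above are plain maps keyed by :tag, exactly as the
;; combinators construct them (hypothetical sketch):
;;   (c/cat (c/string "a") (c/nt :B))
;;   ;=> {:tag :cat, :parsers ({:tag :string, :string "a"} {:tag :nt, :keyword :B})}
;;   ;; -parse and -full-parse simply case on :tag to select the matching *-parse function.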
nil nil))) #?(:clj (defn text->segment "Converts text to a Segment, which has fast subsequencing" [^CharSequence text] (Segment. text 0 (count text))) :cljs (defn text->segment [text] text)) #?(:clj (defn sub-sequence "Like clojure.core/subs but consumes and returns a CharSequence" (^CharSequence [^CharSequence text start] (.subSequence text start (.length text))) (^CharSequence [^CharSequence text start end] (.subSequence text start end))) :cljs (def sub-sequence subs)) ; The trampoline structure contains the grammar, text to parse, a stack and a nodes ; Also contains an atom to hold successes and one to hold index of failure point. ; grammar is a map from non-terminals to parsers ; text is a CharSequence ; stack is an atom of a vector containing items implementing the Execute protocol. ; nodes is an atom containing a map from [index parser] pairs to Nodes ; success contains a successful parse ; failure contains the index of the furthest-along failure (defrecord Tramp [grammar text segment fail-index node-builder stack next-stack generation negative-listeners msg-cache nodes success failure trace?]) (defn make-tramp ([grammar text] (make-tramp grammar text (text->segment text) -1 nil)) ([grammar text segment] (make-tramp grammar text segment -1 nil)) ([grammar text fail-index node-builder] (make-tramp grammar text (text->segment text) fail-index node-builder)) ([grammar text segment fail-index node-builder] (Tramp. grammar text segment fail-index node-builder (atom []) (atom []) (atom 0) (atom (sorted-map-by >)) (atom {}) (atom {}) (atom nil) (atom (Failure. 0 [])) (trace-or-false)))) ; A Success record contains the result and the index to continue from (defn make-success [result index] {:result result :index index}) (defn total-success? [tramp s] (= (count (:text tramp)) (:index s))) ; The trampoline's nodes field is map from [index parser] pairs to Nodes ; Nodes track the results of a given parser at a given index, and the listeners ; who care about the result. ; results are expected to be refs of sets. ; listeners are refs of vectors. (defrecord Node [listeners full-listeners results full-results]) (defn make-node [] (Node. (atom []) (atom []) (atom #{}) (atom #{}))) ; Currently using records for Node. Seems to run marginally faster. ; Here's the way without records: ;(defn make-node [] {:listeners (atom []) :full-listeners (atom []) ; :results (atom #{}) :full-results (atom #{})}) ;; Trampoline helper functions (defn push-stack "Pushes an item onto the trampoline's stack" [tramp item] (profile (add! :push-stack)) (swap! (:stack tramp) conj item)) (defn push-message "Pushes onto stack a message to a given listener about a result" [tramp listener result] (let [cache (:msg-cache tramp) i (:index result) k [listener i] c (get @cache k 0) f #(listener result)] (profile (add! :push-message)) #_(dprintln "push-message" i c @(:generation tramp) (count @(:stack tramp)) (count @(:next-stack tramp))) #_(dprintln "push-message: listener result" listener result) (if (> c @(:generation tramp)) (swap! (:next-stack tramp) conj f) (swap! (:stack tramp) conj f)) (swap! cache assoc k (inc c)))) (defn listener-exists? "Tests whether node already has a listener" [tramp node-key] (let [nodes (:nodes tramp)] (when-let [node (@nodes node-key)] (pos? (count @(:listeners node)))))) (defn full-listener-exists? "Tests whether node already has a listener or full-listener" [tramp node-key] (let [nodes (:nodes tramp)] (when-let [node (@nodes node-key)] (or (pos? (count @(:full-listeners node))) (pos? 
(count @(:listeners node))))))) (defn result-exists? "Tests whether node has a result or full-result" [tramp node-key] (let [nodes (:nodes tramp)] (when-let [node (@nodes node-key)] (or (pos? (count @(:full-results node))) (pos? (count @(:results node))))))) (defn full-result-exists? "Tests whether node has a full-result" [tramp node-key] (let [nodes (:nodes tramp)] (when-let [node (@nodes node-key)] (pos? (count @(:full-results node)))))) (defn node-get "Gets node if already exists, otherwise creates one" [tramp node-key] (let [nodes (:nodes tramp)] (if-let [node (@nodes node-key)] node (let [node (make-node)] (profile (add! :create-node)) (swap! nodes assoc node-key node) node)))) (defn safe-with-meta [obj metamap] (if #?(:clj (instance? clojure.lang.IObj obj) :cljs (satisfies? cljs.core/IWithMeta obj)) (with-meta obj metamap) obj)) (defn push-result "Pushes a result into the trampoline's node. Categorizes as either result or full-result. Schedules notification to all existing listeners of result (Full listeners only get notified about full results)" [tramp node-key result] (log tramp (if (= (:tag (node-key 1)) :neg) (format "Negation satisfied: %s at index %d (%s)" (print/combinators->str (node-key 1)) (node-key 0) (string-context (:text tramp) (node-key 0))) (format "Result for %s at index %d (%s) => %s" (print/combinators->str (node-key 1)) (node-key 0) (string-context (:text tramp) (node-key 0)) (with-out-str (pr (:result result)))))) (let [node (node-get tramp node-key) parser (node-key 1) ;; reduce result with reduction function if it exists result (if (:hide parser) (assoc result :result nil) result) result (if-let [reduction-function (:red parser)] (make-success (safe-with-meta (red/apply-reduction reduction-function (:result result)) {::start-index (node-key 0) ::end-index (:index result)}) (:index result)) result) total? (total-success? tramp result) results (if total? (:full-results node) (:results node))] (when (not (@results result)) ; when result is not already in @results (profile (add! :push-result)) (swap! results conj result) (doseq [listener @(:listeners node)] (push-message tramp listener result)) (when total? (doseq [listener @(:full-listeners node)] (push-message tramp listener result)))))) (defn push-listener "Pushes a listener into the trampoline's node. Schedules notification to listener of all existing results. Initiates parse if necessary" [tramp node-key listener] #_(dprintln "push-listener" [(node-key 1) (node-key 0)] (type listener)) (let [listener-already-exists? (listener-exists? tramp node-key) node (node-get tramp node-key) listeners (:listeners node)] (profile (add! :push-listener)) (swap! listeners conj listener) (doseq [result @(:results node)] (push-message tramp listener result)) (doseq [result @(:full-results node)] (push-message tramp listener result)) (when (not listener-already-exists?) (push-stack tramp #(-parse (node-key 1) (node-key 0) tramp))))) (defn push-full-listener "Pushes a listener into the trampoline's node. Schedules notification to listener of all existing full results." [tramp node-key listener] (let [full-listener-already-exists? (full-listener-exists? tramp node-key) node (node-get tramp node-key) listeners (:full-listeners node)] (profile (add! :push-full-listener)) (swap! listeners conj listener) (doseq [result @(:full-results node)] (push-message tramp listener result)) (when (not full-listener-already-exists?) 
(push-stack tramp #(-full-parse (node-key 1) (node-key 0) tramp))))) (def merge-negative-listeners (partial merge-with into)) (defn push-negative-listener "Pushes a thunk onto the trampoline's negative-listener stack." [tramp creator negative-listener] #_(dprintln "push-negative-listener" (type negative-listener)) ; creator is a node-key, i.e., a [index parser] pair (swap! (:negative-listeners tramp) merge-negative-listeners {(creator 0) [(attach-diagnostic-meta negative-listener {:creator creator})]})) ;(defn success [tramp node-key result end] ; (push-result tramp node-key (make-success result end))) #?(:clj (defmacro success [tramp node-key result end] `(push-result ~tramp ~node-key (make-success ~result ~end)))) (declare build-node-with-meta) (defn fail [tramp node-key index reason] (log tramp (format "No result for %s at index %d (%s)" (print/combinators->str (node-key 1)) (node-key 0) (string-context (:text tramp) (node-key 0)))) (swap! (:failure tramp) (fn [failure] (let [current-index (:index failure)] (case (compare index current-index) 1 (Failure. index [reason]) 0 (Failure. index (conj (:reason failure) reason)) -1 failure)))) #_(dprintln "Fail index" (:fail-index tramp)) (when (= index (:fail-index tramp)) (success tramp node-key (build-node-with-meta (:node-builder tramp) :instaparse/failure (sub-sequence (:text tramp) index) index (count (:text tramp))) (count (:text tramp))))) ;; Stack helper functions (defn step "Executes one thing on the stack (not threadsafe)" [stack] (let [top (peek @stack)] (swap! stack pop) #_(dprintln "Top" top (meta top)) (top))) (defn run "Executes the stack until exhausted" ([tramp] (run tramp nil)) ([tramp found-result?] (let [stack (:stack tramp)] ;_ (dprintln "run" found-result? (count @(:stack tramp)) (count @(:next-stack tramp)))] (cond @(:success tramp) (do (log tramp "Successful parse.\nProfile: " @stats) (cons (:result @(:success tramp)) (lazy-seq (do (reset! (:success tramp) nil) (run tramp true))))) (pos? (count @stack)) (do ;(dprintln "stacks" (count @stack) (count @(:next-stack tramp))) (step stack) (recur tramp found-result?)) (pos? (count @(:negative-listeners tramp))) (let [[index listeners] (first @(:negative-listeners tramp)) listener (peek listeners)] (log tramp (format "Exhausted results for %s at index %d (%s)" (print/combinators->str (((meta listener) :creator) 1)) (((meta listener) :creator) 0) (string-context (:text tramp) (((meta listener) :creator) 0)))) (listener) (if (= (count listeners) 1) (swap! (:negative-listeners tramp) dissoc index) (swap! (:negative-listeners tramp) update-in [index] pop)) (recur tramp found-result?)) found-result? (let [next-stack (:next-stack tramp)] #_(dprintln "Swapping stacks" (count @(:stack tramp)) (count @(:next-stack tramp))) (reset! stack @next-stack) (reset! next-stack []) (swap! (:generation tramp) inc) #_(dprintln "Swapped stacks" (count @(:stack tramp)) (count @(:next-stack tramp))) (recur tramp nil)) :else nil)))) ;; Listeners ; There are six kinds of listeners that receive notifications ; The first kind is a NodeListener which simply listens for a completed parse result ; Takes the node-key of the parser which is awaiting this result. (defn NodeListener [node-key tramp] (fn [result] ;(dprintln "Node Listener received" [(node-key 0) (:tag (node-key 1))] "result" result) (push-result tramp node-key result))) ; The second kind of listener handles lookahead. 
(defn LookListener [node-key tramp] (fn [result] (success tramp node-key nil (node-key 0)))) ; The third kind of listener is a CatListener which listens at each stage of the ; concatenation parser to carry on the next step. Think of it as a parse continuation. ; A CatListener needs to know the sequence of results for the parsers that have come ; before, and a list of parsers that remain. Also, the node-key of the final node ; that needs to know the overall result of the cat parser. (defn CatListener [results-so-far parser-sequence node-key tramp] (dpprint {:tag :CatListener :results-so-far results-so-far :parser-sequence (map :tag parser-sequence) :node-key [(node-key 0) (:tag (node-key 1))]}) (fn [result] (let [{parsed-result :result continue-index :index} result new-results-so-far (afs/conj-flat results-so-far parsed-result)] (if (seq parser-sequence) (push-listener tramp [continue-index (first parser-sequence)] (CatListener new-results-so-far (next parser-sequence) node-key tramp)) (success tramp node-key new-results-so-far continue-index))))) (defn CatFullListener [results-so-far parser-sequence node-key tramp] ; (dpprint {:tag :CatFullListener ; :results-so-far results-so-far ; :parser-sequence (map :tag parser-sequence) ; :node-key [(node-key 0) (:tag (node-key 1))]}) (fn [result] (let [{parsed-result :result continue-index :index} result new-results-so-far (afs/conj-flat results-so-far parsed-result)] (cond (red/singleton? parser-sequence) (push-full-listener tramp [continue-index (first parser-sequence)] (CatFullListener new-results-so-far (next parser-sequence) node-key tramp)) (seq parser-sequence) (push-listener tramp [continue-index (first parser-sequence)] (CatFullListener new-results-so-far (next parser-sequence) node-key tramp)) :else (success tramp node-key new-results-so-far continue-index))))) ; The fourth kind of listener is a PlusListener, which is a variation of ; the CatListener but optimized for "one or more" parsers. (defn PlusListener [results-so-far parser prev-index node-key tramp] (fn [result] (let [{parsed-result :result continue-index :index} result] (if (= continue-index prev-index) (when (zero? (count results-so-far)) (success tramp node-key nil continue-index)) (let [new-results-so-far (afs/conj-flat results-so-far parsed-result)] (push-listener tramp [continue-index parser] (PlusListener new-results-so-far parser continue-index node-key tramp)) (success tramp node-key new-results-so-far continue-index)))))) (defn PlusFullListener [results-so-far parser prev-index node-key tramp] (fn [result] (let [{parsed-result :result continue-index :index} result] (if (= continue-index prev-index) (when (zero? 
(count results-so-far)) (success tramp node-key nil continue-index)) (let [new-results-so-far (afs/conj-flat results-so-far parsed-result)] (if (= continue-index (count (:text tramp))) (success tramp node-key new-results-so-far continue-index) (push-listener tramp [continue-index parser] (PlusFullListener new-results-so-far parser continue-index node-key tramp)))))))) ; The fifth kind of listener is a RepListener, which wants between m and n repetitions of a parser (defn RepListener [results-so-far n-results-so-far parser m n prev-index node-key tramp] (fn [result] (let [{parsed-result :result continue-index :index} result] ;(dprintln "Rep" (type results-so-far)) (let [new-results-so-far (afs/conj-flat results-so-far parsed-result) new-n-results-so-far (inc n-results-so-far)] (when (<= m new-n-results-so-far n) (success tramp node-key new-results-so-far continue-index)) (when (< new-n-results-so-far n) (push-listener tramp [continue-index parser] (RepListener new-results-so-far new-n-results-so-far parser m n continue-index node-key tramp))))))) (defn RepFullListener [results-so-far n-results-so-far parser m n prev-index node-key tramp] (fn [result] (let [{parsed-result :result continue-index :index} result] ;(dprintln "RepFull" (type parsed-result)) (let [new-results-so-far (afs/conj-flat results-so-far parsed-result) new-n-results-so-far (inc n-results-so-far)] (if (= continue-index (count (:text tramp))) (when (<= m new-n-results-so-far n) (success tramp node-key new-results-so-far continue-index)) (when (< new-n-results-so-far n) (push-listener tramp [continue-index parser] (RepFullListener new-results-so-far new-n-results-so-far parser m n continue-index node-key tramp)))))))) ; The top level listener is the final kind of listener (defn TopListener [tramp] (fn [result] (reset! (:success tramp) result))) ;; Parsers (defn string-parse [this index tramp] (let [string (:string this) text (:text tramp) end (min (count text) (+ index (count string))) head (sub-sequence text index end)] (if (= string head) (success tramp [index this] string end) (fail tramp [index this] index {:tag :string :expecting string})))) (defn string-full-parse [this index tramp] (let [string (:string this) text (:text tramp) end (min (count text) (+ index (count string))) head (sub-sequence text index end)] (if (and (= end (count text)) (= string head)) (success tramp [index this] string end) (fail tramp [index this] index {:tag :string :expecting string :full true})))) #?(:clj (defn equals-ignore-case [^String s1 ^String s2] (.equalsIgnoreCase s1 s2)) :cljs (defn equals-ignore-case [s1 s2] (= (.toUpperCase s1) (.toUpperCase s2)))) (defn string-case-insensitive-parse [this index tramp] (let [string (:string this) text (:text tramp) end (min (count text) (+ index (count string))) head (sub-sequence text index end)] (if (equals-ignore-case string head) (success tramp [index this] string end) (fail tramp [index this] index {:tag :string :expecting string})))) (defn string-case-insensitive-full-parse [this index tramp] (let [string (:string this) text (:text tramp) end (min (count text) (+ index (count string))) head (sub-sequence text index end)] (if (and (= end (count text)) (equals-ignore-case string head)) (success tramp [index this] string end) (fail tramp [index this] index {:tag :string :expecting string :full true})))) #?(:clj (defn single-char-code-at "Returns the int value of a single char at the given index, assuming we're looking for up to 0xFFFF (the maximum value for a UTF-16 single char)." 
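;; The Plus/Rep listeners above implement the *, + and {m,n} repetition behavior seen at
;; the grammar level (hypothetical sketch):
;;   ((insta/parser "S = 'a'*") "")   ;=> [:S]
;;   ((insta/parser "S = 'a'+") "")   ;=> Failure
;;   (def p (insta/parser {:S (c/rep 2 3 (c/string "a"))} :start :S))
;;   (p "aa")     ;=> [:S "a" "a"]
;;   (p "aaaa")   ;=> Failure (at most 3 repetitions accepted)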
[^CharSequence text index] (int (.charAt text index))) :cljs (defn single-char-code-at [text index] (.charCodeAt text index))) #?(:clj (defn unicode-code-point-at "Returns the unicode code point representing one or two chars at the given index." [^CharSequence text index] (Character/codePointAt text (int index))) :cljs (defn unicode-code-point-at [text index] (u/getCodePointAround text (int index)))) #?(:clj (defn code-point->chars "Takes a Unicode code point, and returns a string of one or two chars." [code-point] (String. (Character/toChars code-point))) :cljs (defn code-point->chars [code-point] (u/fromCharCode code-point))) (defn char-range-parse [this index tramp] (let [lo (:lo this) hi (:hi this) text (:text tramp)] (cond (>= index (count text)) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi}}) (<= hi 0xFFFF) (let [code (single-char-code-at text index)] (if (<= lo code hi) (success tramp [index this] (str (char code)) (inc index)) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi}}))) :else (let [code-point (unicode-code-point-at text index) char-string (code-point->chars code-point)] (if (<= lo code-point hi) (success tramp [index this] char-string (+ index (count char-string))) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi}})))))) (defn char-range-full-parse [this index tramp] (let [lo (:lo this) hi (:hi this) text (:text tramp) end (count text)] (cond (>= index (count text)) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi}}) (<= hi 0xFFFF) (let [code (single-char-code-at text index)] (if (and (= (inc index) end) (<= lo code hi)) (success tramp [index this] (str (char code)) end) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi}}))) :else (let [code-point (unicode-code-point-at text index) char-string (code-point->chars code-point)] (if (and (= (+ index (count char-string)) end) (<= lo code-point hi)) (success tramp [index this] char-string end) (fail tramp [index this] index {:tag :char :expecting {:char-range true :lo lo :hi hi} :full true})))))) #?(:clj (defn re-match-at-front [regexp text] (let [^java.util.regex.Matcher matcher (re-matcher regexp text) match? (.lookingAt matcher)] (when match? (.group matcher)))) :cljs (defn re-match-at-front [regexp text] (let [re (js/RegExp. (.-source regexp) "g") m (.exec re text)] (when (and m (zero? 
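;; The code-point branch above is what lets `unicode-char` ranges match characters beyond
;; the BMP, consuming a surrogate pair as a single unit (hypothetical sketch):
;;   (def emoji (insta/parser {:S (c/unicode-char 0x1F600 0x1F64F)} :start :S))
;;   (emoji "😀")   ;=> [:S "😀"]   ; U+1F600 occupies two UTF-16 chars but one code point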
(.-index m))) (first m))))) (defn regexp-parse [this index tramp] (let [regexp (:regexp this) ^Segment text (:segment tramp) substring (sub-sequence text index) match (re-match-at-front regexp substring)] (if match (success tramp [index this] match (+ index (count match))) (fail tramp [index this] index {:tag :regexp :expecting regexp})))) (defn regexp-full-parse [this index tramp] (let [regexp (:regexp this) ^Segment text (:segment tramp) substring (sub-sequence text index) match (re-match-at-front regexp substring) desired-length (- (count text) index)] (if (and match (= (count match) desired-length)) (success tramp [index this] match (count text)) (fail tramp [index this] index {:tag :regexp :expecting regexp :full true})))) (defn cat-parse [this index tramp] (let [parsers (:parsers this)] ; Kick-off the first parser, with a CatListener ready to pass the result on in the chain ; and with a final target of notifying this parser when the whole sequence is complete (push-listener tramp [index (first parsers)] (CatListener afs/EMPTY (next parsers) [index this] tramp)))) (defn cat-full-parse [this index tramp] (let [parsers (:parsers this)] ; Kick-off the first parser, with a CatListener ready to pass the result on in the chain ; and with a final target of notifying this parser when the whole sequence is complete (push-listener tramp [index (first parsers)] (CatFullListener afs/EMPTY (next parsers) [index this] tramp)))) (defn plus-parse [this index tramp] (let [parser (:parser this)] (push-listener tramp [index parser] (PlusListener afs/EMPTY parser index [index this] tramp)))) (defn plus-full-parse [this index tramp] (let [parser (:parser this)] (push-listener tramp [index parser] (PlusFullListener afs/EMPTY parser index [index this] tramp)))) (defn rep-parse [this index tramp] (let [parser (:parser this), m (:min this), n (:max this)] (if (zero? m) (do (success tramp [index this] nil index) (when (>= n 1) (push-listener tramp [index parser] (RepListener afs/EMPTY 0 parser 1 n index [index this] tramp)))) (push-listener tramp [index parser] (RepListener afs/EMPTY 0 parser m n index [index this] tramp))))) (defn rep-full-parse [this index tramp] (let [parser (:parser this), m (:min this), n (:max this)] (if (zero? 
m) (do (success tramp [index this] nil index) (when (>= n 1) (push-listener tramp [index parser] (RepFullListener afs/EMPTY 0 parser 1 n index [index this] tramp)))) (push-listener tramp [index parser] (RepFullListener afs/EMPTY 0 parser m n index [index this] tramp))))) (defn star-parse [this index tramp] (let [parser (:parser this)] (push-listener tramp [index parser] (PlusListener afs/EMPTY parser index [index this] tramp)) (success tramp [index this] nil index))) (defn star-full-parse [this index tramp] (let [parser (:parser this)] (if (= index (count (:text tramp))) (success tramp [index this] nil index) (do (push-listener tramp [index parser] (PlusFullListener afs/EMPTY parser index [index this] tramp)))))) (defn alt-parse [this index tramp] (let [parsers (:parsers this)] (doseq [parser parsers] (push-listener tramp [index parser] (NodeListener [index this] tramp))))) (defn alt-full-parse [this index tramp] (let [parsers (:parsers this)] (doseq [parser parsers] (push-full-listener tramp [index parser] (NodeListener [index this] tramp))))) (defn ordered-alt-parse [this index tramp] (let [parser1 (:parser1 this) parser2 (:parser2 this) node-key-parser1 [index parser1] node-key-parser2 [index parser2] listener (NodeListener [index this] tramp)] (push-listener tramp node-key-parser1 listener) (push-negative-listener tramp node-key-parser1 #(push-listener tramp node-key-parser2 listener)))) (defn ordered-alt-full-parse [this index tramp] (let [parser1 (:parser1 this) parser2 (:parser2 this) node-key-parser1 [index parser1] node-key-parser2 [index parser2] listener (NodeListener [index this] tramp)] (push-full-listener tramp node-key-parser1 listener) (push-negative-listener tramp node-key-parser1 #(push-full-listener tramp node-key-parser2 listener)))) (defn opt-parse [this index tramp] (let [parser (:parser this)] (push-listener tramp [index parser] (NodeListener [index this] tramp)) (success tramp [index this] nil index))) (defn opt-full-parse [this index tramp] (let [parser (:parser this)] (push-full-listener tramp [index parser] (NodeListener [index this] tramp)) (if (= index (count (:text tramp))) (success tramp [index this] nil index) (fail tramp [index this] index {:tag :optional :expecting :end-of-string})))) (defn non-terminal-parse [this index tramp] (let [parser (get-parser (:grammar tramp) (:keyword this))] (push-listener tramp [index parser] (NodeListener [index this] tramp)))) (defn non-terminal-full-parse [this index tramp] (let [parser (get-parser (:grammar tramp) (:keyword this))] (push-full-listener tramp [index parser] (NodeListener [index this] tramp)))) (defn lookahead-parse [this index tramp] (let [parser (:parser this)] (push-listener tramp [index parser] (LookListener [index this] tramp)))) (defn lookahead-full-parse [this index tramp] (if (= index (count (:text tramp))) (lookahead-parse this index tramp) (fail tramp [index this] index {:tag :lookahead :expecting :end-of-string}))) ;(declare negative-parse?) ;(defn negative-lookahead-parse ; [this index tramp] ; (let [parser (:parser this) ; remaining-text (sub-sequence (:text tramp) index)] ; (if (negative-parse? (:grammar tramp) parser remaining-text) ; (success tramp [index this] nil index) ; (fail tramp index :negative-lookahead)))) (defn negative-lookahead-parse [this index tramp] (let [parser (:parser this) node-key [index parser]] (if (result-exists? 
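;; Ordered choice and negative lookahead as they appear in grammars, implemented by
;; ordered-alt-parse and negative-lookahead-parse (hypothetical sketch):
;;   (def p (insta/parser "S = 'ab' / 'a' 'b'"))
;;   (p "ab")    ;=> [:S "ab"]   ; the left alternative wins when both could match
;;   (def q (insta/parser "S = !'b' #'[a-z]+'"))
;;   (q "abc")   ;=> [:S "abc"]
;;   (q "bcd")   ;=> Failure     ; the lookahead rejects words starting with b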
tramp node-key) (fail tramp [index this] index {:tag :negative-lookahead}) (do (push-listener tramp node-key (let [fail-send (delay (fail tramp [index this] index {:tag :negative-lookahead :expecting {:NOT (print/combinators->str parser)}}))] (fn [result] (force fail-send)))) (push-negative-listener tramp node-key #(when (not (result-exists? tramp node-key)) (success tramp [index this] nil index))))))) (defn epsilon-parse [this index tramp] (success tramp [index this] nil index)) (defn epsilon-full-parse [this index tramp] (if (= index (count (:text tramp))) (success tramp [index this] nil index) (fail tramp [index this] index {:tag :Epsilon :expecting :end-of-string}))) ;; Parsing functions (defn start-parser [tramp parser partial?] (if partial? (push-listener tramp [0 parser] (TopListener tramp)) (push-full-listener tramp [0 parser] (TopListener tramp)))) (defn parses [grammar start text partial?] (profile (clear!)) (let [tramp (make-tramp grammar text) parser (nt start)] (start-parser tramp parser partial?) (if-let [all-parses (run tramp)] all-parses (with-meta () (fail/augment-failure @(:failure tramp) text))))) (defn parse [grammar start text partial?] (profile (clear!)) (let [tramp (make-tramp grammar text) parser (nt start)] (start-parser tramp parser partial?) (if-let [all-parses (run tramp)] (first all-parses) (fail/augment-failure @(:failure tramp) text)))) ;; The node builder function is what we use to build the failure nodes ;; but we want to include start and end metadata as well. (defn build-node-with-meta [node-builder tag content start end] (with-meta (node-builder tag content) {::start-index start ::end-index end})) (defn build-total-failure-node [node-builder start text] (let [build-failure-node (build-node-with-meta node-builder :instaparse/failure text 0 (count text)), build-start-node (build-node-with-meta node-builder start build-failure-node 0 (count text))] build-start-node)) (defn parses-total-after-fail [grammar start text fail-index partial? node-builder] ;(dprintln "Parses-total-after-fail") (let [tramp (make-tramp grammar text fail-index node-builder) parser (nt start)] (log tramp "Parse failure. Restarting for total parse.") (start-parser tramp parser partial?) (if-let [all-parses (run tramp)] all-parses (list (build-total-failure-node node-builder start text))))) (defn merge-meta "A variation on with-meta that merges the existing metamap into the new metamap, rather than overwriting the metamap entirely." [obj metamap] (with-meta obj (merge metamap (meta obj)))) (defn parses-total [grammar start text partial? node-builder] (profile (clear!)) (let [all-parses (parses grammar start text partial?)] (if (seq all-parses) all-parses (merge-meta (parses-total-after-fail grammar start text (:index (meta all-parses)) partial? node-builder) (meta all-parses))))) (defn parse-total-after-fail [grammar start text fail-index partial? node-builder] ;(dprintln "Parse-total-after-fail") (let [tramp (make-tramp grammar text fail-index node-builder) parser (nt start)] (log tramp "Parse failure. Restarting for total parse.") (start-parser tramp parser partial?) (if-let [all-parses (run tramp)] (first all-parses) (build-total-failure-node node-builder start text)))) (defn parse-total [grammar start text partial? node-builder] (profile (clear!)) (let [result (parse grammar start text partial?)] (if-not (instance? Failure result) result (merge-meta (parse-total-after-fail grammar start text (:index result) partial? 
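;; These are the low-level entry points that instaparse.core delegates to, passing the
;; compiled grammar map and start keyword (hypothetical sketch):
;;   (def p (insta/parser "S = 'a'+"))
;;   (instaparse.gll/parse  (:grammar p) :S "aaa" false)   ;=> [:S "a" "a" "a"]
;;   (instaparse.gll/parses (:grammar p) :S "aaa" false)   ;=> ([:S "a" "a" "a"])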
node-builder) result)))) ;; Variation, but not for end-user ;(defn negative-parse? ; "takes pre-processed grammar and parser" ; [grammar parser text] ; (let [tramp (make-tramp grammar text)] ; (push-listener tramp [0 parser] (TopListener tramp)) ; (empty? (run tramp)))) ; instaparse-1.4.7/src/instaparse/line_col.cljc000066400000000000000000000105341311220471200212510ustar00rootroot00000000000000(ns instaparse.line-col (:require [instaparse.transform] [instaparse.util :refer [throw-illegal-argument-exception]])) ; Function to annotate parse-tree with line and column metadata. (defrecord Cursor [^int index ^long line ^long column]) (defn- advance-cursor [^Cursor cursor ^String text new-index] (let [new-index (int new-index)] (assert (<= (.-index cursor) new-index)) (if (= (.-index cursor) new-index) cursor (loop [index (.-index cursor), line (.-line cursor), column (.-column cursor)] (cond (= index new-index) (Cursor. index line column) (= (.charAt text index) \newline) (recur (inc index) (inc line) 1) :else (recur (inc index) line (inc column))))))) (defn- make-line-col-fn "Given a string `text`, returns a function that takes an index into the string, and returns a cursor, including line and column information. For efficiency, inputs must be fed into the function in increasing order." [^String text] (let [cursor-state (atom (Cursor. 0 1 1))] (fn line-col [i] (swap! cursor-state advance-cursor text i) @cursor-state))) (defn- hiccup-add-line-col-spans [line-col-fn parse-tree] (let [m (meta parse-tree), start-index (:instaparse.gll/start-index m), end-index (:instaparse.gll/end-index m)] (if (and start-index end-index) (let [start-cursor (line-col-fn start-index), children (doall (map (partial hiccup-add-line-col-spans line-col-fn) (next parse-tree))), end-cursor (line-col-fn end-index)] (with-meta (into [(first parse-tree)] children) (merge (meta parse-tree) {:instaparse.gll/start-line (:line start-cursor) :instaparse.gll/start-column (:column start-cursor) :instaparse.gll/end-line (:line end-cursor) :instaparse.gll/end-column (:column end-cursor)}))) parse-tree))) (defn- enlive-add-line-col-spans [line-col-fn parse-tree] (let [m (meta parse-tree), start-index (:instaparse.gll/start-index m), end-index (:instaparse.gll/end-index m)] (if (and start-index end-index) (let [start-cursor (line-col-fn start-index), children (doall (map (partial enlive-add-line-col-spans line-col-fn) (:content parse-tree))), end-cursor (line-col-fn end-index)] (with-meta (assoc parse-tree :content children) (merge (meta parse-tree) {:instaparse.gll/start-line (:line start-cursor) :instaparse.gll/start-column (:column start-cursor) :instaparse.gll/end-line (:line end-cursor) :instaparse.gll/end-column (:column end-cursor)}))) parse-tree))) (defn add-line-col-spans "Given a string `text` and a `parse-tree` for text, return parse tree with its metadata annotated with line and column info. The info can then be found in the metadata map under the keywords: :instaparse.gll/start-line, :instaparse.gll/start-column, :instaparse.gll/end-line, :instaparse.gll/end-column The start is inclusive, the end is exclusive. Lines and columns are 1-based." [text parse-tree] (let [line-col-fn (make-line-col-fn text)] (cond (nil? parse-tree) nil (and (map? parse-tree) (:tag parse-tree)) ; This is an enlive tree-seq (enlive-add-line-col-spans line-col-fn parse-tree) (and (vector? parse-tree) (keyword? (first parse-tree))) ; This is a hiccup tree-seq (hiccup-add-line-col-spans line-col-fn parse-tree) (and (sequential? parse-tree) (map? 
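;; What the added metadata looks like through the public wrapper
;; insta/add-line-and-column-info-to-metadata (hypothetical sketch):
;;   (def p (insta/parser "S = #'\\w+'"))
;;   (def tree (p "hello"))
;;   (meta (insta/add-line-and-column-info-to-metadata "hello" tree))
;;   ;=> {:instaparse.gll/start-index 0, :instaparse.gll/end-index 5,
;;   ;    :instaparse.gll/start-line 1,  :instaparse.gll/start-column 1,
;;   ;    :instaparse.gll/end-line 1,    :instaparse.gll/end-column 6}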
(first parse-tree)) (:tag (first parse-tree))) ; This is an enlive tree with hidden root tag (instaparse.transform/map-preserving-meta (partial enlive-add-line-col-spans line-col-fn) parse-tree) (and (sequential? parse-tree) (vector? (first parse-tree)) (keyword? (first (first parse-tree)))) ; This is a hiccup tree with hidden root tag (instaparse.transform/map-preserving-meta (partial hiccup-add-line-col-spans line-col-fn) parse-tree) (instance? instaparse.gll.Failure parse-tree) ; pass failures through unchanged parse-tree :else (throw-illegal-argument-exception "Invalid parse-tree, not recognized as either enlive or hiccup format.")))) instaparse-1.4.7/src/instaparse/macros.clj000066400000000000000000000012471311220471200206070ustar00rootroot00000000000000(ns instaparse.macros) (defmacro defclone [here there] (if (contains? &env :locals) ;; cljs `(def ~here ~there) ;; clj `(do (def ~here ~there) (alter-meta! (var ~here) assoc :doc (:doc (meta (var ~there))) :arglists (:arglists (meta (var ~there))) :file (:file (meta (var ~there))) :line (:line (meta (var ~there))) :column (:column (meta (var ~there)))) (var ~here)))) (defmacro set-global-var! [v value] (if (contains? &env :locals) ;; cljs `(set! ~v ~value) ;; clj `(alter-var-root (var ~v) (constantly ~value)))) instaparse-1.4.7/src/instaparse/print.cljc000066400000000000000000000067501311220471200206260ustar00rootroot00000000000000(ns instaparse.print "Facilities for taking parsers and grammars, and converting them to strings. Used for pretty-printing." (:require [clojure.string :as str])) (declare combinators->str) ; mutual recursion (defn paren-for-tags [tag-set hidden? parser] (if (and (not hidden?) (tag-set (parser :tag))) (str "(" (combinators->str parser false) ")") (combinators->str parser false))) (def paren-for-compound (partial paren-for-tags #{:alt :ord :cat})) (defn regexp-replace "Replaces whitespace characters with escape sequences for better printing" [s] (case s "\n" "\\n" "\b" "\\b" "\f" "\\f" "\r" "\\r" "\t" "\\t" s)) (defn regexp->str [r] (str/replace (str "#\"" #?(:clj (str r) :cljs (subs (.-source r) 1)) "\"") #"[\s]" regexp-replace)) #?(:clj (defn char-range->str [{:keys [lo hi]}] (if (= lo hi) (format "%%x%04x" lo) (format "%%x%04x-%04x" lo hi))) :cljs (do (defn number->hex-padded [n] (if (<= n 0xFFF) (.substr (str "0000" (.toString n 16)) -4) (.toString n 16))) (defn char-range->str [{:keys [lo hi]}] (if (= lo hi) (str "%x" (number->hex-padded lo)) (str "%x" (number->hex-padded lo) "-" (number->hex-padded hi)))))) (defn combinators->str "Stringifies a parser built from combinators" ([p] (combinators->str p false)) ([{:keys [parser parser1 parser2 parsers tag] :as p} hidden?] (if (and (not hidden?) (:hide p)) (str \< (combinators->str p true) \>) (case tag :epsilon "\u03b5" :opt (str (paren-for-compound hidden? parser) "?") :plus (str (paren-for-compound hidden? parser) "+") :star (str (paren-for-compound hidden? parser) "*") :rep (if (not= (:min p) (:max p)) (str (paren-for-compound hidden? parser) \{ (:min p) \, (:max p) \}) (str (paren-for-compound hidden? parser) \{ (:min p)\})) :alt (str/join " | " (map (partial paren-for-tags #{:ord} hidden?) parsers)) :ord (str (paren-for-tags #{:alt} hidden? parser1) " / " (paren-for-tags #{:alt} hidden? parser2)) :cat (str/join " " (map (partial paren-for-tags #{:alt :ord} hidden?) 
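;; combinators->str / Parser->str are what make a parser print as a readable grammar
;; (hypothetical sketch):
;;   (def p (insta/parser "S = A* \n A = 'a' | 'b'"))
;;   (println p)
;;   ;; S = A*
;;   ;; A = "a" | "b"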
parsers)) :string (with-out-str (pr (:string p))) :string-ci (with-out-str (pr (:string p))) :char (char-range->str p) :regexp (regexp->str (:regexp p)) :nt (subs (str (:keyword p)) 1) :look (str "&" (paren-for-compound hidden? parser)) :neg (str "!" (paren-for-compound hidden? parser)))))) (defn rule->str "Takes a non-terminal symbol and a parser built from combinators, and returns a string for the rule." [non-terminal parser] (if (= (-> parser :red :reduction-type) :raw) (str \< (name non-terminal) \> " = " (combinators->str parser)) (str (name non-terminal) " = " (combinators->str parser)))) (defn Parser->str "Takes a Parser object, i.e., something with a grammar map and a start production keyword, and stringifies it." [{grammar :grammar start :start-production}] (str/join \newline (cons ; Put starting production first (rule->str start (grammar start)) ; Then the others (for [[non-terminal parser] grammar :when (not= non-terminal start)] (rule->str non-terminal parser))))) instaparse-1.4.7/src/instaparse/reduction.cljc000066400000000000000000000036651311220471200214700ustar00rootroot00000000000000(ns instaparse.reduction (:require [instaparse.auto-flatten-seq :as afs] [instaparse.util :refer [throw-illegal-argument-exception]])) ;; utilities (defn singleton? [s] (and (seq s) (not (next s)))) ;; red is a reduction combinator for expert use only ;; because it is used internally to control the tree tags that ;; are displayed, so adding a different reduction would change ;; that behavior. (defn red [parser f] (assoc parser :red f)) ;; Flattening and reductions (def raw-non-terminal-reduction {:reduction-type :raw}) (defn HiccupNonTerminalReduction [key] {:reduction-type :hiccup :key key}) (defn EnliveNonTerminalReduction [key] {:reduction-type :enlive, :key key}) (def ^:constant reduction-types {:hiccup HiccupNonTerminalReduction :enlive EnliveNonTerminalReduction}) (def ^:constant node-builders ; A map of functions for building a node that only has one item ; These functions are used in total-parse mode to build failure nodes {:enlive (fn [tag item] {:tag tag :content (list item)}) :hiccup (fn [tag item] [tag item])}) (def standard-non-terminal-reduction :hiccup) (defn apply-reduction [f result] (case (:reduction-type f) :raw (afs/conj-flat afs/EMPTY result) :hiccup (afs/convert-afs-to-vec (afs/conj-flat (afs/auto-flatten-seq [(:key f)]) result)) :enlive (let [content (afs/conj-flat afs/EMPTY result)] {:tag (:key f), :content (if (zero? (count content)) nil content)}) (f result))) (defn apply-standard-reductions ([grammar] (apply-standard-reductions standard-non-terminal-reduction grammar)) ([reduction-type grammar] (if-let [reduction (reduction-types reduction-type)] (into {} (for [[k v] grammar] (if (:red v) [k v] [k (assoc v :red (reduction k))]))) (throw-illegal-argument-exception "Invalid output format " reduction-type ". Use :enlive or :hiccup.")))) instaparse-1.4.7/src/instaparse/repeat.cljc000066400000000000000000000230061311220471200207430ustar00rootroot00000000000000(ns instaparse.repeat (:require [instaparse.gll :as gll #?@(:clj [:refer [profile]])] [instaparse.combinators-source :as c] [instaparse.auto-flatten-seq :as afs] [instaparse.viz :as viz] [instaparse.reduction :as red] [instaparse.failure :as fail]) #?(:cljs (:require-macros [instaparse.gll :refer [profile]]))) (defn empty-result? [result] (or (and (vector? result) (= (count result) 1)) (and (map? result) (contains? result :tag) (empty? (get result :content))) (empty? 
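       ;; e.g. [:S] (a hiccup node with no children), {:tag :S :content nil} (an enlive
       ;; node with no children), and () are all treated as "empty" results here.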
result))) (def ^:constant failure-signal (gll/->Failure nil nil)) (defn get-end (#?(:clj ^long [parse] :cljs ^number [parse]) (let [[start end] (viz/span parse)] (if end (long end) (count parse)))) (#?(:clj ^long [parse ^long index] :cljs ^number [parse ^number index]) (let [[start end] (viz/span parse)] (if end (long end) (+ index (count parse)))))) (defn parse-from-index [grammar initial-parser text segment index] (let [tramp (gll/make-tramp grammar text segment)] (gll/push-listener tramp [index initial-parser] (gll/TopListener tramp)) (gll/run tramp))) (defn select-parse "Returns either: [a-parse end-index a-list-of-valid-follow-up-parses] [a-parse end-index nil] (successfully reached end of text) nil (hit a dead-end with this strategy)" [grammar initial-parser text segment index parses] ;(clojure.pprint/pprint parses) (let [length (count text)] (loop [parses (seq parses)] (when parses (let [parse (first parses) [start end] (viz/span parse) end (if end end (+ index (count parse)))] (cond (= end length) [parse end nil] :else (if-let [follow-ups (seq (parse-from-index grammar initial-parser text segment end))] [parse end follow-ups] (recur (next parses))))))))) (defn repeat-parse-hiccup ([grammar initial-parser root-tag text segment] (repeat-parse-hiccup grammar initial-parser root-tag text segment 0)) ([grammar initial-parser root-tag text segment index] (let [length (count text) first-result (parse-from-index grammar initial-parser text segment index)] (loop [index (long index) parses (afs/auto-flatten-seq [root-tag]) [parse end follow-ups :as selection] (select-parse grammar initial-parser text segment index first-result)] (cond (nil? selection) failure-signal (= index end) failure-signal (nil? follow-ups) (gll/safe-with-meta (afs/convert-afs-to-vec (afs/conj-flat parses parse)) {:optimize :memory :instaparse.gll/start-index 0 :instaparse.gll/end-index length}) :else (recur (long end) (afs/conj-flat parses parse) (select-parse grammar initial-parser text segment end follow-ups))))))) (defn repeat-parse-enlive ([grammar initial-parser root-tag text segment] (repeat-parse-enlive grammar initial-parser root-tag text segment 0)) ([grammar initial-parser root-tag text segment index] (let [length (count text) first-result (parse-from-index grammar initial-parser text segment index)] (loop [index (long index) parses afs/EMPTY [parse end follow-ups :as selection] (select-parse grammar initial-parser text segment index first-result)] (cond (nil? selection) failure-signal (= index end) failure-signal (nil? follow-ups) (gll/safe-with-meta {:tag root-tag :content (seq (afs/conj-flat parses parse))} {:optimize :memory :instaparse.gll/start-index 0 :instaparse.gll/end-index length}) :else (recur (long end) (afs/conj-flat parses parse) (select-parse grammar initial-parser text segment end follow-ups))))))) (defn repeat-parse-no-tag ([grammar initial-parser text segment] (repeat-parse-no-tag grammar initial-parser text segment 0)) ([grammar initial-parser text segment index] (let [length (count text) first-result (parse-from-index grammar initial-parser text segment index)] (loop [index (long index) parses afs/EMPTY [parse end follow-ups :as selection] (select-parse grammar initial-parser text segment index first-result)] (cond (nil? selection) failure-signal (= index end) failure-signal (nil? 
follow-ups) (gll/safe-with-meta (afs/conj-flat parses parse) {:optimize :memory :instaparse.gll/start-index 0 :instaparse.gll/end-index length}) :else (recur (long end) (afs/conj-flat parses parse) (select-parse grammar initial-parser text segment end follow-ups))))))) (defn repeat-parse ([grammar initial-parser output-format text] (repeat-parse-no-tag grammar initial-parser text (gll/text->segment text))) ([grammar initial-parser output-format root-tag text] {:pre [(#{:hiccup :enlive} output-format)]} (cond (= output-format :hiccup) (repeat-parse-hiccup grammar initial-parser root-tag text (gll/text->segment text)) (= output-format :enlive) (repeat-parse-enlive grammar initial-parser root-tag text (gll/text->segment text))))) (defn repeat-parse-with-header ([grammar header-parser repeating-parser output-format root-tag text] (let [segment (gll/text->segment text) length (count text) header-results (parse-from-index grammar header-parser text segment 0)] (if (or (empty? header-results) (:hide header-parser)) failure-signal (let [header-result (apply max-key get-end header-results) end (get-end header-result) repeat-result (repeat-parse-no-tag grammar (:parser repeating-parser) text segment end) span-meta {:optimize :memory :instaparse.gll/start-index 0 :instaparse.gll/end-index length}] (if (or (instance? instaparse.gll.Failure repeat-result) (and (= (:tag repeating-parser) :star) (empty-result? repeat-result))) failure-signal (case output-format :enlive (gll/safe-with-meta {:tag root-tag :content (afs/conj-flat (afs/conj-flat afs/EMPTY header-result) repeat-result)} span-meta) :hiccup (gll/safe-with-meta (afs/convert-afs-to-vec (afs/conj-flat (afs/conj-flat (afs/auto-flatten-seq [root-tag]) header-result) repeat-result)) span-meta) (gll/safe-with-meta (afs/conj-flat (afs/conj-flat afs/EMPTY header-result) repeat-result) span-meta)))))))) (defn try-repeating-parse-strategy-with-header [grammar text start-production start-rule output-format] (gll/profile (gll/clear!)) (let [parsers (:parsers start-rule) repeating-parser (last parsers)] (if (not (and (= (:tag start-rule) :cat) (#{:star :plus} (:tag repeating-parser)) (not (:hide repeating-parser)) (not (:hide (:parser repeating-parser))))) failure-signal (let [header-parser (apply c/cat (butlast parsers))] (if (= (:red start-rule) red/raw-non-terminal-reduction) (repeat-parse-with-header grammar header-parser repeating-parser nil start-production text) (repeat-parse-with-header grammar header-parser repeating-parser output-format start-production text)))))) (defn try-repeating-parse-strategy [parser text start-production] (let [grammar (:grammar parser) output-format (:output-format parser) start-rule (get grammar start-production)] (profile (gll/clear!)) (cond (= (:hide start-rule) true) failure-signal (= (:red start-rule) red/raw-non-terminal-reduction) (cond (= (:tag start-rule) :star) (repeat-parse grammar (:parser start-rule) output-format text) (= (:tag start-rule) :plus) (let [result (repeat-parse grammar (:parser start-rule) output-format text)] (if (empty-result? result) failure-signal result)) :else (try-repeating-parse-strategy-with-header grammar text start-production start-rule output-format)) (= (:tag start-rule) :star) (repeat-parse grammar (:parser start-rule) output-format start-production text) (= (:tag start-rule) :plus) (let [result (repeat-parse grammar (:parser start-rule) output-format start-production text)] (if (empty-result? 
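         ;; a :plus rule must match at least once, so an empty repetition
         ;; result is treated as a failure of this optimization strategy.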
result) failure-signal result)) :else (try-repeating-parse-strategy-with-header grammar text start-production start-rule output-format)))) (defn used-memory-optimization? [tree] (= :memory (-> tree meta :optimize)))instaparse-1.4.7/src/instaparse/transform.cljc000066400000000000000000000053531311220471200215030ustar00rootroot00000000000000(ns instaparse.transform "Functions to transform parse trees" (:require [instaparse.gll] [instaparse.util :refer [throw-illegal-argument-exception]])) (defn map-preserving-meta [f l] (with-meta (map f l) (meta l))) (defn merge-meta "This variation of the merge-meta in gll does nothing if obj is not something that can have a metamap attached." [obj metamap] (if #?(:clj (instance? clojure.lang.IObj obj) :cljs (satisfies? IWithMeta obj)) (instaparse.gll/merge-meta obj metamap) obj)) (defn- enlive-transform [transform-map parse-tree] (let [transform (transform-map (:tag parse-tree))] (cond transform (merge-meta (apply transform (map (partial enlive-transform transform-map) (:content parse-tree))) (meta parse-tree)) (:tag parse-tree) (assoc parse-tree :content (map (partial enlive-transform transform-map) (:content parse-tree))) :else parse-tree))) (defn- hiccup-transform [transform-map parse-tree] (if (and (sequential? parse-tree) (seq parse-tree)) (if-let [transform (transform-map (first parse-tree))] (merge-meta (apply transform (map (partial hiccup-transform transform-map) (next parse-tree))) (meta parse-tree)) (with-meta (into [(first parse-tree)] (map (partial hiccup-transform transform-map) (next parse-tree))) (meta parse-tree))) parse-tree)) (defn transform "Takes a transform map and a parse tree (or seq of parse-trees). A transform map is a mapping from tags to functions that take a node's contents and return a replacement for the node, i.e., {:node-tag (fn [child1 child2 ...] node-replacement), :another-node-tag (fn [child1 child2 ...] node-replacement)}" [transform-map parse-tree] ; Detect what kind of tree this is (cond (string? parse-tree) ; This is a leaf of the tree that should pass through unchanged parse-tree (and (map? parse-tree) (:tag parse-tree)) ; This is an enlive tree-seq (enlive-transform transform-map parse-tree) (and (vector? parse-tree) (keyword? (first parse-tree))) ; This is a hiccup tree-seq (hiccup-transform transform-map parse-tree) (sequential? parse-tree) ; This is either a sequence of parse results, or a tree ; with a hidden root tag. (map-preserving-meta (partial transform transform-map) parse-tree) (instance? instaparse.gll.Failure parse-tree) ; pass failures through unchanged parse-tree :else (throw-illegal-argument-exception "Invalid parse-tree, not recognized as either enlive or hiccup format."))) instaparse-1.4.7/src/instaparse/util.cljc000066400000000000000000000004221311220471200204350ustar00rootroot00000000000000(ns instaparse.util) (defn throw-runtime-exception [& message] (-> (apply str message) #?(:clj RuntimeException.) throw)) (defn throw-illegal-argument-exception [& message] (-> (apply str message) #?(:clj IllegalArgumentException.) throw)) instaparse-1.4.7/src/instaparse/viz.clj000066400000000000000000000075651311220471200201440ustar00rootroot00000000000000(ns instaparse.viz (:import java.io.IOException)) (try (require '[rhizome.viz :as r]) (catch Exception e (require '[instaparse.viz-not-found :as r]))) (defn span "Takes a subtree of the parse tree and returns a [start-index end-index] pair indicating the span of text parsed by this subtree. 
start-index is inclusive and end-index is exclusive, as is customary with substrings. Returns nil if no span metadata is attached." [tree] (let [m (meta tree) s (:instaparse.gll/start-index m) e (:instaparse.gll/end-index m)] (when (and s e) [s e]))) (def rhizome-newline ;; Prior to Rhizome 0.2.5., \ was not an escape character so \n needed extra escaping. (when-let [escape-chars (try (ns-resolve (find-ns 'rhizome.dot) 'escapable-characters) (catch Exception e nil))] (if (= escape-chars "|{}\"") "\\n" "\n"))) (defn- hiccup-tree-viz "visualize instaparse hiccup output as a rhizome graph. Requires rhizome: https://github.com/ztellman/rhizome" [mytree options] (r/tree->image sequential? rest mytree :node->descriptor (fn [n] {:label (if (sequential? n) (apply str (first n) (when (span n) [rhizome-newline (span n)])) (with-out-str (pr n)))}) :options options)) (defn- enlive-tree-viz "visualize enlive trees" [mytree options] (r/tree->image (comp seq :content) :content mytree :node->descriptor (fn [n] {:label (if (and (map? n) (:tag n)) (apply str (:tag n) (when (span n) [rhizome-newline (span n)])) (with-out-str (pr n)))}) :options options)) (defn tree-type [tree] (cond (and (map? tree) (:tag tree)) :enlive (and (vector? tree) (keyword? (first tree))) :hiccup (empty? tree) :nil (seq? tree) :rootless :else :invalid)) (defn fake-root "Create a root for a rootless tree" [children] (case (tree-type (first children)) :enlive {:tag :hidden-root-tag :content children} :hiccup (into [:hidden-root-tag] children) :nil nil :invalid)) (defn tree-viz "Creates a graphviz visualization of the parse tree. Optional keyword arguments: :output-file :buffered-image (return a java.awt.image.BufferedImage object) or :output-file output-file (will save the tree image to output-file) :options options (options passed along to rhizome) Important: This function will only work if you have added rhizome to your dependencies, and installed graphviz on your system. See https://github.com/ztellman/rhizome for more information." [tree & {output-file :output-file options :options}] {:pre [(not= (tree-type tree) :invalid)]} (let [ttype (tree-type tree)] (if (= ttype :rootless) (tree-viz (fake-root tree) :output-file output-file :options options) (let [image (try (case (tree-type tree) :enlive (enlive-tree-viz tree options) (:hiccup :nil) (hiccup-tree-viz tree options)) (catch IOException e (throw (UnsupportedOperationException. "\n\nYou appear to have rhizome in your dependencies, but have not installed GraphViz on your system. \nSee https://github.com/ztellman/rhizome for more information.\n"))))] (cond (= output-file :buffered-image) image output-file (r/save-image image output-file) :else (r/view-image image))))))instaparse-1.4.7/src/instaparse/viz.cljs000066400000000000000000000007151311220471200203150ustar00rootroot00000000000000(ns instaparse.viz) (defn span "Takes a subtree of the parse tree and returns a [start-index end-index] pair indicating the span of text parsed by this subtree. start-index is inclusive and end-index is exclusive, as is customary with substrings. Returns nil if no span metadata is attached." 
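  ;; Hedged usage sketch (the parser below is hypothetical):
  ;;   (span ((instaparse.core/parser "S = 'a'+") "aaa")) ;=> [0 3]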
[tree] (let [m (meta tree) s (:instaparse.gll/start-index m) e (:instaparse.gll/end-index m)] (when (and s e) [s e]))) instaparse-1.4.7/src/instaparse/viz_not_found.clj000066400000000000000000000031611311220471200222030ustar00rootroot00000000000000(ns instaparse.viz-not-found "This file is a stub, so that a meaningful error will be returned if rhizome is not in your project's dependencies") (defn view-tree [& args] (throw (UnsupportedOperationException. "\n\nVisualization of parse trees is only supported if you have rhizome among your project dependencies and graphviz installed on your computer.\n Visit https://github.com/ztellman/rhizome to find out the version info to put in your project.clj file and for links to the graphviz installer."))) (defn tree->image [& args] (throw (UnsupportedOperationException. "\n\nVisualization of parse trees is only supported if you have rhizome among your project dependencies and graphviz installed on your computer.\n Visit https://github.com/ztellman/rhizome to find out the version info to put in your project.clj file and for links to the graphviz installer."))) (defn view-image [& args] (throw (UnsupportedOperationException. "\n\nVisualization of parse trees is only supported if you have rhizome among your project dependencies and graphviz installed on your computer.\n Visit https://github.com/ztellman/rhizome to find out the version info to put in your project.clj file and for links to the graphviz installer."))) (defn save-image [& args] (throw (UnsupportedOperationException. "\n\nVisualization of parse trees is only supported if you have rhizome among your project dependencies and graphviz installed on your computer.\n Visit https://github.com/ztellman/rhizome to find out the version info to put in your project.clj file and for links to the graphviz installer.")))instaparse-1.4.7/test/000077500000000000000000000000001311220471200146445ustar00rootroot00000000000000instaparse-1.4.7/test/data/000077500000000000000000000000001311220471200155555ustar00rootroot00000000000000instaparse-1.4.7/test/data/abnf_uri.txt000066400000000000000000000056741311220471200201170ustar00rootroot00000000000000URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty URI-reference = URI / relative-ref absolute-URI = scheme ":" hier-part [ "?" query ] relative-ref = relative-part [ "?" query ] [ "#" fragment ] relative-part = "//" authority path-abempty / path-absolute / path-noscheme / path-empty scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / ".") authority = [ userinfo "@" ] host [ ":" port ] userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) host = IP-literal / IPv4address / reg-name port = *DIGIT IP-literal = "[" ( IPv6address / IPvFuture ) "]" IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) IPv6address = 6( h16 ":" ) ls32 / "::" 5( h16 ":" ) ls32 / [ h16 ] "::" 4( h16 ":" ) ls32 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 / [ *4( h16 ":" ) h16 ] "::" ls32 / [ *5( h16 ":" ) h16 ] "::" h16 / [ *6( h16 ":" ) h16 ] "::" h16 = 1*4HEXDIG ls32 = ( h16 ":" h16 ) / IPv4address IPv4address = dec-octet "." dec-octet "." dec-octet "." 
dec-octet dec-octet = DIGIT ; 0-9 / %x31-39 DIGIT ; 10-99 / "1" 2DIGIT ; 100-199 / "2" %x30-34 DIGIT ; 200-249 / "25" %x30-35 ; 250-255 reg-name = *( unreserved / pct-encoded / sub-delims ) path = path-abempty ; begins with "/" or is empty / path-absolute ; begins with "/" but not "//" / path-noscheme ; begins with a non-colon segment / path-rootless ; begins with a segment / path-empty ; zero characters path-abempty = *( "/" segment ) path-absolute = "/" [ segment-nz *( "/" segment ) ] path-noscheme = segment-nz-nc *( "/" segment ) path-rootless = segment-nz *( "/" segment ) path-empty = 0pchar segment = *pchar segment-nz = 1*pchar segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ) ; non-zero-length segment without any colon ":" pchar = unreserved / pct-encoded / sub-delims / ":" / "@" query = *( pchar / "/" / "?" ) fragment = *( pchar / "/" / "?" ) pct-encoded = "%" HEXDIG HEXDIG unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" ; commentinstaparse-1.4.7/test/data/defparser_grammar.txt000066400000000000000000000000171311220471200217750ustar00rootroot00000000000000S = #'a' | 'b' instaparse-1.4.7/test/data/phone_uri.txt000066400000000000000000000032651311220471200203140ustar00rootroot00000000000000 telephone-uri = "tel:" telephone-subscriber telephone-subscriber = global-number / local-number global-number = global-number-digits *par local-number = local-number-digits *par context *par par = parameter / extension / isdn-subaddress isdn-subaddress = ";isub=" 1*uric extension = ";ext=" 1*phonedigit context = ";phone-context=" descriptor descriptor = domainname / global-number-digits global-number-digits = "+" *phonedigit DIGIT *phonedigit local-number-digits = *phonedigit-hex (HEXDIG / "*" / "#") *phonedigit-hex domainname = *( domainlabel "." ) toplabel [ "." ] domainlabel = alphanum / alphanum *( alphanum / "-" ) alphanum toplabel = ALPHA / ALPHA *( alphanum / "-" ) alphanum parameter = ";" pname ["=" pvalue ] pname = 1*( alphanum / "-" ) pvalue = 1*paramchar paramchar = param-unreserved / unreserved / pct-encoded unreserved = alphanum / mark mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")" pct-encoded = "%" HEXDIG HEXDIG param-unreserved = "[" / "]" / "/" / ":" / "&" / "+" / "$" phonedigit = DIGIT / [ visual-separator ] phonedigit-hex = HEXDIG / "*" / "#" / [ visual-separator ] visual-separator = "-" / "." / "(" / ")" alphanum = ALPHA / DIGIT reserved = ";" / "/" / "?" 
/ ":" / "@" / "&" / "=" / "+" / "$" / "," uric = reserved / unreserved / pct-encodedinstaparse-1.4.7/test/instaparse/000077500000000000000000000000001311220471200170155ustar00rootroot00000000000000instaparse-1.4.7/test/instaparse/abnf_test.cljc000066400000000000000000000306411311220471200216230ustar00rootroot00000000000000(ns instaparse.abnf-test (:require #?(:clj [instaparse.core :refer [parser parses defparser]] :cljs [instaparse.core :refer [parser parses] :refer-macros [defparser]]) [instaparse.core-test :refer [parsers-similar?]] [instaparse.combinators :refer [abnf]] #?(:clj [clojure.test :refer [deftest are is]] :cljs [cljs.test]) #?(:clj [clojure.java.io :as io])) #?(:cljs (:require-macros [cljs.test :refer [is are deftest]]))) (defparser uri-parser "test/data/abnf_uri.txt" :input-format :abnf :instaparse.abnf/case-insensitive true) (defparser phone-uri-parser "test/data/phone_uri.txt" :input-format :abnf :instaparse.abnf/case-insensitive true) #?(:clj (deftest slurping-test (is (parsers-similar? uri-parser (binding [instaparse.abnf/*case-insensitive* true] (parser "test/data/abnf_uri.txt" :input-format :abnf :instaparse.abnf/case-insensitive true)) (binding [instaparse.abnf/*case-insensitive* true] (parser (io/resource "data/abnf_uri.txt") :input-format :abnf :instaparse.abnf/case-insensitive true)) (binding [instaparse.abnf/*case-insensitive* true] (parser (slurp "test/data/abnf_uri.txt") :input-format :abnf :instaparse.abnf/case-insensitive true))) "Verify that defparser, auto-slurp from string filename, auto-slurp from resource (URL), and manual slurp all return equivalent parsers."))) (deftest abnf-uri (are [x y] (= x y) (uri-parser "http://www.google.com") [:URI [:SCHEME [:ALPHA "h"] [:ALPHA "t"] [:ALPHA "t"] [:ALPHA "p"]] ":" [:HIER-PART "//" [:AUTHORITY [:HOST [:REG-NAME [:UNRESERVED [:ALPHA "w"]] [:UNRESERVED [:ALPHA "w"]] [:UNRESERVED [:ALPHA "w"]] [:UNRESERVED "."] [:UNRESERVED [:ALPHA "g"]] [:UNRESERVED [:ALPHA "o"]] [:UNRESERVED [:ALPHA "o"]] [:UNRESERVED [:ALPHA "g"]] [:UNRESERVED [:ALPHA "l"]] [:UNRESERVED [:ALPHA "e"]] [:UNRESERVED "."] [:UNRESERVED [:ALPHA "c"]] [:UNRESERVED [:ALPHA "o"]] [:UNRESERVED [:ALPHA "m"]]]]] [:PATH-ABEMPTY]]] (uri-parser "ftp://ftp.is.co.za/rfc/rfc1808.txt") [:URI [:SCHEME [:ALPHA "f"] [:ALPHA "t"] [:ALPHA "p"]] ":" [:HIER-PART "//" [:AUTHORITY [:HOST [:REG-NAME [:UNRESERVED [:ALPHA "f"]] [:UNRESERVED [:ALPHA "t"]] [:UNRESERVED [:ALPHA "p"]] [:UNRESERVED "."] [:UNRESERVED [:ALPHA "i"]] [:UNRESERVED [:ALPHA "s"]] [:UNRESERVED "."] [:UNRESERVED [:ALPHA "c"]] [:UNRESERVED [:ALPHA "o"]] [:UNRESERVED "."] [:UNRESERVED [:ALPHA "z"]] [:UNRESERVED [:ALPHA "a"]]]]] [:PATH-ABEMPTY "/" [:SEGMENT [:PCHAR [:UNRESERVED [:ALPHA "r"]]] [:PCHAR [:UNRESERVED [:ALPHA "f"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]]] "/" [:SEGMENT [:PCHAR [:UNRESERVED [:ALPHA "r"]]] [:PCHAR [:UNRESERVED [:ALPHA "f"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED [:DIGIT "8"]]] [:PCHAR [:UNRESERVED [:DIGIT "0"]]] [:PCHAR [:UNRESERVED [:DIGIT "8"]]] [:PCHAR [:UNRESERVED "."]] [:PCHAR [:UNRESERVED [:ALPHA "t"]]] [:PCHAR [:UNRESERVED [:ALPHA "x"]]] [:PCHAR [:UNRESERVED [:ALPHA "t"]]]]]]] (uri-parser "mailto:John.Doe@example.com") [:URI [:SCHEME [:ALPHA "m"] [:ALPHA "a"] [:ALPHA "i"] [:ALPHA "l"] [:ALPHA "t"] [:ALPHA "o"]] ":" [:HIER-PART [:PATH-ROOTLESS [:SEGMENT-NZ [:PCHAR [:UNRESERVED [:ALPHA "J"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "h"]]] [:PCHAR [:UNRESERVED [:ALPHA "n"]]] [:PCHAR [:UNRESERVED "."]] 
[:PCHAR [:UNRESERVED [:ALPHA "D"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR "@"] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR [:UNRESERVED [:ALPHA "x"]]] [:PCHAR [:UNRESERVED [:ALPHA "a"]]] [:PCHAR [:UNRESERVED [:ALPHA "m"]]] [:PCHAR [:UNRESERVED [:ALPHA "p"]]] [:PCHAR [:UNRESERVED [:ALPHA "l"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR [:UNRESERVED "."]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "m"]]]]]]] (uri-parser "tel:+1-816-555-1212") [:URI [:SCHEME [:ALPHA "t"] [:ALPHA "e"] [:ALPHA "l"]] ":" [:HIER-PART [:PATH-ROOTLESS [:SEGMENT-NZ [:PCHAR [:SUB-DELIMS "+"]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED "-"]] [:PCHAR [:UNRESERVED [:DIGIT "8"]]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED [:DIGIT "6"]]] [:PCHAR [:UNRESERVED "-"]] [:PCHAR [:UNRESERVED [:DIGIT "5"]]] [:PCHAR [:UNRESERVED [:DIGIT "5"]]] [:PCHAR [:UNRESERVED [:DIGIT "5"]]] [:PCHAR [:UNRESERVED "-"]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED [:DIGIT "2"]]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED [:DIGIT "2"]]]]]]] (uri-parser "telnet://192.0.2.16:80/") [:URI [:SCHEME [:ALPHA "t"] [:ALPHA "e"] [:ALPHA "l"] [:ALPHA "n"] [:ALPHA "e"] [:ALPHA "t"]] ":" [:HIER-PART "//" [:AUTHORITY [:HOST [:REG-NAME [:UNRESERVED [:DIGIT "1"]] [:UNRESERVED [:DIGIT "9"]] [:UNRESERVED [:DIGIT "2"]] [:UNRESERVED "."] [:UNRESERVED [:DIGIT "0"]] [:UNRESERVED "."] [:UNRESERVED [:DIGIT "2"]] [:UNRESERVED "."] [:UNRESERVED [:DIGIT "1"]] [:UNRESERVED [:DIGIT "6"]]]] ":" [:PORT [:DIGIT "8"] [:DIGIT "0"]]] [:PATH-ABEMPTY "/" [:SEGMENT]]]] (uri-parser "urn:oasis:names:specification:docbook:dtd:xml:4.1.2") [:URI [:SCHEME [:ALPHA "u"] [:ALPHA "r"] [:ALPHA "n"]] ":" [:HIER-PART [:PATH-ROOTLESS [:SEGMENT-NZ [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "a"]]] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] [:PCHAR [:UNRESERVED [:ALPHA "i"]]] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:ALPHA "n"]]] [:PCHAR [:UNRESERVED [:ALPHA "a"]]] [:PCHAR [:UNRESERVED [:ALPHA "m"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] [:PCHAR [:UNRESERVED [:ALPHA "p"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:ALPHA "i"]]] [:PCHAR [:UNRESERVED [:ALPHA "f"]]] [:PCHAR [:UNRESERVED [:ALPHA "i"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:ALPHA "a"]]] [:PCHAR [:UNRESERVED [:ALPHA "t"]]] [:PCHAR [:UNRESERVED [:ALPHA "i"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "n"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:ALPHA "d"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:ALPHA "b"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "k"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:ALPHA "d"]]] [:PCHAR [:UNRESERVED [:ALPHA "t"]]] [:PCHAR [:UNRESERVED [:ALPHA "d"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:ALPHA "x"]]] [:PCHAR [:UNRESERVED [:ALPHA "m"]]] [:PCHAR [:UNRESERVED [:ALPHA "l"]]] [:PCHAR ":"] [:PCHAR [:UNRESERVED [:DIGIT "4"]]] [:PCHAR [:UNRESERVED "."]] [:PCHAR [:UNRESERVED [:DIGIT "1"]]] [:PCHAR [:UNRESERVED "."]] [:PCHAR [:UNRESERVED [:DIGIT "2"]]]]]]] (uri-parser "ldap://[2001:db8::7]/c=GB?objectClass?one") [:URI [:SCHEME [:ALPHA "l"] [:ALPHA "d"] [:ALPHA "a"] [:ALPHA "p"]] ":" [:HIER-PART "//" [:AUTHORITY [:HOST [:IP-LITERAL "[" 
[:IPV6ADDRESS [:H16 [:HEXDIG "2"] [:HEXDIG "0"] [:HEXDIG "0"] [:HEXDIG "1"]] ":" [:H16 [:HEXDIG "d"] [:HEXDIG "b"] [:HEXDIG "8"]] "::" [:H16 [:HEXDIG "7"]]] "]"]]] [:PATH-ABEMPTY "/" [:SEGMENT [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:SUB-DELIMS "="]] [:PCHAR [:UNRESERVED [:ALPHA "G"]]] [:PCHAR [:UNRESERVED [:ALPHA "B"]]]]]] "?" [:QUERY [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "b"]]] [:PCHAR [:UNRESERVED [:ALPHA "j"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]] [:PCHAR [:UNRESERVED [:ALPHA "c"]]] [:PCHAR [:UNRESERVED [:ALPHA "t"]]] [:PCHAR [:UNRESERVED [:ALPHA "C"]]] [:PCHAR [:UNRESERVED [:ALPHA "l"]]] [:PCHAR [:UNRESERVED [:ALPHA "a"]]] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] [:PCHAR [:UNRESERVED [:ALPHA "s"]]] "?" [:PCHAR [:UNRESERVED [:ALPHA "o"]]] [:PCHAR [:UNRESERVED [:ALPHA "n"]]] [:PCHAR [:UNRESERVED [:ALPHA "e"]]]]])) (deftest phone-uri (are [x y] (= x y) (phone-uri-parser "tel:+1-201-555-0123") [:TELEPHONE-URI "tel:" [:TELEPHONE-SUBSCRIBER [:GLOBAL-NUMBER [:GLOBAL-NUMBER-DIGITS "+" [:DIGIT "1"] [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]] [:PHONEDIGIT [:DIGIT "2"]] [:PHONEDIGIT [:DIGIT "0"]] [:PHONEDIGIT [:DIGIT "1"]] [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]] [:PHONEDIGIT [:DIGIT "5"]] [:PHONEDIGIT [:DIGIT "5"]] [:PHONEDIGIT [:DIGIT "5"]] [:PHONEDIGIT [:VISUAL-SEPARATOR "-"]] [:PHONEDIGIT [:DIGIT "0"]] [:PHONEDIGIT [:DIGIT "1"]] [:PHONEDIGIT [:DIGIT "2"]] [:PHONEDIGIT [:DIGIT "3"]]]]]])) (def abnf-german "Testing the ABNF regular expressions" (parser " ; a parser for the German programming language ; http://esolangs.org/wiki/German S = <*1space> (A / B) *( (A / B)) <*1space> A = #'BEER' B = #'SCHNITZEL' space = #'\\s+' " :input-format :abnf)) (deftest german (are [x y] (= x y) (abnf-german " BEER SCHNITZEL BEER BEER SCHNITZEL SCHNITZEL BEER BEER BEER ") [:S [:A "BEER"] [:B "SCHNITZEL"] [:A "BEER"] [:A "BEER"] [:B "SCHNITZEL"] [:B "SCHNITZEL"] [:A "BEER"] [:A "BEER"] [:A "BEER"]])) (def abnf-abc "Trying the \"equal amount of A's, B's, and C's\" parser in ABNF, to test the lookahead" (parser "S = &(A 'c') 1*'a' B A = 'a' [A] 'b' = 'b' [B] 'c'" :input-format :abnf)) (deftest abc (are [x y] (= x y) (abnf-abc "aaaabbbbcccc") [:S "a" "a" "a" "a" "b" "b" "b" "b" "c" "c" "c" "c"] (abnf-abc "aaabbbc" :total true) [:S "a" "a" "a" "b" "b" "b" "c" [:instaparse/failure ""] [:instaparse/failure ""]])) (def reps "Testing the different kinds of repetitions" (parser "S = A B C D E FG A = *'a' B = 2*'b' C = *2'c' D = 2'd' E = 2*4'e' FG = 2('f' 'g')" :input-format :abnf)) (deftest rep-test (are [x] (not (instance? instaparse.gll.Failure x)) (reps "aabbccddeefgfg") (reps "bbbbbbddeeeefgfg") (reps "bbcddeefgfg"))) (deftest rep-test-errors (are [x] (instance? instaparse.gll.Failure x) (reps "") (reps "bccddeefgfg") (reps "aaaabbbbcccddeefgfg") (reps "aabbccddeefg") (reps "aabbccddeeffgg"))) (def regex-chars "Testing %d42-91. The boundary chars are \"*\" and \"[\", which normally aren't allowed in a regex." (parser "S = %d42-91" :input-format :abnf)) (deftest regex-char-test (doseq [i (range 1 (inc 100)) :let [c (char i)]] (if (<= 42 i 91) (is (not (instance? instaparse.gll.Failure (regex-chars (str c))))) (is (instance? instaparse.gll.Failure (regex-chars (str c))))))) (deftest unicode-test (let [poop "\uD83D\uDCA9"] ; U+1F4A9 PILE OF POO (let [parser1 (parser "S = %x1F4A9" :input-format :abnf)] (are [x y] (= x y) (parses parser1 poop) [[:S poop]]) (are [x] (instance? 
instaparse.gll.Failure x) (parser1 (str poop poop)) (parser1 (str (first poop))) ;; shouldn't work on the surrogate characters individually (parser1 (str (second poop))))) (let [parser2 (parser "S = %x1F4A8-1F4A9" :input-format :abnf)] (are [x y] (= x y) (parses parser2 poop) [[:S poop]]) (are [x] (instance? instaparse.gll.Failure x) (parser2 (str poop poop)) (parser2 (str (first poop))) (parser2 (str (second poop))))) (let [parser3 (parser "S = %x1F4A9.1F4A9.1F4A9" :input-format :abnf)] (are [x y] (= x y) (parses parser3 (str poop poop poop)) [[:S poop poop poop]]) (are [x] (instance? instaparse.gll.Failure x) (parser3 (str poop)))) ;; it would be cool if EBNF supported unicode in a parser spec ;; (ABNF doesn't allow that though) (let [parser4 (parser (str "S = '" poop "'*"))] (are [x y] (= x y) (parses parser4 (str poop poop poop)) [[:S poop poop poop]]) (are [x] (instance? instaparse.gll.Failure x) (parser4 (str (first poop))) (parser4 (str (second poop))) (parser4 (str poop poop (first poop))))))) (deftest abnf-combinator-test (let [p (parser (merge {:S (abnf "A / B")} (abnf " = 1*'a'") {:B (abnf "'='")}) :start :S)] (are [x y] (= y x) (p "aAaa") [:S "a" "a" "a" "a"] (p "=") [:S [:B "="]]))) instaparse-1.4.7/test/instaparse/auto_flatten_seq_test.cljc000066400000000000000000000031551311220471200242520ustar00rootroot00000000000000(ns instaparse.auto-flatten-seq-test (:require [instaparse.auto-flatten-seq :refer [auto-flatten-seq conj-flat convert-afs-to-vec]] #?(:clj [clojure.test :refer [deftest are is]] :cljs [cljs.test])) #?(:cljs (:require-macros [cljs.test :refer [deftest are is]]))) (defn rand-mutation [v iv] (let [rnd (int (rand-int 3))] (case rnd 0 (let [n (rand-int 50000)] [(conj v n) (conj-flat iv n) rnd]) 2 (let [i (rand-int 64), r (auto-flatten-seq (repeat i (rand-int 50000)))] [(into v r) (conj-flat iv r) rnd]) 1 (let [i (rand-int 64), r (auto-flatten-seq (repeat i (rand-int 50000)))] [(into (vec (seq r)) v) (conj-flat r iv) rnd])))) (deftest rand-incremental-vector-test (is (= (conj-flat (auto-flatten-seq [:s]) nil) [:s])) (loop [v (vec (range 100)) iv (auto-flatten-seq (range 100)) n 50 loops 20] (let [[v iv rnd] (rand-mutation v iv)] (cond (zero? loops) nil (zero? n) (recur (vec (range 100)) (auto-flatten-seq (range 100)) 50 (dec loops)) :else (do (is (= (count v) (count iv))) (is (= v iv)) (is (= iv v)) (is (= (hash v) (hash iv))) (is (= (seq v) (seq iv))) (is (= v (convert-afs-to-vec iv))) (is (= (convert-afs-to-vec iv) v)) (is (= (type (empty (convert-afs-to-vec iv))) (type v))) (is (= (hash v) (hash (convert-afs-to-vec iv)))) (recur v iv (dec n) loops)))))) (defn depth [v] (cond (empty? v) 0 (sequential? 
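    ;; depth measures nesting below the top level: a flat seq is 0,
    ;; and each additional level of wrapping adds 1.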
(first v)) (max (inc (depth (first v))) (depth (rest v))) :else (depth (rest v)))) instaparse-1.4.7/test/instaparse/core_test.cljc000066400000000000000000000657701311220471200216600ustar00rootroot00000000000000(ns instaparse.core-test #?(:clj (:refer-clojure :exclude [cat read-string])) (:require #?(:clj [clojure.test :refer [deftest are is]] :cljs [cljs.test :as t]) #?(:clj [clojure.edn :refer [read-string]] :cljs [cljs.reader :refer [read-string]]) #?(:clj [instaparse.core :as insta :refer [defparser]] :cljs [instaparse.core :as insta :refer-macros [defparser]]) [instaparse.cfg :refer [ebnf]] [instaparse.line-col :as lc] [instaparse.combinators-source :refer [Epsilon opt plus star rep alt ord cat string-ci string string-ci regexp nt look neg hide hide-tag]] [clojure.walk :as walk]) #?(:cljs (:require-macros [cljs.test :refer [is are deftest run-tests]]))) (defn parsers-similar? "Tests if parsers are equal." [& parsers] (->> parsers ;; Ugh. Regexes have to be specially handled because ;; (= #"a" #"a") => false (map (partial walk/prewalk (fn [form] (cond (instance? instaparse.core.Parser form) (into {} form) (instance? #?(:clj java.util.regex.Pattern :cljs js/RegExp) form) [::regex (str form)] :else form)))) (apply =))) (def as-and-bs (insta/parser "S = AB* AB = A B A = 'a'+ B = 'b'+")) (def as-and-bs-regex (insta/parser "S = AB* AB = A B A = #'a'+ B = #'b'+")) (def long-string (apply str (concat (repeat 20000 \a) (repeat 20000 \b)))) (def as-and-bs-alternative (insta/parser "S:={AB} ; AB ::= (A, B) A : \"a\" + ; B ='b' + ;")) (def as-and-bs-enlive (insta/parser "S = AB* AB = A B A = 'a'+ B = 'b'+" :output-format :enlive)) (def as-and-bs-variation1 (insta/parser "S = AB* AB = 'a'+ 'b'+")) (def as-and-bs-variation2 (insta/parser "S = ('a'+ 'b'+)*")) (def paren-ab (insta/parser "paren-wrapped = '(' seq-of-A-or-B ')' seq-of-A-or-B = ('a' | 'b')*")) (def paren-ab-hide-parens (insta/parser "paren-wrapped = <'('> seq-of-A-or-B <')'> seq-of-A-or-B = ('a' | 'b')*")) (def paren-ab-manually-flattened (insta/parser "paren-wrapped = <'('> ('a' | 'b')* <')'>")) (def paren-ab-hide-tag (insta/parser "paren-wrapped = <'('> seq-of-A-or-B <')'> = ('a' | 'b')*")) (def paren-ab-hide-both-tags (insta/parser " = <'('> seq-of-A-or-B <')'> = ('a' | 'b')*")) (def addition (insta/parser "plus = plus <'+'> plus | num num = #'[0-9]'+")) (def addition-e (insta/parser "plus = plus <'+'> plus | num num = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'" :output-format :enlive)) (def words-and-numbers (insta/parser "sentence = token ( token)* = word | number whitespace = #'\\s+' word = #'[a-zA-Z]+' number = #'[0-9]+'")) (def words-and-numbers-one-character-at-a-time (insta/parser "sentence = token ( token)* = word | number whitespace = #'\\s+' word = letter+ number = digit+ = #'[a-zA-Z]' = #'[0-9]'")) (def words-and-numbers-enlive (insta/parser "sentence = token ( token)* = word | number whitespace = #'\\s+' word = letter+ number = digit+ = #'[a-zA-Z]' = #'[0-9]'" :output-format :enlive)) (defparser words-and-numbers-enlive-defparser "sentence = token ( token)* = word | number whitespace = #'\\s+' word = letter+ number = digit+ = #'[a-zA-Z]' = #'[0-9]'" :output-format :enlive) (insta/transform {:word str, :number (comp read-string str)} (words-and-numbers-one-character-at-a-time "abc 123 def")) (def ambiguous (insta/parser "S = A A A = 'a'*")) (def not-ambiguous (insta/parser "S = A A A = #'a*'")) (def repeated-a (insta/parser "S = 'a'+")) (def lookahead-example (insta/parser "S = &'ab' ('a' | 'b')+")) (def 
negative-lookahead-example (insta/parser "S = !'ab' ('a' | 'b')+")) (def abc (insta/parser "S = &(A 'c') 'a'+ B A = 'a' A? 'b' = 'b' B? 'c'")) (def abc-grammar-map {:S (cat (look (cat (nt :A) (string "c"))) (plus (string "a")) (nt :B)) :A (cat (string "a") (opt (nt :A)) (string "b")) :B (hide-tag (cat (string "b") (opt (nt :B)) (string "c")))}) (def ambiguous-tokenizer (insta/parser "sentence = token ( token)* = keyword | identifier whitespace = #'\\s+' identifier = #'[a-zA-Z]+' keyword = 'cond' | 'defn'")) (def unambiguous-tokenizer (insta/parser "sentence = token ( token)* = keyword | !keyword identifier whitespace = #'\\s+' identifier = #'[a-zA-Z]+' keyword = 'cond' | 'defn'")) (def unambiguous-tokenizer-improved (insta/parser "sentence = token ( token)* = keyword | !keyword identifier whitespace = #'\\s+' end-of-string = !#'[\\s\\S]' identifier = #'[a-zA-Z]+' keyword = ('cond' | 'defn') &(whitespace | end-of-string)")) (def unambiguous-tokenizer-improved2 (insta/parser "sentence = token ( token)* = keyword | !(keyword (whitespace | end-of-string)) identifier whitespace = #'\\s+' end-of-string = !#'[\\s\\S]' identifier = #'[a-zA-Z]+' keyword = 'cond' | 'defn'")) (def unambiguous-tokenizer-improved3 (insta/parser "sentence = token ( token)* = keyword | !keyword identifier whitespace = #'\\s+' identifier = #'[a-zA-Z]+' keyword = #'cond\\b' | #'defn\\b'")) (def preferential-tokenizer (insta/parser "sentence = token ( token)* = keyword / identifier whitespace = #'\\s+' identifier = #'[a-zA-Z]+' keyword = 'cond' | 'defn'")) (def ord-test (insta/parser "S = Even / Odd Even = 'aa'* Odd = 'a'+")) (def ord2-test (insta/parser "S = token ( token)* ws = #'\\s+' keyword = 'hello' | 'bye' identifier = #'\\S+' token = keyword / identifier ")) (def even-odd (insta/parser "S = Even | Odd eos = !#'.' Even = 'aa'* Odd = !(Even eos) 'a'+")) (def arithmetic (insta/parser "expr = add-sub = mul-div | add | sub add = add-sub <'+'> mul-div sub = add-sub <'-'> mul-div = term | mul | div mul = mul-div <'*'> term div = mul-div <'/'> term = number | <'('> add-sub <')'> number = #'[0-9]+'")) (def combo-build-example (insta/parser (merge {:S (alt (nt :A) (nt :B))} (ebnf "A = 'a'*") {:B (ebnf "'b'+")}) :start :S)) (def tricky-ebnf-build "https://github.com/Engelberg/instaparse/issues/107" (insta/parser (merge {:S (alt (nt :A) (nt :B))} (ebnf " = '='*") {:B (ebnf "'b' '='")}) :start :S)) (def whitespace (insta/parser "whitespace = #'\\s+'")) (def auto-whitespace-example (insta/parser "S = A B = 'foo' = #'\\d+'" :auto-whitespace whitespace)) (def words-and-numbers-auto-whitespace (insta/parser "sentence = token+ = word | number word = #'[a-zA-Z]+' number = #'[0-9]+'" :auto-whitespace whitespace)) (def auto-whitespace-example2 (insta/parser "S = A B = 'foo' = #'\\d+'" :auto-whitespace :standard)) (def words-and-numbers-auto-whitespace2 (insta/parser "sentence = token+ = word | number word = #'[a-zA-Z]+' number = #'[0-9]+'" :auto-whitespace :standard)) (def whitespace-or-comments-v1 (insta/parser "ws-or-comment = #'\\s+' | comment comment = '(*' inside-comment* '*)' inside-comment = ( !('*)' | '(*') #'.' ) | comment")) (def whitespace-or-comments-v2 (insta/parser "ws-or-comments = #'\\s+' | comments comments = comment+ comment = '(*' inside-comment* '*)' inside-comment = !( '*)' | '(*' ) #'.' | comment")) (def whitespace-or-comments (insta/parser "ws-or-comments = #'\\s+' | comments comments = comment+ comment = '(*' inside-comment* '*)' inside-comment = !'*)' !'(*' #'.' 
| comment" :auto-whitespace whitespace)) (def words-and-numbers-auto-whitespace-and-comments (insta/parser "sentence = token+ = word | number word = #'[a-zA-Z]+' number = #'[0-9]+'" :auto-whitespace whitespace-or-comments)) (def eat-a (insta/parser "Aeater = #'[a]'+" :output-format :enlive)) (def int-or-double (insta/parser "ws = #'\\s+'; Int = #'[0-9]+'; Double = #'[0-9]+\\.[0-9]*|\\.[0-9]+'; = Int | Double; Input = ConstExpr ConstExpr;" :start :Input)) (deftest parsing-tutorial (are [x y] (= x y) (as-and-bs "aaaaabbbaaaabb") [:S [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]] [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]] #?@(:clj [(as-and-bs (StringBuilder. "aaaaabbbaaaabb")) [:S [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]] [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]]) (as-and-bs "aaaaabbbaaaabb") (as-and-bs "aaaaabbbaaaabb" :optimize :memory) (as-and-bs-enlive "aaaaabbbaaaabb") '{:tag :S, :content ({:tag :AB, :content ({:tag :A, :content ("a" "a" "a" "a" "a")} {:tag :B, :content ("b" "b" "b")})} {:tag :AB, :content ({:tag :A, :content ("a" "a" "a" "a")} {:tag :B, :content ("b" "b")})})} (as-and-bs-enlive "aaaaabbbaaaabb") (as-and-bs-enlive "aaaaabbbaaaabb" :optimize :memory) (as-and-bs-variation1 "aaaaabbbaaaabb") [:S [:AB "a" "a" "a" "a" "a" "b" "b" "b"] [:AB "a" "a" "a" "a" "b" "b"]] (as-and-bs-variation1 "aaaaabbbaaaabb") (as-and-bs-variation1 "aaaaabbbaaaabb" :optimize :memory) (as-and-bs-variation2 "aaaaabbbaaaabb") [:S "a" "a" "a" "a" "a" "b" "b" "b" "a" "a" "a" "a" "b" "b"] (as-and-bs-variation2 "aaaaabbbaaaabb") (as-and-bs-variation2 "aaaaabbbaaaabb" :optimize :memory) (paren-ab "(aba)") [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a"] ")"] (paren-ab "(aba)") (paren-ab "(aba)" :optimize :memory) (paren-ab-hide-parens "(aba)") [:paren-wrapped [:seq-of-A-or-B "a" "b" "a"]] (paren-ab-hide-parens "(aba)") (paren-ab-hide-parens "(aba)" :optimize :memory) (paren-ab-manually-flattened "(aba)") [:paren-wrapped "a" "b" "a"] (paren-ab-manually-flattened "(aba)") (paren-ab-manually-flattened "(aba)" :optimize :memory) (paren-ab-hide-tag "(aba)") [:paren-wrapped "a" "b" "a"] (paren-ab-hide-tag "(aba)") (paren-ab-hide-tag "(aba)" :optimize :memory) (insta/transform {:num read-string :plus +} (addition "1+2+3+4+5")) 15 (insta/transform {:num read-string :plus +} (insta/parses addition "1+2+3+4+5")) (repeat 14 15) (insta/transform {:num read-string :plus +} (addition-e "1+2+3+4+5")) 15 ((insta/parser "S = 'a' S | '' ") "aaaa") [:S "a" [:S "a" [:S "a" [:S "a" [:S]]]]] ((insta/parser "S = 'a' S | '' ") "aaaa") ((insta/parser "S = 'a' S | '' ") "aaaa" :optimize :memory) ((insta/parser "S = S 'a' | Epsilon") "aaaa") [:S [:S [:S [:S [:S] "a"] "a"] "a"] "a"] ((insta/parser "S = S 'a' | Epsilon") "aaaa") ((insta/parser "S = S 'a' | Epsilon") "aaaa" :optimize :memory) (set (insta/parses ambiguous "aaaaaa")) (set '([:S [:A "a"] [:A "a" "a" "a" "a" "a"]] [:S [:A "a" "a" "a" "a" "a" "a"] [:A]] [:S [:A "a" "a"] [:A "a" "a" "a" "a"]] [:S [:A "a" "a" "a"] [:A "a" "a" "a"]] [:S [:A "a" "a" "a" "a"] [:A "a" "a"]] [:S [:A "a" "a" "a" "a" "a"] [:A "a"]] [:S [:A] [:A "a" "a" "a" "a" "a" "a"]])) (insta/parses not-ambiguous "aaaaaa") '([:S [:A "aaaaaa"] [:A ""]]) (lookahead-example "abaaaab") [:S "a" "b" "a" "a" "a" "a" "b"] (lookahead-example "abaaaab") (lookahead-example "abaaaab" :optimize :memory) (insta/failure? (lookahead-example "bbaaaab")) true (lookahead-example "bbaaaab") (lookahead-example "bbaaaab" :optimize :memory) (insta/failure? 
(negative-lookahead-example "abaaaab")) true (insta/parses (insta/parser "Regex = (CharNonRange | Range) + Range = Char <'-'> Char CharNonRange = Char ! ('-' Char) Char = #'[-x]' | 'c' (! 'd') 'x'") "x-cx") '([:Regex [:Range [:Char "x"] [:Char "c" "x"]]]) (negative-lookahead-example "abaaaab") (negative-lookahead-example "abaaaab" :optimize :memory) (negative-lookahead-example "bbaaaab") [:S "b" "b" "a" "a" "a" "a" "b"] (negative-lookahead-example "bbaaaab") (negative-lookahead-example "bbaaaab" :optimize :memory) (insta/parses ambiguous-tokenizer "defn my cond") '([:sentence [:identifier "defn"] [:identifier "my"] [:identifier "cond"]] [:sentence [:keyword "defn"] [:identifier "my"] [:identifier "cond"]] [:sentence [:identifier "defn"] [:identifier "my"] [:keyword "cond"]] [:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]]) (insta/parses unambiguous-tokenizer "defn my cond") '([:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]]) (insta/parses preferential-tokenizer "defn my cond") '([:sentence [:keyword "defn"] [:identifier "my"] [:keyword "cond"]] [:sentence [:identifier "defn"] [:identifier "my"] [:keyword "cond"]] [:sentence [:keyword "defn"] [:identifier "my"] [:identifier "cond"]] [:sentence [:identifier "defn"] [:identifier "my"] [:identifier "cond"]]) (insta/parses repeated-a "aaaaaa") '([:S "a" "a" "a" "a" "a" "a"]) (insta/parse repeated-a "aaaaaa" :partial true) [:S "a"] (insta/parses repeated-a "aaaaaa" :partial true) '([:S "a"] [:S "a" "a"] [:S "a" "a" "a"] [:S "a" "a" "a" "a"] [:S "a" "a" "a" "a" "a"] [:S "a" "a" "a" "a" "a" "a"]) (words-and-numbers-one-character-at-a-time "abc 123 def") [:sentence [:word "a" "b" "c"] [:number "1" "2" "3"] [:word "d" "e" "f"]] (words-and-numbers-one-character-at-a-time "abc 123 def") (words-and-numbers-one-character-at-a-time "abc 123 def" :optimize :memory) (insta/transform {:word str, :number (comp read-string str)} (words-and-numbers-one-character-at-a-time "abc 123 def")) [:sentence "abc" 123 "def"] (->> (words-and-numbers-enlive "abc 123 def") (insta/transform {:word str, :number (comp read-string str)})) {:tag :sentence, :content ["abc" 123 "def"]} (->> (words-and-numbers-enlive-defparser "abc 123 def") (insta/transform {:word str, :number (comp read-string str)})) {:tag :sentence, :content ["abc" 123 "def"]} (arithmetic "1-2/(3-4)+5*6") [:expr [:add [:sub [:number "1"] [:div [:number "2"] [:sub [:number "3"] [:number "4"]]]] [:mul [:number "5"] [:number "6"]]]] (arithmetic "1-2/(3-4)+5*6") (arithmetic "1-2/(3-4)+5*6" :optimize :memory) (->> (arithmetic "1-2/(3-4)+5*6") (insta/transform {:add +, :sub -, :mul *, :div /, :number read-string :expr identity})) 33 (paren-ab-hide-both-tags "(aba)") '("a" "b" "a") (paren-ab-hide-both-tags "(aba)") (paren-ab-hide-both-tags "(aba)" :optimize :memory) (combo-build-example "aaaaa") [:S [:A "a" "a" "a" "a" "a"]] (combo-build-example "aaaaa") (combo-build-example "aaaaa" :optimize :memory) (combo-build-example "bbbbb") [:S [:B "b" "b" "b" "b" "b"]] (combo-build-example "bbbbb") (combo-build-example "bbbbb" :optimize :memory) (tricky-ebnf-build "===") [:S "=" "=" "="] (tricky-ebnf-build "b=") [:S [:B "b" "="]] ((insta/parser "S = ('a'?)+") "") [:S] ((insta/parser "S = ('a'?)+") "") ((insta/parser "S = ('a'?)+") "" :optimize :memory) ((insta/parser "a = b c . b = 'b' . 
c = 'c' .") "bc") [:a [:b "b"] [:c "c"]] (paren-ab-hide-parens "(ababa)" :unhide :content) [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a" "b" "a"] ")"] (paren-ab-hide-parens "(ababa)" :unhide :all) [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a" "b" "a"] ")"] (paren-ab-hide-tag "(ababa)" :unhide :tags) [:paren-wrapped [:seq-of-A-or-B "a" "b" "a" "b" "a"]] (paren-ab-hide-tag "(ababa)" :unhide :all) [:paren-wrapped "(" [:seq-of-A-or-B "a" "b" "a" "b" "a"] ")"] (insta/parses words-and-numbers "ab 123 cd" :unhide :all) '([:sentence [:token [:word "ab"]] [:whitespace " "] [:token [:number "123"]] [:whitespace " "] [:token [:word "cd"]]]) ((insta/parser "S = epsilon") "") [:S] (words-and-numbers-auto-whitespace " abc 123 45 de ") [:sentence [:word "abc"] [:number "123"] [:number "45"] [:word "de"]] (words-and-numbers-auto-whitespace2 " abc 123 45 de ") [:sentence [:word "abc"] [:number "123"] [:number "45"] [:word "de"]] (words-and-numbers-auto-whitespace-and-comments " abc 123 (* 456 *) (* (* 7*) 89 *) def ") [:sentence [:word "abc"] [:number "123"] [:word "def"]] (insta/parses eat-a "aaaaaaaabbbbbb" :total true) '({:tag :Aeater, :content ("a" "a" "a" "a" "a" "a" "a" "a" {:tag :instaparse/failure, :content ("bbbbbb")})}) (int-or-double "31 0.2") [:Input [:Int "31"] [:Double "0.2"]] ((insta/parser "S=#'\\s*'") " ") [:S " "] ((insta/parser "S = #'a+'") "aaaaaa") [:S "aaaaaa"] ((insta/parser "S = 'a' / eps") "a") [:S "a"] ((insta/parser "S = 'a' / eps") "") [:S] (insta/failure? ((insta/parser "S = 'a'+") "AaaAaa")) true ((insta/parser "S = 'a'+" :string-ci true) "AaaAaa") [:S "a" "a" "a" "a" "a" "a"] ((insta/parser "S = %x30.31" :input-format :abnf) "01") [:S "0" "1"] (auto-whitespace-example "foo 123") [:S "foo" "123"] (auto-whitespace-example2 "foo 123") [:S "foo" "123"] (insta/failure? ((insta/parser "f = #'asdf'" ) "")) true (insta/transform {:ADD +} [:ADD 10 5]) 15 (->> "a" ((insta/parser " = 'a'")) (insta/transform {})) '("a") )) (defn spans [t] (if (sequential? t) (cons (insta/span t) (map spans (next t))) t)) (defn spans-hiccup-tag [t] (if (sequential? t) (cons {:tag (first t) :span (insta/span t)} (map spans (next t))) t)) (defn spans-enlive [t] (if (map? 
t) (assoc t :span (insta/span t) :content (map spans-enlive (:content t))) t)) (deftest span-tests (are [x y] (= x y) (spans (as-and-bs "aaaabbbaabbab")) '([0 13] ([0 7] ([0 4] "a" "a" "a" "a") ([4 7] "b" "b" "b")) ([7 11] ([7 9] "a" "a") ([9 11] "b" "b")) ([11 13] ([11 12] "a") ([12 13] "b"))) (spans (as-and-bs "aaaabbbaabbab")) (spans (as-and-bs "aaaabbbaabbab" :optimize :memory)) (spans ((insta/parser "S = 'a' S | '' ") "aaaa")) '([0 4] "a" ([1 4] "a" ([2 4] "a" ([3 4] "a" ([4 4]))))) (spans ((insta/parser "S = 'a' S | '' ") "aaaa")) (spans ((insta/parser "S = 'a' S | '' ") "aaaa" :optimize :memory)) (spans (as-and-bs "aaaaabbbaacabb" :total true)) '([0 14] ([0 8] ([0 5] "a" "a" "a" "a" "a") ([5 8] "b" "b" "b")) ([8 14] ([8 10] "a" "a") ([10 14] ([10 14] "cabb")))) (spans (as-and-bs "aaaaabbbaacabb" :total true)) (spans (as-and-bs "aaaaabbbaacabb" :total true :optimize :memory)) (spans-enlive (as-and-bs-enlive "aaaaabbbaacabb" :total true)) '{:span [0 14], :tag :S, :content ({:span [0 8], :tag :AB, :content ({:span [0 5], :tag :A, :content ("a" "a" "a" "a" "a")} {:span [5 8], :tag :B, :content ("b" "b" "b")})} {:span [8 14], :tag :AB, :content ({:span [8 10], :tag :A, :content ("a" "a")} {:span [10 14], :tag :B, :content ({:span [10 14], :tag :instaparse/failure, :content ("cabb")})})})} (spans-enlive (as-and-bs-enlive "aaaabbbaabbab")) '{:span [0 13], :tag :S, :content ({:span [0 7], :tag :AB, :content ({:span [0 4], :tag :A, :content ("a" "a" "a" "a")} {:span [4 7], :tag :B, :content ("b" "b" "b")})} {:span [7 11], :tag :AB, :content ({:span [7 9], :tag :A, :content ("a" "a")} {:span [9 11], :tag :B, :content ("b" "b")})} {:span [11 13], :tag :AB, :content ({:span [11 12], :tag :A, :content ("a")} {:span [12 13], :tag :B, :content ("b")})})} (spans-enlive (as-and-bs-enlive "aaaabbbaabbab")) (spans-enlive (as-and-bs-enlive "aaaabbbaabbab" :optimize :memory)) (->> (words-and-numbers-enlive "abc 123 def") (insta/transform {:word (comp (partial array-map :word) str), :number (comp (partial array-map :number) read-string str)})) {:tag :sentence, :content [{:word "abc"} {:number 123} {:word "def"}]} (->> (words-and-numbers-enlive "abc 123 def") (insta/transform {:word (comp (partial array-map :word) str), :number (comp (partial array-map :number) read-string str)}) spans-enlive) '{:span [0 11], :tag :sentence, :content ({:content (), :span [0 3], :word "abc"} {:content (), :span [4 7], :number 123} {:content (), :span [8 11], :word "def"})})) (defn round-trip [parser] (insta/parser (prn-str parser))) (deftest round-trip-test (are [p] (= (prn-str p) (prn-str (round-trip p))) as-and-bs as-and-bs-regex as-and-bs-variation1 as-and-bs-variation2 paren-ab paren-ab-hide-parens paren-ab-manually-flattened paren-ab-hide-tag paren-ab-hide-both-tags addition addition-e words-and-numbers words-and-numbers-one-character-at-a-time ambiguous not-ambiguous repeated-a lookahead-example negative-lookahead-example abc ambiguous-tokenizer unambiguous-tokenizer preferential-tokenizer ord-test ord2-test even-odd arithmetic whitespace words-and-numbers-auto-whitespace whitespace-or-comments-v1 whitespace-or-comments-v2 whitespace-or-comments words-and-numbers-auto-whitespace eat-a int-or-double)) (defn hiccup-line-col-spans [t] (if (sequential? t) (cons (meta t) (map hiccup-line-col-spans (next t))) t)) (defn enlive-line-col-spans [t] (if (map? 
t) (cons (meta t) (map enlive-line-col-spans (:content t))) t)) (deftest line-col-test (let [text1 "abc\ndef\ng\nh\ni", h (words-and-numbers text1) e (words-and-numbers-enlive text1) hlc (lc/add-line-col-spans text1 h) elc (lc/add-line-col-spans text1 e)] (is (= (enlive-line-col-spans elc) '({:instaparse.gll/end-column 2, :instaparse.gll/end-line 5, :instaparse.gll/start-column 1, :instaparse.gll/start-line 1, :instaparse.gll/start-index 0, :instaparse.gll/end-index 13} ({:instaparse.gll/end-column 4, :instaparse.gll/end-line 1, :instaparse.gll/start-column 1, :instaparse.gll/start-line 1, :instaparse.gll/start-index 0, :instaparse.gll/end-index 3} "a" "b" "c") ({:instaparse.gll/end-column 4, :instaparse.gll/end-line 2, :instaparse.gll/start-column 1, :instaparse.gll/start-line 2, :instaparse.gll/start-index 4, :instaparse.gll/end-index 7} "d" "e" "f") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 3, :instaparse.gll/start-column 1, :instaparse.gll/start-line 3, :instaparse.gll/start-index 8, :instaparse.gll/end-index 9} "g") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 4, :instaparse.gll/start-column 1, :instaparse.gll/start-line 4, :instaparse.gll/start-index 10, :instaparse.gll/end-index 11} "h") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 5, :instaparse.gll/start-column 1, :instaparse.gll/start-line 5, :instaparse.gll/start-index 12, :instaparse.gll/end-index 13} "i")))) (is (= (hiccup-line-col-spans hlc) '({:instaparse.gll/end-column 2, :instaparse.gll/end-line 5, :instaparse.gll/start-column 1, :instaparse.gll/start-line 1, :instaparse.gll/start-index 0, :instaparse.gll/end-index 13} ({:instaparse.gll/end-column 4, :instaparse.gll/end-line 1, :instaparse.gll/start-column 1, :instaparse.gll/start-line 1, :instaparse.gll/start-index 0, :instaparse.gll/end-index 3} "abc") ({:instaparse.gll/end-column 4, :instaparse.gll/end-line 2, :instaparse.gll/start-column 1, :instaparse.gll/start-line 2, :instaparse.gll/start-index 4, :instaparse.gll/end-index 7} "def") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 3, :instaparse.gll/start-column 1, :instaparse.gll/start-line 3, :instaparse.gll/start-index 8, :instaparse.gll/end-index 9} "g") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 4, :instaparse.gll/start-column 1, :instaparse.gll/start-line 4, :instaparse.gll/start-index 10, :instaparse.gll/end-index 11} "h") ({:instaparse.gll/end-column 2, :instaparse.gll/end-line 5, :instaparse.gll/start-column 1, :instaparse.gll/start-line 5, :instaparse.gll/start-index 12, :instaparse.gll/end-index 13} "i")))))) (deftest print-test ;; In scenarios when AutoFlattenSeq or FlattenOnDemandVector is ;; returned to the user, does the parse output print properly? 
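  ;; i.e. prn, println, and str of the parse result should produce the same text
  ;; as the equivalent plain vector / map given as expected-output below.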
(let [parser-str " = <'('> seq-of-A-or-B <')'> seq-of-A-or-B = ('a' | 'b')*" input "(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)"] ;; input is 33 "a"s to trigger FlattenOnDemandVector in hiccup ;; output format (doseq [[output-mode expected-output] [[:hiccup (list (into [:seq-of-A-or-B] (repeat 33 "a")))] [:enlive (list {:tag :seq-of-A-or-B :content (repeat 33 "a")})]] :let [p (insta/parser parser-str :output-format output-mode) actual-output (p input)]] (is (= expected-output actual-output)) (is (= (with-out-str (prn expected-output)) (with-out-str (prn actual-output)))) (is (= (with-out-str (println expected-output)) (with-out-str (println actual-output)))) (is (= (str expected-output) (str actual-output)))))) (deftest invoke-test (let [parser (insta/parser "S = 'a'") text "a"] (are [x] (= [:S "a"] (parser text)) (parser text 0 0) (parser text 0 0 1 1) (parser text 0 0 1 1 2 2) (parser text 0 0 1 1 2 2 3 3) (parser text 0 0 1 1 2 2 3 3 4 4) (parser text 0 0 1 1 2 2 3 3 4 4 5 5) (parser text 0 0 1 1 2 2 3 3 4 4 5 5 6 6) (parser text 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7) (parser text 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8) (parser text 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9) (parser text 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10)))) #?(:cljs (defn ^:export run [] (run-tests))) instaparse-1.4.7/test/instaparse/defparser_test.cljc000066400000000000000000000042531311220471200226700ustar00rootroot00000000000000(ns instaparse.defparser-test (:require #?(:clj [clojure.test :as t :refer [deftest are is]] :cljs [cljs.test :as t :refer-macros [deftest are is]]) #?(:clj [instaparse.core :as insta :refer [defparser]] :cljs [instaparse.core :as insta :refer-macros [defparser]]) [instaparse.combinators :as c] [instaparse.core-test :refer [parsers-similar?]])) (defparser p1 "S = #'a' | 'b'") (defparser p2 [:S (c/alt (c/regexp #"a") (c/string "b"))]) (defparser p3 {:S (c/alt (c/regexp #"a") (c/string "b"))} :start :S) (defparser p4 "test/data/defparser_grammar.txt") (def p5 (insta/parser "S = #'a' | 'b'")) (deftest defparser-test-standard (is (parsers-similar? p1 p2 p3 p4 p5)) #?(:clj (are [x y] (thrown? y (eval (quote x))) (instaparse.core/defparser p6 "test/data/parser_not_found.txt") Exception (instaparse.core/defparser p7 "test/data/defparser_grammar.txt" :no-slurp true) Exception ;; We catch up front when someone tries to do something overly ;; complicated in the macro-time options (instaparse.core/defparser p8 "S = #'a' | 'b'" :input-format (do :ebnf)) AssertionError))) (defparser a1 "S = #'a' / 'b'" :input-format :abnf) (def a2 (insta/parser "S = #'a' / 'b'" :input-format :abnf)) (defparser a3 "S = #'a' | 'b'" :input-format :ebnf, :string-ci true) (deftest defparser-test-abnf (is (parsers-similar? a1 a2 a3))) (defparser ws1 "S = ( 'a')+ ; = #'\\s+'") (defparser ws2 "S = 'a'+" :auto-whitespace :standard) (defparser ws3 "S = 'a'+" :auto-whitespace (insta/parser "whitespace = #'\\s+'")) (let [ws (insta/parser "whitespace = #'\\s+'")] (defparser ws4 "S = 'a'+" :auto-whitespace ws)) (def ws5 (insta/parser "S = 'a'+" :auto-whitespace :standard)) (defparser ws6 " = #'\\s+'; S = ( 'a')+ " :start :S) (deftest defparser-test-auto-whitespace (is (parsers-similar? ws1 ws2 ws3 ws4 ws5 ws6))) (defparser e1 "S = 'a'+" :output-format :enlive) (def e2 (insta/parser "S = 'a'+" :output-format :enlive)) (deftest defparser-test-enlive (is (parsers-similar? 
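;; Illustrative sketch, not part of the original test file: the deftests in
;; this namespace compare `defparser`, which builds its parser at compile
;; time, against `insta/parser`, which builds it at runtime.  The names below
;; (`compile-time-p`, `runtime-p`) are hypothetical and only meant to show the
;; two construction paths side by side at a REPL.
(comment
  (defparser compile-time-p "S = 'a'+")        ; grammar is analyzed when the macro expands
  (def runtime-p (insta/parser "S = 'a'+"))    ; grammar is analyzed when this form is evaluated
  (compile-time-p "aaa")                       ; => [:S "a" "a" "a"]
  (runtime-p "aaa"))                           ; => [:S "a" "a" "a"]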
(deftest defparser-test-enlive
  (is (parsers-similar? e1 e2))
  (is (= (e2 "a") (e1 "a"))))

instaparse-1.4.7/test/instaparse/grammars.cljc000066400000000000000000000452611311220471200214710ustar00rootroot00000000000000
(ns instaparse.grammars
  #?(:clj (:refer-clojure :exclude [cat]))
  (:require #?(:clj [clojure.test :refer [deftest are]]
               :cljs [cljs.test :as t])
            [instaparse.reduction :refer [apply-standard-reductions]]
            [instaparse.combinators :refer [Epsilon opt plus star rep alt ord cat string string-ci regexp nt look neg hide hide-tag ebnf abnf]]
            [instaparse.gll :as gll]
            [instaparse.core :as insta])
  #?(:cljs (:require-macros [cljs.test :refer [is are deftest run-tests testing]])))

(defn- parse [grammar start text]
  (gll/parse (apply-standard-reductions grammar) start text false))
(defn- parses [grammar start text]
  (gll/parses (apply-standard-reductions grammar) start text false))
(defn- eparse [grammar start text]
  (gll/parse (apply-standard-reductions :enlive grammar) start text false))
(defn- eparses [grammar start text]
  (gll/parses (apply-standard-reductions :enlive grammar) start text false))

;; Grammars built with combinators

(def grammar1 {:s (alt (string "a") (string "aa") (string "aaa"))})
(def grammar2 {:s (alt (string "a") (string "b"))})
(def grammar3 {:s (alt (cat (string "a") (nt :s)) Epsilon)})
(def grammar4 {:y (string "b")
               :x (cat (string "a") (nt :y))})
(def grammar5 {:s (cat (string "a") (string "b") (string "c"))})
(def grammar6 {:s (alt (cat (string "a") (nt :s)) (string "a"))})
(def grammar7 {:s (alt (cat (string "a") (nt :s)) Epsilon)})
(def grammar8 {:s (alt (cat (string "a") (nt :s) Epsilon) (string "a"))})
(def grammar9 {:s (alt (cat (string "a") (nt :s))
                       (cat (string "b") (nt :s))
                       Epsilon)})
(def grammar10 {:s (alt (cat (nt :s) (string "a"))
                        (cat (nt :s) (string "b"))
                        Epsilon)})
(def grammar11 {:s (alt (cat (nt :s) (string "a")) (string "a"))})
(def grammar12 {:s (alt (nt :a) (nt :a) (nt :a))
                :a (alt (cat (nt :s) (string "a")) (string "a"))})
(def grammar13 {:s (nt :a)
                :a (alt (cat (nt :s) (string "a")) (string "a"))})
(def amb-grammar {:s (alt (string "b")
                          (cat (nt :s) (nt :s))
                          (cat (nt :s) (nt :s) (nt :s)))})
(def paren-grammar {:s (alt (cat (string "(") (string ")"))
                            (cat (string "(") (nt :s) (string ")"))
                            (cat (nt :s) (nt :s)))})
(def non-ll-grammar {:s (alt (nt :a) (nt :b))
                     :a (alt (cat (string "a") (nt :a) (string "b")) Epsilon)
                     :b (alt (cat (string "a") (nt :b) (string "bb")) Epsilon)})
(def grammar14 {:s (cat (opt (string "a")) (string "b"))})
(def grammar15 {:s (cat (opt (string "a")) (opt (string "b")))})
(def grammar16 {:s (plus (string "a"))})
(def grammar17 {:s (cat (plus (string "a")) (string "b"))})
(def grammar18 {:s (cat (plus (string "a")) (string "a"))})
(def grammar19 {:s (cat (string "a") (plus (alt (string "b") (string "c"))))})
(def grammar20 {:s (cat (string "a") (plus (cat (string "b") (string "c"))))})
(def grammar21 {:s (cat (string "a") (plus (alt (string "b") (string "c"))) (string "b"))})
(def grammar22 {:s (star (string "a"))})
(def grammar23 {:s (cat (star (string "a")) (string "b"))})
(def grammar24 {:s (cat (star (string "a")) (string "a"))})
(def grammar25 {:s (cat (string "a") (star (alt (string "b") (string "c"))))})
(def grammar26 {:s (cat (string "a") (star (cat (string "b") (string "c"))))})
(def grammar27 {:s (cat (string "a") (star (alt (string "b") (string "c"))) (string "b"))})
(def grammar28 {:s (regexp "a[0-9]b+c")})
(def grammar29 {:s (plus (opt (string "a")))})
(def grammar30 {:s (alt (nt :a) (nt :b))
                :a (plus (cat (string "a") (string "b")))
                :b (plus (cat (string "a") (string "b")))})

;equal: [zero one | one zero] ;; equal number of "0"s and "1"s.
;
;zero: "0" equal | equal "0" ;; has an extra "0" in it.
;
;one: "1" equal | equal "1" ;; has an extra "1" in it.

(def equal-zeros-ones {:equal (opt (alt (cat (nt :zero) (nt :one))
                                        (cat (nt :one) (nt :zero))))
                       :zero (alt (cat (string "0") (nt :equal))
                                  (cat (nt :equal) (string "0")))
                       :one (alt (cat (string "1") (nt :equal))
                                 (cat (nt :equal) (string "1")))})

(def grammar31 {:equal (alt (cat (string "0") (nt :equal) (string "1"))
                            (cat (string "1") (nt :equal) (string "0"))
                            (cat (nt :equal) (nt :equal))
                            Epsilon)})

; Another slow one
(def grammar32 {:s (alt (string "0") (cat (nt :s) (nt :s)) Epsilon)})

(def grammar33 {:s (alt (cat (nt :s) (nt :s)) Epsilon)})
(def grammar34 {:s (alt (nt :s) Epsilon)})
(def grammar35 {:s (opt (cat (nt :s) (nt :s)))})
(def grammar36 {:s (cat (opt (nt :s)) (nt :s))})
(def grammar37 {:s (cat (nt :s) (opt (nt :s)))})
(def grammar38 {:s (regexp "a[0-9](bc)+")})
(def grammar39 {:s (cat (string "0") (hide (string "1")) (string "2"))})
(def grammar40 {:s (nt :aa)
                :aa (hide-tag (alt Epsilon (cat (string "a") (nt :aa))))})
(def grammar41 {:s (cat (string "b") (plus (string "a")))})
(def grammar42 {:s (cat (string "b") (star (string "a")))})
(def grammar43 {:s (cat (star (string "a")) (string "b"))})
(def grammar44 {:s (cat (look (string "ab")) (nt :ab))
                :ab (plus (alt (string "a") (string "b")))})
(def grammar45 {:s (cat (nt :ab) (look (string "ab")))
                :ab (plus (alt (string "a") (string "b")))})
(def grammar46 {:s (cat (nt :ab) (look Epsilon))
                :ab (plus (alt (string "a") (string "b")))})
(def grammar47 {:s (cat (neg (string "ab")) (nt :ab))
                :ab (plus (alt (string "a") (string "b")))})
(def grammar48 {:s (cat (nt :ab) (neg (string "ab")))
                :ab (plus (alt (string "a") (string "b")))})
(def grammar49 {:s (cat (nt :ab) (neg Epsilon))
                :ab (plus (alt (string "a") (string "b")))})

; Grammar for odd number of a's.
(def grammar50 {:s (alt (cat (string "a") (nt :s) (string "a")) (string "a"))})

(def grammar51 {:s (hide-tag (alt (cat (string "a") (nt :s) (string "a")) (string "a")))})
(def grammar52 {:s (hide-tag (alt (cat (string "a") (nt :s) (string "b")) (string "a")))})
(def grammar53 {:s (hide-tag (alt (cat (string "a") (nt :s) (string "a")) (string "b")))})
(def grammar54 {:s (cat (string "a") (star (string "aa")))})
(def grammar55 {:s (alt (cat (string "a") (nt :s) (opt (string "a"))) (string "a"))})
(def grammar56 {:s (alt (string "a") (cat (string "a") (nt :s) (string "a")))})

;; PEG grammars
(def grammar57 {:s (ord (plus (string "aa")) (plus (string "a")))})
(def grammar58 {:s (cat (ord (plus (string "aa")) (plus (string "a"))) (string "b"))})
(def grammar59 {:S (cat (look (cat (nt :A) (string "c")))
                        (plus (string "a"))
                        (nt :B)
                        (neg (ord (string "a") (string "b") (string "c"))))
                :A (cat (string "a") (opt (nt :A)) (string "b"))
                :B (hide-tag (cat (string "b") (opt (nt :B)) (string "c")))})
(def grammar60 {:Expr (ord (nt :Product) (nt :Sum) (nt :Value))
                :Product (cat (nt :Expr) (star (cat (alt (string "*") (string "/")) (nt :Expr))))
                :Sum (cat (nt :Expr) (star (cat (alt (string "+") (string "-")) (nt :Expr))))
                :Value (alt (regexp "[0-9]+") (cat (string "(") (nt :Expr) (string ")")))})
(def grammar61 {:Expr (alt (nt :Product) (nt :Value))
                :Product (cat (nt :Expr) (star (cat (alt (string "*") (string "/")) (nt :Expr))))
                :Value (alt (string "[0-9]+") (cat (string "(") (nt :Expr) (string ")")))})
(def grammar62 {:Expr (alt (nt :Product) (string "0"))
                :Product (plus (nt :Expr))})
(def grammar63 {:Expr (alt (nt :Expr) (string "0"))})
(def grammar64 {:Expr (hide-tag (alt (nt :Product)
                                     (cat (neg (nt :Product)) (nt :Sum))
                                     (cat (neg (nt :Product)) (neg (nt :Sum)) (nt :Value))))
                :Product (cat (nt :Expr) (star (cat (alt (string "*") (string "/")) (nt :Expr))))
                :Sum (cat (nt :Expr) (star (cat (alt (string "+") (string "-")) (nt :Expr))))
                :Value (alt (regexp "[0-9]+") (cat (string "(") (nt :Expr) (string ")")))})
(def grammar65 {:s (cat (alt (plus (string "aa"))
                             (cat (neg (plus (string "aa"))) (plus (string "a"))))
                        (string "b"))})
(def grammar66 {:s (neg (nt :s))})
(def grammar67 {:s (cat (neg (nt :s)) (string "0"))})
(def grammar68 {:s (cat (neg (nt :a)) (string "0"))
                :a (neg (nt :s))})
(def grammar69 {:s (cat (neg (nt :a)) (string "abc"))
                :a (cat (neg (string "b")) (string "c"))})
(def grammar70 {:s (cat (neg (nt :a)) (string "abc"))
                :a (cat (neg (string "b")) (string "a"))})

(deftest testing-grammars
  (are [x y] (= x y)
    (parse grammar1 :s "a") [:s "a"]
    (parse grammar1 :s "aa") [:s "aa"]
    (parse grammar1 :s "aaa") [:s "aaa"]
    (insta/failure? (parse grammar1 :s "b")) true
    (parse grammar2 :s "b") [:s "b"]
    (parse grammar3 :s "aaaaa") [:s "a" [:s "a" [:s "a" [:s "a" [:s "a" [:s]]]]]]
    (eparse grammar3 :s "aaa") '{:tag :s, :content ("a" {:tag :s, :content ("a" {:tag :s, :content ("a" {:tag :s, :content nil})})})}
    (parse grammar4 :x "ab") [:x "a" [:y "b"]]
    (parse grammar5 :s "abc") [:s "a" "b" "c"]
    (eparse grammar5 :s "abc") '{:tag :s, :content ("a" "b" "c")}
    (parse grammar6 :s "aaaa") [:s "a" [:s "a" [:s "a" [:s "a"]]]]
    (parse grammar7 :s "aaaa") [:s "a" [:s "a" [:s "a" [:s "a" [:s]]]]]
    (parses grammar8 :s "aaaaa") '([:s "a" [:s "a" [:s "a" [:s "a" [:s "a"]]]]])
    (eparse grammar9 :s "aaa") '{:tag :s, :content ("a" {:tag :s, :content ("a" {:tag :s, :content ("a" {:tag :s, :content nil})})})}
    (parse grammar9 :s "bbb") [:s "b" [:s "b" [:s "b" [:s]]]]
    (parses grammar10 :s "aaaa") '([:s [:s [:s [:s [:s] "a"] "a"] "a"] "a"])
    (eparses grammar10 :s "bb") '({:tag :s, :content ({:tag :s, :content ({:tag :s, :content nil} "b")} "b")})
    (parses grammar11 :s "aaaa") '([:s [:s [:s [:s "a"] "a"] "a"] "a"])
    (parses grammar12 :s "aaa") '([:s [:a [:s [:a [:s [:a "a"]] "a"]] "a"]])
    (parses grammar13 :s "aaa") '([:s [:a [:s [:a [:s [:a "a"]] "a"]] "a"]])
    (parses amb-grammar :s "b") '([:s "b"])
    (parses amb-grammar :s "bb") '([:s [:s "b"] [:s "b"]])
    (parses amb-grammar :s "bbb") '([:s [:s "b"] [:s [:s "b"] [:s "b"]]] [:s [:s "b"] [:s "b"] [:s "b"]] [:s [:s [:s "b"] [:s "b"]] [:s "b"]])
    (set (parses amb-grammar :s "bbbb"))
    (set '([:s [:s "b"] [:s [:s "b"] [:s [:s "b"] [:s "b"]]]]
           [:s [:s [:s "b"] [:s "b"]] [:s "b"] [:s "b"]]
           [:s [:s [:s "b"] [:s [:s "b"] [:s "b"]]] [:s "b"]]
           [:s [:s "b"] [:s [:s "b"] [:s "b"]] [:s "b"]]
           [:s [:s "b"] [:s "b"] [:s [:s "b"] [:s "b"]]]
           [:s [:s [:s "b"] [:s "b"]] [:s [:s "b"] [:s "b"]]]
           [:s [:s "b"] [:s [:s "b"] [:s "b"] [:s "b"]]]
           [:s [:s [:s "b"] [:s "b"] [:s "b"]] [:s "b"]]
           [:s [:s [:s [:s "b"] [:s "b"]] [:s "b"]] [:s "b"]]
           [:s [:s "b"] [:s [:s [:s "b"] [:s "b"]] [:s "b"]]]))
    (parses paren-grammar :s "(()())()") '([:s [:s "(" [:s [:s "(" ")"] [:s "(" ")"]] ")"] [:s "(" ")"]])
    (parse non-ll-grammar :s "aabb") [:s [:a "a" [:a "a" [:a] "b"] "b"]]
    (insta/failure? (parse non-ll-grammar :s "aabbb")) true
    (parse non-ll-grammar :s "aabbbb") [:s [:b "a" [:b "a" [:b] "bb"] "bb"]]
    (parse grammar14 :s "b") [:s "b"]
    (parse grammar14 :s "ab") [:s "a" "b"]
    (parse grammar15 :s "ab") [:s "a" "b"]
    (parse grammar15 :s "b") [:s "b"]
    (parse grammar15 :s "") [:s]
    (parse grammar16 :s "aaaa") [:s "a" "a" "a" "a"]
    (parses grammar17 :s "aaaab") '([:s "a" "a" "a" "a" "b"])
    (parses grammar18 :s "aaaa") '([:s "a" "a" "a" "a"])
    (parse grammar19 :s "abbcbc") [:s "a" "b" "b" "c" "b" "c"]
    (parse grammar20 :s "abcbcbc") [:s "a" "b" "c" "b" "c" "b" "c"]
    (insta/failure? (parse grammar20 :s "a")) true
    (parse grammar22 :s "") [:s]
    (parse grammar22 :s "aaa") [:s "a" "a" "a"]
    (parse grammar23 :s "b") [:s "b"]
    (parse grammar23 :s "aab") [:s "a" "a" "b"]
    (parse grammar24 :s "a") [:s "a"]
    (parse grammar24 :s "aaa") [:s "a" "a" "a"]
    (parse grammar25 :s "a") [:s "a"]
    (parse grammar25 :s "abbc") [:s "a" "b" "b" "c"]
    (parse grammar26 :s "a") [:s "a"]
    (parse grammar26 :s "abc") [:s "a" "b" "c"]
    (parse grammar28 :s "a4bbbc") [:s "a4bbbc"]
    (parses grammar29 :s "aaaaa") '([:s "a" "a" "a" "a" "a"])
    (parses grammar30 :s "ababab") '([:s [:b "a" "b" "a" "b" "a" "b"]] [:s [:a "a" "b" "a" "b" "a" "b"]])
    (count (parses equal-zeros-ones :equal "00110110")) 448
    (parse grammar31 :equal "00110110") [:equal [:equal "0" [:equal [:equal "0" [:equal] "1"] [:equal "1" [:equal] "0"]] "1"] [:equal "1" [:equal] "0"]]
    (parse grammar32 :s "0000") [:s [:s "0"] [:s [:s "0"] [:s [:s "0"] [:s "0"]]]]
    (insta/failure? (parse grammar33 :s "0000")) true
    (insta/failure? (parse grammar34 :s "0000")) true
    (insta/failure? (parse grammar35 :s "0000")) true
    (insta/failure? (parse grammar36 :s "0000")) true
    (insta/failure? (parse grammar37 :s "0000")) true
    (parse grammar33 :s "") [:s]
    (parse grammar34 :s "") [:s]
    (parse grammar35 :s "") [:s]
    (insta/failure? (parse grammar36 :s "")) true
    (insta/failure? (parse grammar37 :s "")) true
    (parse grammar38 :s "a2bcbc") [:s "a2bcbc"]
    (parse grammar39 :s "012") [:s "0" "2"]
    (eparse grammar39 :s "012") '{:tag :s, :content ("0" "2")}
    (parse grammar40 :s "aaa") [:s "a" "a" "a"]
    (eparse grammar40 :s "aaa") '{:tag :s, :content ("a" "a" "a")}
    (parse grammar41 :s "baaaa") [:s "b" "a" "a" "a" "a"]
    (parse grammar42 :s "baaaa") [:s "b" "a" "a" "a" "a"]
    (insta/failure? (parse grammar41 :s "b")) true
    (parse grammar42 :s "b") [:s "b"]
    (parse grammar43 :s "b") [:s "b"]
    (parse grammar43 :s "ab") [:s "a" "b"]
    (parse grammar44 :s "abbab") [:s [:ab "a" "b" "b" "a" "b"]]
    (insta/failure? (parse grammar44 :s "bbab")) true
    (parse grammar46 :s "babaab") [:s [:ab "b" "a" "b" "a" "a" "b"]]
    (insta/failure? (parse grammar45 :s "babaab")) true
    (parse grammar47 :s "babaab") [:s [:ab "b" "a" "b" "a" "a" "b"]]
    (insta/failure? (parse grammar47 :s "abbab")) true
    (parse grammar48 :s "abab") [:s [:ab "a" "b" "a" "b"]]
    (insta/failure? (parse grammar49 :s "ababa")) true
    (parse grammar50 :s "aaa") [:s "a" [:s "a"] "a"]
    (insta/failure? (parse grammar50 :s "aa")) true
    (parse grammar51 :s "aaa") '("a" "a" "a")
    (eparse grammar51 :s "aaa") '("a" "a" "a")
    (eparse grammar52 :s "aab") '("a" "a" "b")
    (eparse grammar53 :s "aba") '("a" "b" "a")
    (parse grammar54 :s "aaa") [:s "a" "aa"]
    (parses grammar55 :s "aaa") '([:s "a" [:s "a"] "a"] [:s "a" [:s "a" [:s "a"]]])
    (parses grammar56 :s "aaa") '([:s "a" [:s "a"] "a"])
    (parses grammar57 :s "aaaa") '([:s "aa" "aa"] [:s "a" "a" "a" "a"])
    (parses grammar57 :s "aaaaa") '([:s "a" "a" "a" "a" "a"])
    (parses grammar58 :s "aaaab") '([:s "aa" "aa" "b"] [:s "a" "a" "a" "a" "b"])
    (parses grammar58 :s "aaaaab") '([:s "a" "a" "a" "a" "a" "b"])
    (parses grammar59 :S "aaabbbccc") '([:S "a" "a" "a" "b" "b" "b" "c" "c" "c"])
    (parses grammar65 :s "aaaab") '([:s "aa" "aa" "b"])
    (parses grammar65 :s "aaaaab") '()
    (parses grammar67 :s "0") '([:s "0"])
    (parses grammar68 :s "0") '()
    (parses grammar69 :s "abc") '([:s "abc"])
    (parses grammar70 :s "abc") ()))

instaparse-1.4.7/test/instaparse/repeat_test.cljc000066400000000000000000000125101311220471200221700ustar00rootroot00000000000000
(ns instaparse.repeat-test
  (:require #?(:clj [clojure.test :refer [deftest are]]
               :cljs [cljs.test :as t])
            [instaparse.core :as insta]
            [instaparse.repeat :as repeat])
  #?(:cljs (:require-macros [cljs.test :refer [are deftest]])))

(def user-parser
  "content = user-block*
   user-block = (user before-section after-section < blank-line* >)
   user = prefix separator number separator name newline
   before-section = < before > lines error-line*
   after-section = < after > lines
   <before> = < 'BEFORE' newline >
   <after> = < 'AFTER' newline >
   <prefix> = < 'User' >
   <lines> = line*
   <line> = <#'\\s+'> subscription newline
   <error-line> = ( '(no dates!)' | 'FIXUP!' ) newline
   blank-line = #'\\s*\n'
   name = #'.*' (*WIP why infinite loop?*)
   subscription = !prefix #'.*?(?=\\s+-)' < separator > date
   date = #'.*'
   <newline> = <'\n'>
   <separator> = <#'[ -]+'>
   number = #'[0-9]+'
   ")

(deftest memory-optimize-test
  (are [grammar text optimize?]
       (let [parser (insta/parser grammar)
             parser-enlive (insta/parser grammar :output-format :enlive)
             tree1 (parser text)
             tree2 (parser text :optimize :memory)
             tree3 (parser-enlive text)
             tree4 (parser-enlive text :optimize :memory)]
         (and (= tree1 tree2)
              (= tree3 tree4)
              (= optimize? (repeat/used-memory-optimization? tree2))
              (= optimize? (repeat/used-memory-optimization? tree4))))
    ;user-parser text true

    "S = 'ab'*" "ababab" true
    "S = 'ab'*" "abababd" false
    "S = 'ab'*" "" false

    "<S> = 'ab'*" "ababab" true
    "<S> = 'ab'*" "abababd" false
    "<S> = 'ab'*" "" false

    "S = <'ab'>*" "ababab" false
    "S = <'ab'*>" "ababab" false

    "S = A*; A = 'a'" "aaaa" true
    "S = A*; A = 'a'" "aaaad" false
    "S = A*; A = 'a'" "" false

    "<S> = A*; A = 'a'" "aaaa" true
    "<S> = A*; A = 'a'" "aaaad" false
    "<S> = A*; A = 'a'" "" false

    "S = <A>*; A = 'a'" "aaaa" false
    "S = <A*>; A = 'a'" "aaaa" false

    "S = 'ab'+" "ababab" true
    "S = 'ab'+" "abababd" false
    "S = 'ab'+" "" false

    "<S> = 'ab'+" "ababab" true
    "<S> = 'ab'+" "abababd" false
    "<S> = 'ab'+" "" false

    "S = <'ab'>+" "ababab" false
    "S = <'ab'+>" "ababab" false

    "S = A+; A = 'a'" "aaaa" true
    "S = A+; A = 'a'" "aaaad" false
    "S = A+; A = 'a'" "" false

    "<S> = A+; A = 'a'" "aaaa" true
    "<S> = A+; A = 'a'" "aaaad" false
    "<S> = A+; A = 'a'" "" false

    "S = <A>+; A = 'a'" "aaaa" false
    "S = <A+>; A = 'a'" "aaaa" false

    "S = 'c' 'ab'*" "cababab" true
    "S = 'c' 'ab'*" "cabababd" false
    "S = 'c' 'ab'*" "dababab" false
    "S = 'c' 'ab'*" "c" false
    "S = 'c' 'ab'*" "" false

    "<S> = 'c' 'ab'*" "cababab" true
    "<S> = 'c' 'ab'*" "cabababd" false
    "<S> = 'c' 'ab'*" "dcababab" false
    "<S> = 'c' 'ab'*" "c" false
    "<S> = 'c' 'ab'*" "" false

    "S = 'c' <'ab'>*" "cababab" false
    "S = 'c' <'ab'*>" "cababab" false
    "S = <'c'> <'ab'>*" "cababab" false
    "S = <'c'> 'ab'*" "cababab" false

    "S = 'c' A*; A = 'a'" "caaaa" true
    "S = 'c' A*; A = 'a'" "caaaad" false
    "S = 'c' A*; A = 'a'" "dcaaaad" false
    "S = 'c' A*; A = 'a'" "c" false

    "<S> = 'c' A*; A = 'a'" "caaaa" true
    "<S> = 'c' A*; A = 'a'" "caaaad" false
    "<S> = 'c' A*; A = 'a'" "daaaad" false
    "<S> = 'c' A*; A = 'a'" "c" false

    "S = 'c' <A>*; A = 'a'" "caaaa" false
    "S = 'c' <A*>; A = 'a'" "caaaa" false

    "S = 'c' 'ab'+" "cababab" true
    "S = 'c' 'ab'+" "dababab" false
    "S = 'c' 'ab'+" "abababd" false
    "S = 'c' 'ab'+" "c" false
    "S = 'c' 'ab'+" "" false

    "<S> = 'c' 'ab'+" "cababab" true
    "<S> = 'c' 'ab'+" "cabababd" false
    "<S> = 'c' 'ab'+" "dcababab" false
    "<S> = 'c' 'ab'+" "c" false
    "<S> = 'c' 'ab'+" "" false

    "S = 'c' <'ab'>+" "cababab" false
    "S = 'c' <'ab'+>" "cababab" false
    "S = <'c'> <'ab'>+" "cababab" false
    "S = <'c'> 'ab'+" "cababab" false

    "S = 'c' A+; A = 'a'" "caaaa" true
    "S = 'c' A+; A = 'a'" "caaaad" false
    "S = 'c' A+; A = 'a'" "dcaaaa" false
    "S = 'c' A+; A = 'a'" "c" false

    "<S> = 'c' A+; A = 'a'" "caaaa" true
    "<S> = 'c' A+; A = 'a'" "caaaad" false
    "<S> = 'c' A+; A = 'a'" "dcaaaa" false
    "<S> = 'c' A+; A = 'a'" "c" false

    "S = 'c' <A>+; A = 'a'" "caaaa" false
    "S = 'c' <A+>; A = 'a'" "caaaa" false

    "S = C A+; C = 'c'; A = 'a'" "caaaa" true
    "S = C A+; C = 'c'; <A> = 'a'" "caaaa" true
    "S = C A+; <C> = 'c'; A = 'a'" "caaaa" true
    "S = C A+; <C> = 'c'; <A> = 'a'" "caaaa" true
    "S = <C> A+; C = 'c'; A = 'a'" "caaaa" false
    "S = C A+; C = 'c'; A = 'a'" "caaaad" false
    "S = C A+; C = 'c'; A = 'a'" "dcaaaa" false
    "S = C A+; C = 'c'; A = 'a'" "c" false

    "<S> = C A+; C = 'c'; A = 'a'" "caaaa" true
    "<S> = <C> A+; C = 'c'; A = 'a'" "caaaa" false
    "<S> = C A+; C = 'c'; A = 'a'" "caaaad" false
    "<S> = C A+; C = 'c'; A = 'a'" "dcaaaa" false
    "<S> = C A+; C = 'c'; A = 'a'" "c" false

    "S = C <A>+; C = 'c'; A = 'a'" "caaaa" false
    "S = C <A+>; C = 'c'; A = 'a'" "caaaa" false))

instaparse-1.4.7/test/instaparse/specs.cljc000066400000000000000000000014251311220471200207710ustar00rootroot00000000000000
(ns instaparse.specs)

(def cfg1 "S = 'a'")
(def cfg2 "S = X X = Y Y = Z")
(def cfg3 "S = X | Y Y = A Z Z = 'a'")
(def cfg4 "S := A B | C C := (A | B) C")
(def cfg5 "S=A?")
(def cfg6 "S =(A | B)?")
(def cfg7 "S = A, B?, (C C)*, D+, E")
(def cfg8 "<S> = (C | D)")
(def cfg9 "S = A, &B")
(def cfg10 "S = &B A")
(def cfg11 "S = &B+ A")
(def cfg12 "S = !B A")
(def cfg13 "S = !&B A")
(def cfg15 "S = 'a' S | Epsilon; C = 'b'. D = A")
(def cfg16 "S = 'a' / 'b'")
(def cfg17 "S = 'a' / 'b' | 'c'")
(def cfg18 "S = 'a' | 'b' / 'c'")
(def cfg19 "S = A ('a' | 'b')+ A = !B B = 'a' !'b'")
(def cfg20 "(* A comment about this grammar *split* (across) lines *) (* And some (* nested *) comments *) S = (A*) A = 'a'")

instaparse-1.4.7/test/instaparse/viz_test.clj000066400000000000000000000024761311220471200213660ustar00rootroot00000000000000
(ns instaparse.viz-test
  (:require instaparse.core)
  (:use instaparse.viz))

(def make-tree-e
  "simple tree parser"
  (instaparse.core/parser
   "tree: node* node: leaf | <'('> node (<'('> node <')'>)* node* <')'> leaf: #'a+' "
   :output-format :enlive))

(def make-tree-h
  "simple tree parser"
  (instaparse.core/parser
   "tree: node* node: leaf | <'('> node (<'('> node <')'>)* node* <')'> leaf: #'a+' "
   :output-format :hiccup))

(def make-tree-se
  "simple tree parser"
  (instaparse.core/parser
   "<tree>: node* node: leaf | <'('> node (<'('> node <')'>)* node* <')'> leaf: #'a+' "
   :output-format :enlive))

(def make-tree-sh
  "simple tree parser"
  (instaparse.core/parser
   "<tree>: node* node: leaf | <'('> node (<'('> node <')'>)* node* <')'> leaf: #'a+' "
   :output-format :hiccup))

(defn view-test-trees [t]
  (tree-viz (make-tree-e "((a)((a)))(a)"))
  (Thread/sleep t)
  (tree-viz (make-tree-h "((a)((a)))(a)"))
  (Thread/sleep t)
  (tree-viz (make-tree-sh "((a)((a)))(a)"))
  (Thread/sleep t)
  (tree-viz (make-tree-se "((a)((a)))(a)"))
  (Thread/sleep t)
  (tree-viz (make-tree-e ""))
  (Thread/sleep t)
  (tree-viz (make-tree-se "")))
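;; Illustrative note, not part of the original file: these visualizations are
;; intended to be triggered manually from a REPL, since tree-viz draws the
;; parse trees with rhizome/graphviz (which must be installed).  One way to
;; exercise them, as a sketch:
(comment
  ;; show each test tree for roughly two seconds before moving to the next
  (view-test-trees 2000))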