pax_global_header 0000666 0000000 0000000 00000000064 12470552204 0014513 g ustar 00root root 0000000 0000000 52 comment=b9273e28867979385c0f1eb4f68ffb9f45367817
parsley-clojure-0.9.3/ 0000775 0000000 0000000 00000000000 12470552204 0014644 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/README.md 0000664 0000000 0000000 00000017746 12470552204 0016142 0 ustar 00root root 0000000 0000000 # Parsley
## Parsnip
Parsley has been a test bed and a proof of concept for total incremental parsers. However it suffers from severe limitations (mainly revolving around lookaheads, both at the lexeme and production level) which hinder further development and acceptance.
Further development of the concepts and techniques explored in Parsley will occur in [Parsnip](https://github.com/cgrand/parsnip/).
## Introduction
Parsley generates *total and truly incremental parsers*.
Total: a Parsley parser *yields a parse-tree for any input string*.
Truly incremental: a Parsley parser can operate as a text buffer, in best cases
recomputing the parse-tree after a sequence of edits happens in *logarithmic
time* (worst case: it behaves like a restartable parser).
Parsley parsers have *no separate lexer*, this allows for better compositionality
of grammars.
For now Parsley uses the same technique (for lexer-less parsing) as described
in this paper:
Context-Aware Scanning for Parsing Extensible Languages
http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible-Language
(I independently rediscovered this technique and dubbed it LR+.)
Without a separate lexer, a language is entirely defined by its grammar.
A grammar is an alternation of keywords (non-terminal names) and other values.
A keyword and another value form a production rule.
## Specifying grammars
A simple grammar is:
:expr #{"x" ["(" :expr* ")"]}
`x` `()` `(xx)` `((x)())` are recognized by this grammar.
By default the main production of a grammar is the first one.
A production right value is a combination of:
* strings and regexes (terminals -- the set of terminal types is broader and
even open, more later)
* keywords (non-terminals) which can be suffixed by `*`, `+` or `?` to denote
repetitions or options.
* sets to denote an alternative
* vectors to denote a sequence. Inside vectors `:*`, `:+` and `:?` are postfix unary
operators. That is `["ab" :+]` denotes a non-empty repetition of the `ab`
string
A production left value is always a keyword. If this keyword is suffixed by `-`,
no node will be generated in the parse-tree for this rule, its child nodes are
inlined in the parent node. Rules with such names are called anonymous rules.
An anonymous rule must be referred to by its base name (without the `-`).
These two grammars specify the same language but the resulting parse-trees will
be different (additional `:expr-rep` nodes):
:expr #{"x" ["(" :expr* ")"]}
:expr #{"x" :expr-rep}
:expr-rep ["(" :expr* ")"]
These two grammars specify the same language and the same parse-trees:
:expr #{"x" ["(" :expr* ")"]}
:expr #{"x" :expr-rep}
:expr-rep- ["(" :expr* ")"]
## Creating parsers
A parser is created using the `parser` or `make-parser` functions.
(require '[net.cgrand.parsley :as p])
(def p (p/parser :expr #{"x" ["(" :expr* ")"]}))
(pprint (p "(x(x))"))
{:tag :net.cgrand.parsley/root,
:content
[{:tag :expr,
:content
["("
{:tag :expr, :content ["x"]}
{:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}
")"]}]}
; running on malformed input with garbage
(pprint (p "a(zldxn(dez)"))
{:tag :net.cgrand.parsley/unfinished,
:content
[{:tag :net.cgrand.parsley/unexpected, :content ["a"]}
{:tag :net.cgrand.parsley/unfinished,
:content
["("
{:tag :net.cgrand.parsley/unexpected, :content ["zld"]}
{:tag :expr, :content ["x"]}
{:tag :net.cgrand.parsley/unexpected, :content ["n"]}
{:tag :expr,
:content
["("
{:tag :net.cgrand.parsley/unexpected, :content ["dez"]}
")"]}]}]}
## Creating buffers
Creating a buffer, editing it and getting its resulting parse-tree:
(-> p p/incremental-buffer (p/edit 0 0 "(") (p/edit 1 0 "(x)") p/parse-tree pprint)
{:tag :net.cgrand.parsley/unfinished,
:content
[{:tag :net.cgrand.parsley/unfinished,
:content
["("
{:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}]}]}
Incremental parsing at work:
=> (def p (p/parser :expr #{"x" "\n" ["(" :expr* ")"]}))
#'net.cgrand.parsley/p
=> (let [line (apply str "\n" (repeat 10 "((x))"))
input (str "(" (apply str (repeat 1000 line)) ")")
buf (p/incremental-buffer p)
buf (p/edit buf 0 0 input)]
(time (p/parse-tree buf))
(time (p/parse-tree (-> buf (p/edit 2 0 "(") (p/edit 51002 0 ")"))))
nil)
"Elapsed time: 508.834 msecs"
"Elapsed time: 86.038 msecs"
nil
Hence, *reparsing the buffer only took a fraction of the original time* despite
the buffer having been modified at the start and at the end.
## Incremental parsing
The input string is split into _chunks_ (lines by default) and chunks are always
reparsed as a whole, so don't experiment with incremental parsing with 1-line
inputs!
Let's look at a bit more complex example:
=> (def p (p/parser {:main :expr*
:space :ws?
:make-node (fn [tag content] {:tag tag :content content :id (gensym)})}
:ws #"\s+"
:expr #{#"\w+" ["(" :expr* ")"]}))
This example introduces the option map: if the first arg to `parser` is a map
(instead of a keyword), it's a map of options. See Options for more.
The important option here is that we redefine how nodes of the parse-tree are
constructed (via the `make-node` option). We add a unique identifier to each
node.
Now let's create a 3-line input and parse it:
=> (def buf (-> p incremental-buffer (edit 0 0 "((a)\n(b)\n(c))")))
=> (-> buf parse-tree pprint)
nil
{:tag :net.cgrand.parsley/root,
:content
[{:tag :expr,
:content
["("
{:tag :expr,
:content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
:id G__1807}
{:tag :ws, :content ["\n"], :id G__1808}
{:tag :expr,
:content ["(" {:tag :expr, :content ["b"], :id G__1809} ")"],
:id G__1810}
{:tag :ws, :content ["\n"], :id G__1811}
{:tag :expr,
:content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
:id G__1813}
")"],
:id G__1814}],
:id G__1815}
Now, let's modify this "B" in "BOO" and parse the buffer again:
=> (-> buf (edit 6 1 "BOO") parse-tree pprint)
nil
{:tag :net.cgrand.parsley/root,
:content
[{:tag :expr,
:content
["("
{:tag :expr,
:content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
:id G__1807}
{:tag :ws, :content ["\n"], :id G__1818}
{:tag :expr,
:content ["(" {:tag :expr, :content ["BOO"], :id G__1819} ")"],
:id G__1820}
{:tag :ws, :content ["\n"], :id G__1811}
{:tag :expr,
:content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
:id G__1813}
")"],
:id G__1821}],
:id G__1822}
-----
We can spot that 5 out of the 10 nodes are shared with the previous parse-tree.
## Options
`:main` specifies the root production, by default this is the the first
production of the grammar.
`:root-tag` specifies the tag name to use for the root node
(`:net.cgrand.parsley/root` by default).
`:space` specifies a production which will be interspersed between every symbol
(terminal or not) *except in a sequence created with `unspaced`.*
`:make-node` specifies a function whose arglist is `[tag children-vec]` which
returns a new node. By default create instances the Node record with keys `tag`
and `content`.
`:make-unexpected` specifies a 1-arg function which converts a string (of
unexpected characters) to a node. By defaut delegates to `:make-node`.
`:make-leaf` specifies a 1-arg function which converts a string (token) to a
node, by default behaves like identity.
parsley-clojure-0.9.3/epl-v10.html 0000664 0000000 0000000 00000030560 12470552204 0016722 0 ustar 00root root 0000000 0000000
Eclipse Public License - Version 1.0
Eclipse Public License - v 1.0
THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE
PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR
DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS
AGREEMENT.
1. DEFINITIONS
"Contribution" means:
a) in the case of the initial Contributor, the initial
code and documentation distributed under this Agreement, and
b) in the case of each subsequent Contributor:
i) changes to the Program, and
ii) additions to the Program;
where such changes and/or additions to the Program
originate from and are distributed by that particular Contributor. A
Contribution 'originates' from a Contributor if it was added to the
Program by such Contributor itself or anyone acting on such
Contributor's behalf. Contributions do not include additions to the
Program which: (i) are separate modules of software distributed in
conjunction with the Program under their own license agreement, and (ii)
are not derivative works of the Program.
"Contributor" means any person or entity that distributes
the Program.
"Licensed Patents" mean patent claims licensable by a
Contributor which are necessarily infringed by the use or sale of its
Contribution alone or when combined with the Program.
"Program" means the Contributions distributed in accordance
with this Agreement.
"Recipient" means anyone who receives the Program under
this Agreement, including all Contributors.
2. GRANT OF RIGHTS
a) Subject to the terms of this Agreement, each
Contributor hereby grants Recipient a non-exclusive, worldwide,
royalty-free copyright license to reproduce, prepare derivative works
of, publicly display, publicly perform, distribute and sublicense the
Contribution of such Contributor, if any, and such derivative works, in
source code and object code form.
b) Subject to the terms of this Agreement, each
Contributor hereby grants Recipient a non-exclusive, worldwide,
royalty-free patent license under Licensed Patents to make, use, sell,
offer to sell, import and otherwise transfer the Contribution of such
Contributor, if any, in source code and object code form. This patent
license shall apply to the combination of the Contribution and the
Program if, at the time the Contribution is added by the Contributor,
such addition of the Contribution causes such combination to be covered
by the Licensed Patents. The patent license shall not apply to any other
combinations which include the Contribution. No hardware per se is
licensed hereunder.
c) Recipient understands that although each Contributor
grants the licenses to its Contributions set forth herein, no assurances
are provided by any Contributor that the Program does not infringe the
patent or other intellectual property rights of any other entity. Each
Contributor disclaims any liability to Recipient for claims brought by
any other entity based on infringement of intellectual property rights
or otherwise. As a condition to exercising the rights and licenses
granted hereunder, each Recipient hereby assumes sole responsibility to
secure any other intellectual property rights needed, if any. For
example, if a third party patent license is required to allow Recipient
to distribute the Program, it is Recipient's responsibility to acquire
that license before distributing the Program.
d) Each Contributor represents that to its knowledge it
has sufficient copyright rights in its Contribution, if any, to grant
the copyright license set forth in this Agreement.
3. REQUIREMENTS
A Contributor may choose to distribute the Program in object code
form under its own license agreement, provided that:
a) it complies with the terms and conditions of this
Agreement; and
b) its license agreement:
i) effectively disclaims on behalf of all Contributors
all warranties and conditions, express and implied, including warranties
or conditions of title and non-infringement, and implied warranties or
conditions of merchantability and fitness for a particular purpose;
ii) effectively excludes on behalf of all Contributors
all liability for damages, including direct, indirect, special,
incidental and consequential damages, such as lost profits;
iii) states that any provisions which differ from this
Agreement are offered by that Contributor alone and not by any other
party; and
iv) states that source code for the Program is available
from such Contributor, and informs licensees how to obtain it in a
reasonable manner on or through a medium customarily used for software
exchange.
When the Program is made available in source code form:
a) it must be made available under this Agreement; and
b) a copy of this Agreement must be included with each
copy of the Program.
Contributors may not remove or alter any copyright notices contained
within the Program.
Each Contributor must identify itself as the originator of its
Contribution, if any, in a manner that reasonably allows subsequent
Recipients to identify the originator of the Contribution.
4. COMMERCIAL DISTRIBUTION
Commercial distributors of software may accept certain
responsibilities with respect to end users, business partners and the
like. While this license is intended to facilitate the commercial use of
the Program, the Contributor who includes the Program in a commercial
product offering should do so in a manner which does not create
potential liability for other Contributors. Therefore, if a Contributor
includes the Program in a commercial product offering, such Contributor
("Commercial Contributor") hereby agrees to defend and
indemnify every other Contributor ("Indemnified Contributor")
against any losses, damages and costs (collectively "Losses")
arising from claims, lawsuits and other legal actions brought by a third
party against the Indemnified Contributor to the extent caused by the
acts or omissions of such Commercial Contributor in connection with its
distribution of the Program in a commercial product offering. The
obligations in this section do not apply to any claims or Losses
relating to any actual or alleged intellectual property infringement. In
order to qualify, an Indemnified Contributor must: a) promptly notify
the Commercial Contributor in writing of such claim, and b) allow the
Commercial Contributor to control, and cooperate with the Commercial
Contributor in, the defense and any related settlement negotiations. The
Indemnified Contributor may participate in any such claim at its own
expense.
For example, a Contributor might include the Program in a commercial
product offering, Product X. That Contributor is then a Commercial
Contributor. If that Commercial Contributor then makes performance
claims, or offers warranties related to Product X, those performance
claims and warranties are such Commercial Contributor's responsibility
alone. Under this section, the Commercial Contributor would have to
defend claims against the other Contributors related to those
performance claims and warranties, and if a court requires any other
Contributor to pay any damages as a result, the Commercial Contributor
must pay those damages.
5. NO WARRANTY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS
PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION,
ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY
OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely
responsible for determining the appropriateness of using and
distributing the Program and assumes all risks associated with its
exercise of rights under this Agreement , including but not limited to
the risks and costs of program errors, compliance with applicable laws,
damage to or loss of data, programs or equipment, and unavailability or
interruption of operations.
6. DISCLAIMER OF LIABILITY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT
NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING
WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR
DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
7. GENERAL
If any provision of this Agreement is invalid or unenforceable under
applicable law, it shall not affect the validity or enforceability of
the remainder of the terms of this Agreement, and without further action
by the parties hereto, such provision shall be reformed to the minimum
extent necessary to make such provision valid and enforceable.
If Recipient institutes patent litigation against any entity
(including a cross-claim or counterclaim in a lawsuit) alleging that the
Program itself (excluding combinations of the Program with other
software or hardware) infringes such Recipient's patent(s), then such
Recipient's rights granted under Section 2(b) shall terminate as of the
date such litigation is filed.
All Recipient's rights under this Agreement shall terminate if it
fails to comply with any of the material terms or conditions of this
Agreement and does not cure such failure in a reasonable period of time
after becoming aware of such noncompliance. If all Recipient's rights
under this Agreement terminate, Recipient agrees to cease use and
distribution of the Program as soon as reasonably practicable. However,
Recipient's obligations under this Agreement and any licenses granted by
Recipient relating to the Program shall continue and survive.
Everyone is permitted to copy and distribute copies of this
Agreement, but in order to avoid inconsistency the Agreement is
copyrighted and may only be modified in the following manner. The
Agreement Steward reserves the right to publish new versions (including
revisions) of this Agreement from time to time. No one other than the
Agreement Steward has the right to modify this Agreement. The Eclipse
Foundation is the initial Agreement Steward. The Eclipse Foundation may
assign the responsibility to serve as the Agreement Steward to a
suitable separate entity. Each new version of the Agreement will be
given a distinguishing version number. The Program (including
Contributions) may always be distributed subject to the version of the
Agreement under which it was received. In addition, after a new version
of the Agreement is published, Contributor may elect to distribute the
Program (including its Contributions) under the new version. Except as
expressly stated in Sections 2(a) and 2(b) above, Recipient receives no
rights or licenses to the intellectual property of any Contributor under
this Agreement, whether expressly, by implication, estoppel or
otherwise. All rights in the Program not expressly granted under this
Agreement are reserved.
This Agreement is governed by the laws of the State of New York and
the intellectual property laws of the United States of America. No party
to this Agreement will bring a legal action under this Agreement more
than one year after the cause of action arose. Each party waives its
rights to a jury trial in any resulting litigation.
parsley-clojure-0.9.3/pom.xml 0000664 0000000 0000000 00000003353 12470552204 0016165 0 ustar 00root root 0000000 0000000
4.0.0
net.cgrand
parsley
0.9.1
parsley
a generator of total and truly incremental parsers
http://github.com/cgrand/parsley/
d24a5040fd8417371f17366dd3e16fdc37a41a9d
src
test
resources
test-resources
central
http://repo1.maven.org/maven2
clojars
http://clojars.org/repo/
org.clojure
clojure
[1.2.0,)
net.cgrand
regex
1.1.0
parsley-clojure-0.9.3/project.clj 0000664 0000000 0000000 00000000444 12470552204 0017006 0 ustar 00root root 0000000 0000000 (defproject net.cgrand/parsley "0.9.3"
:description "a generator of total and truly incremental parsers"
:url "http://github.com/cgrand/parsley/"
:license {:name "Eclipse Public License 1.0"}
:dependencies [[org.clojure/clojure "1.4.0"]
[net.cgrand/regex "1.1.0"]])
parsley-clojure-0.9.3/src/ 0000775 0000000 0000000 00000000000 12470552204 0015433 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/src/net/ 0000775 0000000 0000000 00000000000 12470552204 0016221 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/src/net/cgrand/ 0000775 0000000 0000000 00000000000 12470552204 0017457 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/src/net/cgrand/parsley.clj 0000664 0000000 0000000 00000007716 12470552204 0021643 0 ustar 00root root 0000000 0000000 ; Copyright (c) Christophe Grand. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.
(ns net.cgrand.parsley
"A total truly incremental parser generator. Grammars are expressed
in a value-based DSL."
(:require [net.cgrand.parsley.lrplus :as core]
[net.cgrand.parsley.fold :as f]
[net.cgrand.parsley.tree :as t]
[net.cgrand.parsley.util :as u]
[net.cgrand.parsley.grammar :as g]))
(defrecord Node [tag content]) ; for memory efficiency
(defn- stepper [table options-map]
(let [options-map (merge
{:make-node #(Node. %1 %2)
:make-leaf nil} ; nil for identity
options-map)
options-map (if-not (:make-unexpected options-map)
(let [make-node (:make-node options-map)
make-leaf (or (:make-leaf options-map) identity)]
(assoc options-map :make-unexpected
#(make-node ::unexpected [(make-leaf %)])))
options-map)
ops (select-keys options-map [:make-node :make-leaf :make-unexpected])]
^{::options options-map} ; feeling dirty, metadata make me uneasy
(fn self
([s]
(let [a (self core/zero s) b (self a nil)]
(-> (f/stitch a b) (nth 2) f/finish)))
([state s]
(core/step table ops state s)))))
(defn- flatten-rules [rules]
(if (map? rules)
(apply concat rules)
rules))
(defn make-parser [options-map rules]
(-> (apply g/grammar options-map (flatten-rules rules)) core/lr-table core/totalize
core/number-states (stepper options-map)))
(defn parser [options-map & rules]
(let [[options-map rules] (if-not (map? options-map)
[{} (cons options-map rules)]
[options-map rules])]
(make-parser options-map rules)))
(defn unspaced
"Creates an unspaced sequence."
[& specs]
(apply g/unspaced specs))
(defn- memoize-parser [f]
(let [cache (atom nil)]
(fn [input]
(u/cond
[last-result @cache
new-result (f/rebase last-result input)]
(if (identical? last-result new-result)
last-result
(reset! cache new-result))
(reset! cache (f input))))))
(defn- memoize1 [parser s]
(memoize-parser #(parser % s)))
(defn- memoize2 [mpa mpb]
(memoize-parser #(let [a (mpa %)
b (mpb a)]
(f/stitch a b))))
(defn- memoize-1shot [f]
(let [cache (atom [(Object.) nil])]
(fn [& args]
(let [[cargs cr] @cache]
(if (= args cargs)
cr
(let [r (apply f args)]
(reset! cache [args r])
r))))))
(defn- memoize-eof [parser]
(let [mp (memoize1 parser nil)]
(memoize-1shot #(-> (f/stitch % (mp %)) (nth 2) f/finish))))
(defn incremental-buffer
"Creates an empty incremental buffer for the specified parser."
[parser]
{:buffer (t/buffer {:unit #(memoize1 parser %)
:plus memoize2
:chunk #(.split ^String % "(?<=\n)")
:left #(subs %1 0 %2)
:right subs
:cat str})
:eof-parser (memoize-eof parser)
:options (::options (meta parser))})
(defn edit
"Returns a new buffer reflecting the specified edit."
[incremental-buffer offset length s]
(update-in incremental-buffer [:buffer] t/edit offset length s))
(defn parse-tree
"Returns the parse-tree."
[incremental-buffer]
(let [f (t/value (:buffer incremental-buffer))
a (f core/zero)]
((:eof-parser incremental-buffer) a)))
parsley-clojure-0.9.3/src/net/cgrand/parsley/ 0000775 0000000 0000000 00000000000 12470552204 0021136 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/src/net/cgrand/parsley/fold.clj 0000664 0000000 0000000 00000010273 12470552204 0022557 0 ustar 00root root 0000000 0000000 ; Copyright (c) Christophe Grand. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.
(ns net.cgrand.parsley.fold
(:require [net.cgrand.parsley.util :as u])
(:refer-clojure :exclude [cat]))
(defprotocol EphemeralFolding
(unexpected! [this s])
(node! [this tag n])
(leaf! [this s])
(cat! [this pfq]))
; nodes is a collection of nodes (returned by make-node)
; pending is an alternate collection of tag, coll of nodes, integer
(deftype FoldingQueue [^java.util.ArrayList pending ^java.util.ArrayList nodes
^java.util.ArrayList offsets
make-node make-leaf make-unexpected]
EphemeralFolding
(unexpected! [this s]
(when make-unexpected
(.add nodes (make-unexpected s))
this))
(node! [this tag N]
(let [n (.size offsets)]
(u/cond
(or (> N n) (neg? N))
(do
(doto pending
(.add tag)
(.add (vec nodes))
(.add (- N n)))
(.clear nodes)
(.clear offsets))
:let [m (- n N)
offset (.get offsets m)
_ (-> offsets (.subList (inc m) n) .clear)]
tag
(let [tail (.subList nodes offset (.size nodes))
children (vec (.toArray tail))]
(.clear tail)
(.add nodes (make-node tag children))))
this))
(leaf! [this s]
(let [leaf (if make-leaf (make-leaf s) s)
offset (.size nodes)]
(.add offsets offset)
(.add nodes leaf)
this))
(cat! [this pfq]
(doseq [[tag pnodes n] (partition 3 (:pending pfq))]
(.addAll nodes pnodes)
(.node! this tag n))
(let [n (.size nodes)]
(.addAll nodes (:nodes pfq))
(doseq [offset (:offsets pfq)]
(.add offsets (+ n offset))))
this)
clojure.lang.IDeref
(deref [this]
{:pending (vec pending) :nodes (vec nodes) :offsets (vec offsets)
:make-node make-node :make-leaf make-leaf :make-unexpected make-unexpected}))
(defn folding-queue
[{:keys [pending nodes offsets make-node make-leaf make-unexpected]
:or {pending [] nodes [] offsets []}}]
(FoldingQueue. (java.util.ArrayList. ^java.util.Collection pending)
(java.util.ArrayList. ^java.util.Collection nodes)
(java.util.ArrayList. ^java.util.Collection offsets)
make-node make-leaf make-unexpected))
(defn cat [a b]
@(cat! (folding-queue a) b))
(defn finish [pfq]
(u/cond
:let [{:keys [pending nodes offsets make-node]} pfq]
(and (seq nodes) (seq pending))
nil
[[x & xs] (seq nodes)]
(when-not xs x)
[[[tag pnodes n] & xs] (seq (partition 3 pending))]
(when (and (not xs) (neg? n) tag) (make-node tag pnodes))))
(defn stitchability
"Returns :full, or a number (the number of states on A stack which remains untouched)
when rebasing is possible or nil."
[a b]
(u/cond
:let [[a-end a-watermark a-events a-start] a
[b-end b-watermark b-events b-start] b]
(= a-end b-start) :full
:let [[a-stack a-rem] a-end
[b-stack b-rem] b-start]
:when (= a-rem b-rem)
:let [b-tail (subvec b-stack b-watermark)
n (- (count a-stack) (count b-tail))]
:when (not (neg? n))
(= b-tail (subvec a-stack n)) n))
(defn rebase [b a]
(u/cond
:when-let [st (stitchability a b)]
(= :full st) b
; if it's not full, it's partial
:let [[a-end] a
[b-end b-watermark b-events] b
[a-stack] a-end
[b-stack b-rem] b-end
b-tail (subvec b-stack b-watermark)
a-stub (subvec a-stack 0 st)]
[[(into (vec a-stub) b-tail) b-rem] st b-events a-end]))
(defn stitch
[a b]
(when (and a b)
(let [[a-end a-watermark a-events a-start] a
[b-end b-watermark b-events b-start] b]
[b-end (min a-watermark b-watermark) (cat a-events b-events) a-start])))
parsley-clojure-0.9.3/src/net/cgrand/parsley/functional_trees.clj 0000664 0000000 0000000 00000002331 12470552204 0025173 0 ustar 00root root 0000000 0000000 (ns net.cgrand.parsley.functional-trees
"Views on functional trees."
(:require [net.cgrand.parsley.views :as v]))
;; Functional trees are trees-as-functions (a subset of objects-as-functions)
;; A functional tree is a function of two arguments: leaf and node.
;; leaf and node are two functions:
;; * leaf takes one string and returns a value in a view domain,
;; * node takes a tag (keyword) and a collection of children functional trees
;; and returns a value in the same view domain as leaf's.
;; When a functional tree represent a leaf, leaf is called, when it's a inner
;; node, node is called.
;;
;; The purpose of these functional trees (over maps for example) is that they
;; can be memoizing and the caches are associated with the trees, not with
;; the functions.
;;
;; It's an expedient implementation, better impl welcome.
;;
;; Canonicalizing may be interesting for all keywords, single spaces etc.
(defn fleaf
"Constructs a functional leaf."
[s]
(memoize (fn [leaf node]
(leaf s))))
(defn fnode
"Constructs a functional node." [tag content]
(memoize (fn [leaf node]
(node tag content))))
(extend-type clojure.lang.Fn
v/ViewableNode
(compute [f leaf node]
(f leaf node)))
parsley-clojure-0.9.3/src/net/cgrand/parsley/grammar.clj 0000664 0000000 0000000 00000015407 12470552204 0023265 0 ustar 00root root 0000000 0000000 ; Copyright (c) Christophe Grand. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.
(ns net.cgrand.parsley.grammar
(:require [net.cgrand.parsley.util :as u]))
(alias 'p 'net.cgrand.parsley) ; avoid circular dependency
;; A grammar production right-hand side consists of a combination of:
;; * vectors (sequence)
;; * sets (alternatives)
;; * keywords (non-terminal or operators: :*, :+, :?)
;; * antything else is considered as a literal (a "matcher")
(defprotocol RuleFragment
(unsugar [fragment] "Remove sugarized forms")
(collect [fragment unspaced top-rulename] "Collect \"anonymous\" productions")
(develop [fragment rewrite space]
"normalize as a seq of seqs of keywords and terminals" ))
(defrecord Unspaced [item]
RuleFragment
(unsugar [this]
(Unspaced. (unsugar item)))
(collect [this unspaced top-rulename]
(collect item true top-rulename))
(develop [this rewrite space]
(rewrite item #{[]})))
(defn unspaced [& specs]
(Unspaced. (vec specs)))
(defrecord Root [item]
RuleFragment
(unsugar [this]
(Root. (unsugar item)))
(collect [this unspaced top-rulename]
(collect item unspaced top-rulename))
(develop [this rewrite space]
(for [s1 space x (rewrite item space) s2 (if (empty? x) #{[]} space)]
(concat s1 x s2))))
(defrecord Repeat+ [item]
RuleFragment
(unsugar [this]
(Repeat+. (unsugar item)))
(collect [this unspaced top-rulename]
(let [kw (keyword (gensym (str top-rulename "_repeat+_")))
alt #{[kw item] item}]
(cons [this kw (if unspaced (Unspaced. alt) alt)]
(collect item unspaced top-rulename)))))
;; 2. collect new rules
(extend-protocol RuleFragment
Object
(unsugar [this]
(if (= "" this) [] this))
(collect [this unspaced top-rulename]
nil)
(develop [this rewrite space]
[[this]])
;; a ref to another rule: add support for + ? or * suffixes
clojure.lang.Keyword
(unsugar [kw]
(if-let [[_ base suffix] (re-matches #"(.*?)([+*?])" (name kw))]
(unsugar [(keyword base) (keyword suffix)])
kw))
(collect [this unspaced top-rulename]
nil)
(develop [this rewrite space]
[[this]])
;; a set denotes an alternative
clojure.lang.IPersistentSet
(unsugar [this]
(set (map unsugar this)))
(collect [items unspaced top-rulename]
(mapcat #(collect % unspaced top-rulename) items))
(develop [items rewrite space]
(mapcat #(rewrite % space) items))
;; a vector denotes a sequence, supports postfix operators :+ :? and :*
clojure.lang.IPersistentVector
(unsugar [this]
(reduce #(condp = %2
:* (conj (pop %1) #{[] (Repeat+. (peek %1))})
:+ (conj (pop %1) (Repeat+. (peek %1)))
:? (conj (pop %1) #{[](peek %1)})
(conj %1 (unsugar %2))) [] this))
(collect [items unspaced top-rulename]
(mapcat #(collect % unspaced top-rulename) items))
(develop [items rewrite space]
(reduce #(for [x (rewrite %2 space) sp space xs %1]
(concat x (and (seq x) (seq xs) sp) xs))
[()] (rseq items))))
(defn collect-new-rules
"Collect new rules for new non-terminals corresponding to repeatitions."
[grammar]
(let [collected-rules
(mapcat (fn [[k v]] (collect v false (name k))) grammar)
rewrites (into {} (for [[op k] collected-rules] [op k]))
new-rules (set (vals rewrites))
grammar (into grammar (for [[_ k v] collected-rules
:when (new-rules k)] [k v]))]
[rewrites grammar]))
;; 3. develop-alts:
(defn normalize
"Normalize grammar as a map of non-terminals to set of seqs of
terminals and non-terminals"
([grammar] (normalize grammar nil {}))
([grammar space rewrites]
(let [helper (fn helper [item space]
(if-let [rw (rewrites item)]
[[rw]]
(develop item helper space)))
space (helper space #{[]})]
(into {} (for [[k v] grammar] [k (set (helper v space))])))))
;; 4. remove-empty-prods
(defn split-empty-prods [grammar]
[(into {}
(for [[k prods] grammar]
[k (set (remove empty? prods))]))
(into {}
(for [[k prods] grammar
:when (some empty? prods)]
[k [() [k]]]))])
(defn- inline-prods [prods replacement-map]
(letfn [(inline1 [prod]
(if-let [[x & xs] (seq prod)]
(for [a (replacement-map x [[x]]) b (inline1 xs)]
(concat a b))
[()]))]
(mapcat inline1 prods)))
(defn- inline-empty-prods* [grammar]
(let [[grammar empty-prods] (split-empty-prods grammar)]
(into {}
(for [[k prods] grammar]
[k (-> prods (inline-prods empty-prods) set (disj [k]))]))))
(defn inline-empty-prods [grammar]
(u/fix-point inline-empty-prods* grammar))
(defn- private? [kw]
(.endsWith (name kw) "-"))
(defn- basename [kw]
(if (private? kw)
(let [s (name kw)
n (subs s 0 (dec (count s)))]
(if-let [ns (namespace kw)]
(keyword ns n)
(keyword n)))
kw))
(defn- unalias [kw]
(keyword (namespace kw)
(let [n (name kw)]
(subs n (inc (.indexOf n "."))))))
;;;;;;;;;;;
(defn grammar
[options-map & rules]
(let [[options-map rules] (if-not (map? options-map)
[{} (cons options-map rules)]
[options-map rules])
rules (partition 2 rules)
public-rulenames (remove private? (map first rules))
{:keys [main space root-tag]
:or {main (first public-rulenames) root-tag ::p/root space #{[]}}}
options-map
public-rulenames (-> (zipmap public-rulenames (map unalias public-rulenames))
(assoc ::p/S root-tag))
rules (concat rules
[[::p/S (Root. main)] [::space (unspaced space)]])
grammar (into {} (for [[name specs] rules]
[(basename name) (unsugar specs)]))
[rewrites grammar] (collect-new-rules grammar)
space (::space grammar)
grammar (dissoc grammar ::space)
grammar (-> grammar
(normalize space rewrites)
(assoc ::canary #{[::p/S ::eof]})
inline-empty-prods)
matches-empty (contains? (grammar ::canary) [::eof])
grammar (dissoc grammar ::canary)]
[grammar public-rulenames matches-empty]))
parsley-clojure-0.9.3/src/net/cgrand/parsley/lrplus.clj 0000664 0000000 0000000 00000027520 12470552204 0023157 0 ustar 00root root 0000000 0000000 ; Copyright (c) Christophe Grand. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.
(ns net.cgrand.parsley.lrplus
"LR+ is LR(0) with contextual tokenizing."
(:require [net.cgrand.parsley.fold :as f]
[net.cgrand.parsley.util :as u]
[net.cgrand.parsley.stack :as st]
[net.cgrand.regex :as re]))
(alias 'p 'net.cgrand.parsley) ; avoid circular dependency
; I independently figured out a technique similar to the one described
; in this paper: Context-Aware Scanning for Parsing Extensible Languages
; http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible-Language
;; pushdown automaton
(defrecord TableState [token-matcher reduce gotos accept eof])
(defn table-state [shifts reduce goto accept & [eof]]
(assoc (TableState. nil reduce goto accept eof) :shifts shifts))
(def incomplete [-1])
(defn complete? [m]
(when-not (or (= m incomplete) (neg? (nth m 0))) m))
(defprotocol MatcherFactory
(matcher-fn [this id]))
(extend-protocol MatcherFactory
nil
(matcher-fn [this id]
(fn [^String s eof]
nil))
Character
(matcher-fn [this id]
(let [cv (int (.charValue this))]
(fn [^String s eof]
(cond
(zero? (.length s))
(when-not eof incomplete)
(== (int (.charAt s 0)) cv)
[1 id]))))
String
(matcher-fn [this id]
(let [n (.length this)]
(fn [^String s eof]
(cond
(.startsWith s this)
[n id]
eof
nil
(.startsWith this s)
incomplete))))
java.util.regex.Pattern
(matcher-fn [this id]
(fn [s eof]
(let [m (re-matcher this s)
found (.lookingAt m)]
(cond
(and (not eof) (.hitEnd m))
incomplete
found
[(.end m) id]))))
net.cgrand.regex.Regex
(matcher-fn [this id]
(matcher-fn (:re this) id))
clojure.lang.IFn ; TODO fix
(matcher-fn [this id]
(throw (RuntimeException. "TODO"))
(fn [s eof]
(this s eof))))
(defn array-union [matchers]
(let [matchers (to-array matchers)]
(fn [s eof]
(loop [i (alength matchers) r nil]
(u/cond
(zero? i) r
:let [i (unchecked-dec i)
m (aget matchers i)
mr (m s eof)]
(= incomplete mr) incomplete
(and r mr)
(throw (Exception. (str "Ambiguous match for " (pr-str s)
" by " (pr-str matchers))))
(recur i (or r mr)))))))
(defn prefix-matcher [matcher s]
(u/cond
:when-let [r (matcher s false)]
(complete? r) (constantly r)
matcher))
(defn- matchers-union [matchers]
(let [qtable (to-array
(map (fn [cp]
(let [s (str (char cp))
ms (keep #(prefix-matcher % s) matchers)]
(when (seq ms)
(if (next ms)
(array-union (to-array ms))
(first ms))))) (range 128)))
m (array-union (to-array matchers))
on-eof (m "" true)]
(fn [^String s eof]
(u/cond
(zero? (.length s))
(if eof on-eof incomplete)
:let [cp (.codePointAt s 0)]
(< cp (int 128))
(when-let [m (aget qtable cp)]
(m s eof))
(m s eof)))))
(defn matcher [terminals terminals-ids]
(u/cond
:let [ms (seq (map #(matcher-fn % (terminals-ids %)) (set terminals)))]
(next ms)
(matchers-union ms)
ms
(first ms)
(constantly nil)))
(defn- count-unexpected [tm s eof]
(loop [n 1]
(let [suf (subs s n)]
(if (or (tm suf eof) (empty? suf))
n
(recur (inc n))))))
(defn step1
"Returns [stack water-mark buffer events] where stack is the new stack,
water-mark the number of items at the bottom of the stack which didn't took
part in this step, buffer the remaining string to be tokenized, events the
parsing events."
[^objects table ops stack rem s]
(when (nil? stack)
(throw (IllegalStateException. "Can't accept more input past EOF.")))
(let [eof (nil? s)
s (or s "")
s (if (= "" rem) s (str rem s))
fq (f/folding-queue ops)
stack (st/stack stack)]
(loop [^String s s]
(u/cond
:when-let [state (st/peek! stack)
cs (aget table state)]
[[sym n tag] (and (zero? (.length s)) (:accept cs))]
(if eof
(do
(f/node! fq tag n)
[(assoc @stack :stack nil) "" @fq])
[@stack "" @fq])
[[sym n tag] (:reduce cs)]
(let [cs (aget table (-> stack (st/popN! n) st/peek!))]
(f/node! fq tag n)
(st/push! stack (aget ^objects (:gotos cs) sym))
(recur s))
:let [tm (:token-matcher cs)]
[r (tm s eof)]
(if-let [[n state] (complete? r)]
(let [token (subs s 0 n)
s (subs s n)]
(f/leaf! fq token)
(st/push! stack state)
(recur s))
[@stack s @fq])
(not (empty? s)) ; unexpected input
(let [n (count-unexpected tm s eof)]
(f/unexpected! fq (subs s 0 n))
(recur (subs s n)))
; unexpected eof
(let [[sym n tag] (:eof cs)
cs (aget table (-> stack (st/popN! n) st/peek!))
_ (f/node! fq tag n)]
(st/push! stack (aget ^objects (:gotos cs) sym))
(recur s))))))
(def zero [[[0] ""] 0 nil nil])
(defn step [table ops state s]
(u/when-let [[[stack rem :as start]] state
[{:keys [watermark stack]} rem events]
(step1 table ops stack rem s)]
[[stack rem] watermark events start]))
;; LR+ table construction
(defn close [init-states state]
(u/fix-point (fn [state]
(let [follows (map #(first (nth % 2)) state)]
(into state (mapcat init-states follows))))
(set state)))
(defn filter-keys [map pred]
(into {} (for [kv map :when (pred (key kv))] kv)))
(defn follow-map [state]
(apply merge-with into {}
(for [[k n prod] state] {(first prod) #{[k n (next prod)]}})))
(defn transitions [close tags state]
(u/cond
:let [follows (u/map-vals (follow-map state) #(close %2))
gotos (filter-keys follows keyword?)
shifts (filter-keys (dissoc follows nil) (complement gotos))
reduces (follows nil)
accepts (filter (fn [[s _ r]] (= 0 s)) reduces)
reduces (reduce disj reduces accepts)
reduction (when-let [[sym n] (first reduces)] [sym n (tags sym)])
accept (when-let [[sym n] (first accepts)] [sym -1 (tags sym)])]
(next reduces)
(throw (Exception. ^String (apply str "at state " state "\n reduce/reduce conflict " (interpose "\n" reduces))))
(and reduction (seq shifts))
(throw (Exception. (str "at state " state "\n shift/reduce conflict " shifts "\n" reduces)))
(table-state shifts reduction gotos accept)))
(defn to-states [{:keys [gotos shifts]}]
(concat (vals gotos) (vals shifts)))
(defn lr-table [[grammar tags matches-empty]]
(let [grammar (-> grammar (dissoc ::p/S)
(assoc 0 (::p/S grammar)))
tags (assoc tags 0 (tags ::p/S))
init-states (u/map-vals grammar #(set (for [prod %2] [%1 (count prod) prod])))
close (partial close init-states)
state0 (-> 0 init-states close)
transitions (partial transitions close tags)
table (loop [table {} todo #{state0}]
(if-let [state (first todo)]
(let [transition (transitions state)
table (assoc table state transition)
new-states (remove table (to-states transition))
todo (-> todo (disj state) (into new-states))]
(recur table todo))
table))
table (assoc table 0 (assoc (table state0) :accept (when matches-empty
[0 -1 (tags 0)])))
; state0 is unreachable by construction
table (dissoc table state0)]
table))
(defn- eof-reduction [state]
(reduce (fn [[mk mn :as best] [k n prod]]
(let [n (- n (count prod))]
(if (and best (>= mn n)) best [k n])))
nil state))
(defn- unfinished-state [public accept n]
(let [state #{[::p/unfinished (inc n) nil]}
state-id [::p/unfinished (boolean public) (boolean accept) n]
transitions (transitions identity (if public #{::p/unfinished} #{}) state)
transitions (if accept
(assoc transitions :accept [::p/unfinished -1 ::p/unfinished])
transitions)]
[state-id transitions]))
(defn totalize [table]
;; I wanted to make table the only input of totalize
;; that's why the tags map is recomputed
(let [tags (into {} (for [transition (vals table)
:let [[k _ tag] (or (:reduce transition) (:accept transition))]
:when tag]
[k tag]))]
(reduce
(fn [table [state transition]]
(u/cond
(:reduce transition)
table
(= 0 state)
(let [table (if-not (:accept transition)
(assoc-in table [state :accept] [::p/unfinished -1 ::p/unfinished])
table)
[ustate utransition :as ust] (unfinished-state (tags 0) true 0)
table (assoc-in table [state :gotos ::p/unfinished] ustate)]
(conj table ust))
:let [[k n] (eof-reduction state)
tag (when (tags k) ::p/unfinished)
[ustate utransition :as ust] (unfinished-state tag (= 0 k) n)]
(conj table ust
[state (-> transition
(assoc :eof [::p/unfinished n tag])
(assoc-in [:gotos ::p/unfinished] ustate))])))
table table)))
(defn number-states [table]
(let [; number states
table-without-start (dissoc table 0)
mapping (zipmap (cons 0 (keys table-without-start)) (range))
renum (fn [m] (reduce #(update-in %1 [%2] mapping) m (keys m)))
; compute matchers which return the shifted state id
token-matcher (memoize
(fn [shifts]
(matcher (keys shifts) (comp mapping shifts))))
syms (set (for [v (vals table) [x] [(:reduce v) (:eof v) (:accept v)] :when x] x))
syms-mapping (zipmap syms (range))
renum-action #(when-let [[sym n tag] %] [(syms-mapping sym) n tag])
empty-goto (vec (repeat (count syms) nil))
renum-gotosyms (fn [goto] (reduce (fn [goto [sym state]]
(assoc goto (syms-mapping sym) state))
empty-goto goto))]
(to-array
(for [{shifts :shifts gotos :gotos :as v}
(cons (get table 0) (vals table-without-start))]
(-> v
(dissoc :shifts)
(assoc :reduce (renum-action (:reduce v))
:eof (renum-action (:eof v))
:accept (renum-action (:accept v))
:token-matcher (token-matcher shifts)
:gotos (-> gotos renum renum-gotosyms to-array)))))))
(comment
(def g
{:E #{["(" :E+ ")"]
[#"\w+"]}
:E+ #{[:E+ :E]
[:E]}})
(let [t (lr-table [g :E identity])]
(step t zero "((hello)"))
)
parsley-clojure-0.9.3/src/net/cgrand/parsley/stack.clj 0000664 0000000 0000000 00000002236 12470552204 0022740 0 ustar 00root root 0000000 0000000 ; Copyright (c) Christophe Grand. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.
(ns net.cgrand.parsley.stack)
(defprotocol EphemeralStack
(push! [this x])
(popN! [this n])
(peek! [this]))
(deftype Stack [^java.util.ArrayList list
^{:unsynchronized-mutable true, :tag long} wm]
EphemeralStack
(push! [this x]
(.add list x)
this)
(popN! [this n]
(let [sz (.size list)
sl (.subList list (- sz (long n)) sz)]
(.clear sl)
this))
(peek! [this]
(let [i (unchecked-dec (.size list))]
(set! wm (Math/min (long wm) (long i)))
(.get list i)))
clojure.lang.IDeref
(deref [this]
{:watermark wm
:stack (vec list)}))
(defn stack [coll]
(Stack. (java.util.ArrayList. ^java.util.Collection coll) (count coll))) parsley-clojure-0.9.3/src/net/cgrand/parsley/tree.clj 0000664 0000000 0000000 00000013221 12470552204 0022566 0 ustar 00root root 0000000 0000000 (ns net.cgrand.parsley.tree
"An incremental buffer backed by a 2-3 tree."
(:require [net.cgrand.parsley.util :as u]))
(defprotocol Node
"Protocol for inner nodes and leaves of a 2-3 buffer.
Leaves contain collections (or strings or anything sequential and finite).
The buffer maintains a reduction over all its leaves contents.
The buffer is parametrized by a map of fns, see the ops method."
(len [node] "Returns the length of the Node")
(left-cut [node offset])
(right-cut [node offset])
(value [node] "The result value for this node.")
(children [node] "For inner nodes, return their children.")
(ops [node] "Returns a map of fns with keys:
:unit [leaf-content]
turns a leaf content into a value from the result type,
:plus [a b] (associative fn)
combines two values from the result type into a value of the result type,
:chunk [leaf-content]
breaks a leaf content into a seq of leaf contents -- it controls the computational
granularity of the buffer.
:left [leaf-content offset]
returns the part of a leaf content to the left of the offset,
:right [leaf-content offset]
returns the part of a leaf content to the right of the offset,
:cat [& leaf-contents]
returns teh concatenation of the seq of leaf-contents, with no args MUST returns the proper
identity element (eg \"\" for str, () or [] or nil for concat),
:len [leaf-content]
returns the length of leaf-content."))
(defrecord Ops [unit plus chunk left right cat len])
(defn as-ops [options]
(if (instance? Ops options)
options
(into (Ops. nil nil nil nil nil nil count) options)))
(deftype InnerNode [ops val length a b c]
Node
(left-cut [this offset]
(u/cond
:let [la (len a)]
(<= offset la) (conj (left-cut a offset) nil)
:let [offset (- offset la)
lb (len b)]
(<= offset lb) (conj (left-cut b offset) [a])
:let [offset (- offset lb)]
:else (conj (left-cut c offset) [a b])))
(right-cut [this offset]
(u/cond
:let [la (len a)]
(< offset la) (conj (right-cut a offset) (if c [b c] [b]))
:let [offset (- offset la)
lb (len b)]
(or (< offset lb) (nil? c)) (conj (right-cut b offset) (when c [c]))
:let [offset (- offset lb)]
:else (conj (right-cut c offset) nil)))
(len [this]
length)
(value [this]
val)
(children [node] (if c [a b c] [a b]))
(ops [this]
ops))
(defn node [ops children]
(let [[a b c] children
plus (:plus ops)
val (plus (value a) (value b))
val (if c (plus val (value c)) val)]
(InnerNode. ops val (+ (len a) (len b) (if c (len c) 0)) a b c)))
(deftype Leaf [ops val s]
Node
(left-cut [this offset]
[((:left ops) s offset)])
(right-cut [this offset]
[((:right ops) s offset)])
(len [this]
((:len ops) s))
(value [this]
val)
(ops [this]
ops))
(defn leaf [ops s]
(Leaf. ops ((:unit ops) s) s))
(defn group
"Groups a sequence of at least two nodes into a sequence of nodes with 2 or 3 children."
[nodes]
(let [ops (ops (first nodes))]
(if (odd? (count nodes))
(cons (node ops (take 3 nodes))
(map #(node ops %) (partition 2 (drop 3 nodes))))
(map #(node ops %) (partition 2 nodes)))))
(defn- left-borrow [lefts]
(or (u/cond
:when-let [[nodes & xs] (seq lefts)]
[r (peek nodes)]
(list* (children r) (pop nodes) xs)
:let [[nodes & xs] (left-borrow xs)]
[r (peek nodes)]
(list* (children r) (pop nodes) xs))
(cons nil lefts)))
(defn- right-borrow [rights]
(or (u/cond
:when-let [[nodes & xs] (seq rights)]
[l (first nodes)]
(list* (children l) (next nodes) xs)
:let [[nodes & xs] (right-borrow xs)]
[l (first nodes)]
(list* (children l) (next nodes) xs))
(cons nil rights)))
(defn edit
"Performs an edit on the buffer. Content from offset to offset+length (excluded) is replaced
by s."
[tree offset length s]
(let [[sl & lefts] (left-cut tree offset)
[sr & rights] (right-cut tree (+ offset length))
ops (ops tree)
s ((:cat ops) sl s sr)
leaves (map #(leaf ops %) ((:chunk ops) s))]
(loop [[l & ls :as lefts] lefts [r & rs :as rights] rights nodes leaves]
#_(assert (seq nodes))
(u/cond
:let [nodes (concat l nodes r)]
(next nodes)
(recur ls rs (group nodes))
; it means that nodes has only one item and both l and r are empty
:let [lefts (left-borrow ls)]
(first lefts)
(recur lefts rights nodes)
:let [rights (right-borrow rs)]
(first rights)
(recur lefts rights nodes)
; nothing left, we have a new root!
(first nodes)))))
(defn buffer
([ops]
(let [ops (as-ops ops)]
(leaf ops ((:cat ops)))))
([ops s]
(-> (buffer ops) (edit 0 0 s))))
;;;;;;;;;;;;;;;
(comment ; demo
(def str-buffer (partial buffer {:unit identity
:plus str
:chunk #(.split ^String % "(?<=\n)")
:left #(subs %1 0 %2)
:right subs
:cat str}))
(defprotocol Treeable
(tree [treeable]))
(extend-protocol Treeable
Leaf
(tree [leaf] (.s leaf))
InnerNode
(tree [nv] (map tree (if (.c nv) [(.a nv) (.b nv) (.c nv)] [(.a nv) (.b nv)]))))
;; repl session
=> (-> "a\nb" str-buffer (edit 1 0 "c") ((juxt tree value)))
("ac\n" "b")
=> (-> "a\nb" str-buffer (edit 1 0 "cd") ((juxt tree value)))
("acd\n" "b")
=> (-> "a\nb" str-buffer (edit 1 0 "cd") (edit 2 0 "\n\n") ((juxt tree value)))
(("ac\n" "\n") ("d\n" "b"))) parsley-clojure-0.9.3/src/net/cgrand/parsley/util.clj 0000664 0000000 0000000 00000004056 12470552204 0022612 0 ustar 00root root 0000000 0000000 (ns net.cgrand.parsley.util
"Some functions and a collection of variations on Clojure's core macros.
Let's see which features end up being useful."
{:author "Christophe Grand"}
(:refer-clojure :exclude [cond when-let if-let]))
(defmacro if-let
"A variation on if-let where all the exprs in the bindings vector must be true.
Also supports :let."
([bindings then]
`(if-let ~bindings ~then nil))
([bindings then else]
(if (seq bindings)
(if (= :let (bindings 0))
`(let ~(bindings 1)
(if-let ~(subvec bindings 2) ~then ~else))
`(let [test# ~(bindings 1)]
(if test#
(let [~(bindings 0) test#]
(if-let ~(subvec bindings 2) ~then ~else))
~else)))
then)))
(defmacro when-let
"A variation on when-let where all the exprs in the bindings vector must be true.
Also supports :let."
[bindings & body]
`(if-let ~bindings (do ~@body)))
(defmacro cond
"A variation on cond which sports let bindings and implicit else:
(cond
(odd? a) 1
:let [a (quot a 2)]
(odd? a) 2
3).
Also supports :when-let and binding vectors as test expressions."
[& clauses]
(when-let [[test expr & more-clauses] (seq clauses)]
(if (next clauses)
(if-let [sym ({:let `let :when `when :when-let `when-let} test)]
(list sym expr `(cond ~@more-clauses))
(if (vector? test)
`(if-let ~test ~expr (cond ~@more-clauses))
`(if ~test ~expr (cond ~@more-clauses))))
test)))
(comment ;if one could define cond with itself it would read:
(cond
:when-let [[test expr & more-clauses] (seq clauses)]
(not (next clauses)) test
[sym ({:let `let :when `when :when-let `when-let} test)]
(list sym expr `(cond ~@more-clauses))
(vector? test)
`(if-let ~test ~expr (cond ~@more-clauses))
`(if ~test ~expr (cond ~@more-clauses))))
(defn map-vals [map f]
(into map (for [[k v] map] [k (f k v)])))
(defn fix-point [f init]
(let [v (f init)]
(if (= v init)
init
(recur f v))))
parsley-clojure-0.9.3/src/net/cgrand/parsley/views.clj 0000664 0000000 0000000 00000005466 12470552204 0023000 0 ustar 00root root 0000000 0000000 (ns net.cgrand.parsley.views)
; Right now compute is responsible for memoization
(defprotocol ViewableNode
(compute [n leaf node]
"Applies either the leaf or node function to the node at hand.
The leaf function expects one argument: a string.
The node function expects two arguments: a keyword and a collection of
children nodes."))
; TODO: make Node and String Viewable + memoization
(defn view
"Creates a simple recursive view function. The leaf function is passed
the string stored into a leaf and the node function gets the tag and
the sequence of values returned by the view recursively called on the
children nodes."
[leaf node]
(letfn [(node* [tag fs]
(node tag (map v fs)))
(v [n]
(compute n leaf node*))]
v))
(defn nonrec-view
"Creates a view function. Unlike view, the second arg passed to node is a
collection of children functional trees."
[leaf node]
(fn [n] (compute n leaf node)))
(def content
"View that returns the nodes (or nil) of a tree. Unlike :content works on
all viewable trees"
(nonrec-view (constantly nil) (fn [_ nodes] nodes)))
(def tag
"View that returns the tag (or nil) of a functional tree. Unlike :tag works
on all viewable trees."
(nonrec-view (constantly nil) (fn [tag _] tag)))
(def text
(view identity #(apply str %2)))
(def length
(view count #(reduce + %2)))
(def offsets
"View which returns a sorted-map of relative end offsets to children nodes."
(nonrec-view (constantly nil)
(fn [_ nodes]
(into (sorted-map) (next (reductions
(fn [[offset _] node]
[(+ offset (length node)) node])
[0 nil]
nodes))))))
;; zipper on viewable trees
#_(defn fzip [root]
(z/zipper content content (fn [node children]
(fnode (tag node) (vec children)))))
;; we need indexed/measured zippers!
#_(defn offset-at
"Returns the offset at the *start* of the node pointed by the loc. "
[loc]
(if-let [ploc (z/up loc)]
(let [roffsets (offsets (z/node ploc))
n (count (z/lefts loc))
roffset (if (zero? n) 0 (key (first (drop (dec n) roffsets))))]
(+ roffset (offset-at ploc)))
0))
(defn path-to [node offset]
(loop [node node offset offset path []]
(if-let [offsets (offsets node)]
(let [[o n] (first (subseq offsets >= offset))
so (- o (length n))
ro (- offset so)]
(recur n ro (conj path [node offset])))
(conj path [node offset]))))
#_(defn loc-at [loc offset]
(if (and (z/branch? loc) (seq (z/children loc)))
(let [cloc (z/down loc)
lefts (subseq (offsets loc) < offset)])
)
)
parsley-clojure-0.9.3/test/ 0000775 0000000 0000000 00000000000 12470552204 0015623 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/test/net/ 0000775 0000000 0000000 00000000000 12470552204 0016411 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/test/net/cgrand/ 0000775 0000000 0000000 00000000000 12470552204 0017647 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/test/net/cgrand/parsley/ 0000775 0000000 0000000 00000000000 12470552204 0021326 5 ustar 00root root 0000000 0000000 parsley-clojure-0.9.3/test/net/cgrand/parsley/test.clj 0000664 0000000 0000000 00000006351 12470552204 0023004 0 ustar 00root root 0000000 0000000 (ns net.cgrand.parsley.test
(:require [net.cgrand.parsley :as p])
(:require [net.cgrand.parsley.lrplus :as core])
(:require [net.cgrand.parsley.util :as u])
(:require [net.cgrand.parsley.views :as v])
(:require [net.cgrand.parsley.functional-trees :as f])
(:use clojure.test))
(defn- unexpected? [x]
(when (and (vector? x) (= ::p/unexpected (first x)))
(second x)))
(defn v [node]
(if (map? node)
(reduce (fn [v x]
(u/if-let [b (unexpected? x)
a (unexpected? (peek v))]
(conj (pop v) [::p/unexpected (str a b)])
(conj v x)))
[(:tag node)] (map v (:content node)))
node))
(deftest empty-grammar
(let [eg (p/parser {:main []})]
(are [s t] (= (v (eg s)) t)
"" [::p/root]
"abcdef" [::p/root [::p/unexpected "abcdef"]]
" " [::p/root [::p/unexpected " "]]
" a " [::p/root [::p/unexpected " a "]])))
(deftest empty-whitespaced-grammar
(let [eg (p/parser {:main []
:space :ws?
:root-tag :root}
:ws #" +")]
(are [s t] (= (v (eg s)) t)
"" [:root]
"abcdef" [:root [::p/unexpected "abcdef"]]
" " [:root [:ws " "]]
" a " [:root [:ws " "] [::p/unexpected "a "]])))
(def sexpr (p/parser {:main :expr*
:space :ws?
:root-tag :root}
:ws #"\s+"
:expr- #{:vector :list :map :set :symbol}
:symbol #"[a-zA-Z-]+"
:vector ["[" :expr* "]"]
:list ["(" :expr* ")"]
:map ["{" :expr* "}"]
:set ["#{" :expr* "}"]))
(deftest sexpr-once
(are [s t] (= (v (sexpr s)) t)
"" [:root]
"hello world" [:root [:symbol "hello"] [:ws " "] [:symbol "world"]]
" hello " [:root [:ws " "] [:symbol "hello"] [:ws " "]]
"(hello #{world kitty})" [:root [:list "(" [:symbol "hello"] [:ws " "]
[:set "#{" [:symbol "world"] [:ws " "]
[:symbol "kitty"] "}"] ")"]]
"(hello #{world kitty])" [::p/unfinished
[::p/unfinished "(" [:symbol "hello"] [:ws " "]
[::p/unfinished "#{" [:symbol "world"] [:ws " "]
[:symbol "kitty"] [::p/unexpected "])"]]]]
"hello 123 world" [:root [:symbol "hello"] [:ws " "] [::p/unexpected "123 "]
[:symbol "world"]]))
;; views
(def fexpr (p/parser {:main :expr*
:space :ws?
:root-tag :root
:make-node f/fnode
:make-leaf f/fleaf}
:ws #"\s+"
:expr- #{:vector :list :map :set :symbol}
:symbol #"[a-zA-Z-]+"
:vector ["[" :expr* "]"]
:list ["(" :expr* ")"]
:map ["{" :expr* "}"]
:set ["#{" :expr* "}"]))
(def input "(hello #{world kitty})")
(def ftree (fexpr input))
(deftest views
(are [v r] (= (v ftree) r)
v/length (count input)
v/text input
(v/view (constantly 0) (fn [_ xs] (reduce + 1 xs))) 8))
(deftest path-to
(is (= (-> (v/path-to ftree 10) peek first v/text) "world")))