lark-0.8.1/.gitignore
*.pyc
*.pyo
/.tox
/lark_parser.egg-info/**
tags
.vscode
.idea
.ropeproject
.cache
/dist
/build

lark-0.8.1/.gitmodules
[submodule "tests/test_nearley/nearley"]
	path = tests/test_nearley/nearley
	url = https://github.com/Hardmath123/nearley

lark-0.8.1/.travis.yml
dist: xenial
language: python
python:
  - "2.7"
  - "3.4"
  - "3.5"
  - "3.6"
  - "3.7"
  - "pypy2.7-6.0"
  - "pypy3.5-6.0"
install: pip install tox-travis
script:
  - tox

lark-0.8.1/LICENSE
Copyright © 2017 Erez Shinan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

lark-0.8.1/MANIFEST.in
include README.md LICENSE docs/* examples/*.py examples/*.png examples/*.lark tests/*.py tests/*.lark tests/grammars/* tests/test_nearley/*.py tests/test_nearley/grammars/*

lark-0.8.1/README.md
# Lark - a modern parsing library for Python

Parse any context-free grammar, FAST and EASY!

**Beginners**: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs a parse-tree for you, without additional code on your part.

**Experts**: Lark implements both Earley (SPPF) and LALR(1), and several different lexers, so you can trade off power and speed according to your requirements. It also provides a variety of sophisticated features and utilities.

Lark can:

- Parse all context-free grammars, and handle any ambiguity
- Build a parse-tree automagically, no construction code required
- Outperform all other Python libraries when using LALR(1) (Yes, including PLY)
- Run on every Python interpreter (it's pure-python)
- Generate a stand-alone parser (for LALR(1) grammars)

And many more features. Read ahead and find out.

Most importantly, Lark will save you time and prevent you from getting parsing headaches.
### Quick links - [Documentation @readthedocs](https://lark-parser.readthedocs.io/) - [Cheatsheet (PDF)](/docs/lark_cheatsheet.pdf) - [Tutorial](/docs/json_tutorial.md) for writing a JSON parser. - Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - [Gitter chat](https://gitter.im/lark-parser/Lobby) ### Install Lark $ pip install lark-parser Lark has no dependencies. [![Build Status](https://travis-ci.org/lark-parser/lark.svg?branch=master)](https://travis-ci.org/lark-parser/lark) ### Syntax Highlighting Lark provides syntax highlighting for its grammar files (\*.lark): - [Sublime Text & TextMate](https://github.com/lark-parser/lark_syntax) - [vscode](https://github.com/lark-parser/vscode-lark) ### Clones - [Lerche (Julia)](https://github.com/jamesrhester/Lerche.jl) - an unofficial clone, written entirely in Julia. ### Hello World Here is a little program to parse "Hello, World!" (Or any other similar phrase): ```python from lark import Lark l = Lark('''start: WORD "," WORD "!" %import common.WORD // imports from terminal library %ignore " " // Disregard spaces in text ''') print( l.parse("Hello, World!") ) ``` And the output is: ```python Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')]) ``` Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark. ### Fruit flies like bananas Lark is great at handling ambiguity. Let's parse the phrase "fruit flies like bananas": ![fruitflies.png](examples/fruitflies.png) See more [examples here](https://github.com/lark-parser/lark/tree/master/examples) ## List of main features - Builds a parse-tree (AST) automagically, based on the structure of the grammar - **Earley** parser - Can parse all context-free grammars - Full support for ambiguous grammars - **LALR(1)** parser - Fast and light, competitive with PLY - Can generate a stand-alone parser - **CYK** parser, for highly ambiguous grammars (NEW! Courtesy of [ehudt](https://github.com/ehudt)) - **EBNF** grammar - **Unicode** fully supported - **Python 2 & 3** compatible - Automatic line & column tracking - Standard library of terminals (strings, numbers, names, etc.) - Import grammars from Nearley.js - Extensive test suite [![codecov](https://codecov.io/gh/erezsh/lark/branch/master/graph/badge.svg)](https://codecov.io/gh/erezsh/lark) - And much more! See the full list of [features here](https://lark-parser.readthedocs.io/en/latest/features/) ### Comparison to other libraries #### Performance comparison Lark is the fastest and lightest (lower is better) ![Run-time Comparison](docs/comparison_runtime.png) ![Memory Usage Comparison](docs/comparison_memory.png) Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made. *Note: I really wanted to add PLY to the benchmark, but I couldn't find a working JSON parser anywhere written in PLY. If anyone can point me to one that actually works, I would be happy to add it!* *Note 2: The parsimonious code has been optimized for this specific test, unlike the other benchmarks (Lark included). Its "real-world" performance may not be as good.* #### Feature comparison | Library | Algorithm | Grammar | Builds tree? | Supports ambiguity? | Can handle every CFG? | Line/Column tracking | Generates Stand-alone |:--------|:----------|:----|:--------|:------------|:------------|:----------|:---------- | **Lark** | Earley/LALR(1) | EBNF | Yes! | Yes! | Yes! | Yes! | Yes! 
(LALR only) | | [PLY](http://www.dabeaz.com/ply/) | LALR(1) | BNF | No | No | No | No | No | | [PyParsing](http://pyparsing.wikispaces.com/) | PEG | Combinators | No | No | No\* | No | No | | [Parsley](https://pypi.python.org/pypi/Parsley) | PEG | EBNF | No | No | No\* | No | No | | [Parsimonious](https://github.com/erikrose/parsimonious) | PEG | EBNF | Yes | No | No\* | No | No | | [ANTLR](https://github.com/antlr/antlr4) | LL(*) | EBNF | Yes | No | Yes? | Yes | No | (\* *PEGs cannot handle non-deterministic grammars. Also, according to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs*) ### Projects using Lark - [storyscript](https://github.com/storyscript/storyscript) - The programming language for Application Storytelling - [tartiflette](https://github.com/dailymotion/tartiflette) - a GraphQL engine by Dailymotion. Lark is used to parse the GraphQL schemas definitions. - [Hypothesis](https://github.com/HypothesisWorks/hypothesis) - Library for property-based testing - [mappyfile](https://github.com/geographika/mappyfile) - a MapFile parser for working with MapServer configuration - [synapse](https://github.com/vertexproject/synapse) - an intelligence analysis platform - [Datacube-core](https://github.com/opendatacube/datacube-core) - Open Data Cube analyses continental scale Earth Observation data through time - [SPFlow](https://github.com/SPFlow/SPFlow) - Library for Sum-Product Networks - [Torchani](https://github.com/aiqm/torchani) - Accurate Neural Network Potential on PyTorch - [Command-Block-Assembly](https://github.com/simon816/Command-Block-Assembly) - An assembly language, and C compiler, for Minecraft commands - [Fabric-SDK-Py](https://github.com/hyperledger/fabric-sdk-py) - Hyperledger fabric SDK with Python 3.x - [required](https://github.com/shezadkhan137/required) - multi-field validation using docstrings - [miniwdl](https://github.com/chanzuckerberg/miniwdl) - A static analysis toolkit for the Workflow Description Language - [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer Using Lark? Send me a message and I'll add your project! ### How to use Nearley grammars in Lark Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for Javascript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the Javascript postprocessing code segments. Here's an example: ```bash git clone https://github.com/Hardmath123/nearley python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py ``` You can use the output as a regular python module: ```python >>> import ncalc >>> ncalc.parse('sin(pi/4) ^ e') 0.38981434460254655 ``` ## License Lark uses the [MIT license](LICENSE). (The standalone tool is under GPL2) ## Contribute Lark is currently accepting pull-requests. See [How to develop Lark](/docs/how_to_develop.md) ## Donate If you like Lark and feel like donating, you can do so at my [patreon page](https://www.patreon.com/erezsh). If you wish for a specific feature to get a higher priority, you can request it in a follow-up email, and I'll consider it favorably. ## Contact If you have any questions or want my assistance, you can email me at erezshin at gmail com. I'm also available for contract work. 
-- [Erez](https://github.com/erezsh) lark-0.8.1/docs/000077500000000000000000000000001361215331400134005ustar00rootroot00000000000000lark-0.8.1/docs/classes.md000066400000000000000000000116111361215331400153570ustar00rootroot00000000000000# Classes Reference This page details the important classes in Lark. ---- ## lark.Lark The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor. #### \_\_init\_\_(self, grammar, **options) The Lark class accepts a grammar string or file object, and keyword options: * **start** - A list of the rules in the grammar that begin the parse (Default: `["start"]`) * **parser** - Decides which parser engine to use, "earley", "lalr" or "cyk". (Default: `"earley"`) * **lexer** - Overrides default lexer, depending on parser. * **transformer** - Applies the provided transformer instead of building a parse tree (only allowed with parser="lalr") * **postlex** - Lexer post-processing (Default: `None`. only works when lexer is "standard" or "contextual") * **ambiguity** (only relevant for earley and cyk) * "explicit" - Return all derivations inside an "_ambig" data node. * "resolve" - Let the parser choose the best derivation (greedy for tokens, non-greedy for rules. Default) * **debug** - Display warnings (such as Shift-Reduce warnings for LALR) * **keep_all_tokens** - Don't throw away any terminals from the tree (Default=`False`) * **propagate_positions** - Propagate line/column count to tree nodes, at the cost of performance (default=`False`) * **maybe_placeholders** - The `[]` operator returns `None` when not matched. Setting this to `False` makes it behave like the `?` operator, and return no value at all, which may be a little faster (default=`True`) * **lexer_callbacks** - A dictionary of callbacks of type f(Token) -> Token, used to interface with the lexer Token generation. Only works with the standard and contextual lexers. See [Recipes](recipes.md) for more information. #### parse(self, text) Return a complete parse tree for the text (of type Tree) If a transformer is supplied to `__init__`, returns whatever is the result of the transformation. ---- ## Tree The main tree class * `data` - The name of the rule or alias * `children` - List of matched sub-rules and terminals * `meta` - Line & Column numbers (if `propagate_positions` is enabled) * meta attributes: `line`, `column`, `start_pos`, `end_line`, `end_column`, `end_pos` #### \_\_init\_\_(self, data, children) Creates a new tree, and stores "data" and "children" in attributes of the same name. #### pretty(self, indent_str=' ') Returns an indented string representation of the tree. Great for debugging. #### find_pred(self, pred) Returns all nodes of the tree that evaluate pred(node) as true. #### find_data(self, data) Returns all nodes of the tree whose data equals the given data. #### iter_subtrees(self) Depth-first iteration. Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG). #### iter_subtrees_topdown(self) Breadth-first iteration. Iterates over all the subtrees, return nodes in order like pretty() does. #### \_\_eq\_\_, \_\_hash\_\_ Trees can be hashed and compared. ---- ## Token When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes: * `type` - Name of the token (as specified in grammar). 
* `pos_in_stream` - the index of the token in the text
* `line` - The line of the token in the text (starting with 1)
* `column` - The column of the token in the text (starting with 1)
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.
* `end_pos` - the index where the token ends (basically pos_in_stream + len(token))

## Transformer
## Visitor
## Interpreter

See the [visitors page](visitors.md)

## UnexpectedInput
## UnexpectedToken
## UnexpectedException

- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string

After catching one of these exceptions, you may call the following helper methods to create a nicer error message:

#### get_context(text, span)

Returns a pretty string pinpointing the error in the text, with `span` amount of context characters around it.

(The parser doesn't hold a copy of the text it has to parse, so you have to provide it again)

#### match_examples(parse_fn, examples)

Allows you to detect what's wrong in the input text by matching against example errors.

Accepts the parse function (usually `lark_instance.parse`) and a dictionary of `{'example_string': value}`. The function will iterate the dictionary until it finds a matching error, and return the corresponding value.

For an example usage, see: [examples/error_reporting_lalr.py](https://github.com/lark-parser/lark/blob/master/examples/error_reporting_lalr.py)
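As a quick illustration of the error classes and helpers described above, here is a minimal, hypothetical sketch. The grammar and the input string are invented for the example; `Lark`, `UnexpectedInput`, `get_context`, and the `line`/`column` attributes are the objects documented on this page.

```python
from lark import Lark, UnexpectedInput

# A small example grammar, made up for this sketch
parser = Lark(r"""
    start: pair ("," pair)*
    pair: WORD "=" NUMBER
    %import common.WORD
    %import common.NUMBER
    %ignore " "
""", parser='lalr')

text = 'a=1, b=, c=3'   # the second pair is missing its value

try:
    tree = parser.parse(text)
except UnexpectedInput as u:
    # Pinpoint the error, with a few characters of context around it.
    # The text has to be passed in again, because the parser doesn't keep a copy.
    print("Syntax error at line %s, column %s:" % (u.line, u.column))
    print(u.get_context(text, span=20))
    # For mapping errors to friendlier messages, see match_examples() and
    # examples/error_reporting_lalr.py in the repository.
```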
lark-0.8.1/docs/comparison_memory.png
[binary PNG image (memory usage comparison chart); data omitted]

lark-0.8.1/docs/comparison_runtime.png
[binary PNG image (run-time comparison chart); data omitted]

lark-0.8.1/docs/features.md
# Main Features

- Earley parser, capable of parsing any context-free grammar
  - Implements SPPF, for efficient parsing and storing of ambiguous grammars.
- LALR(1) parser, limited in power of expression, but very efficient in space and performance (O(n)).
  - Implements a parse-aware lexer that provides a better power of expression than traditional LALR implementations (such as ply).
- EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
- Builds a parse-tree (AST) automagically based on the grammar
- Stand-alone parser generator - create a small independent parser to embed in your project.
- Automatic line & column tracking
- Automatic terminal collision resolution
- Standard library of terminals (strings, numbers, names, etc.)
- Unicode fully supported
- Extensive test suite
- Python 2 & Python 3 compatible
- Pure-Python implementation

[Read more about the parsers](parsers.md)
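Two of the features above, automatic tree construction and automatic line & column tracking, can be seen in a few lines. This is an illustrative sketch only, adapted from the project's "Hello World" example in the README; the `pretty()`, `line` and `column` accessors are the documented Tree/Token API.

```python
from lark import Lark

# Toy grammar, adapted from the README's "Hello World" example
parser = Lark('''start: WORD "," WORD "!"
                 %import common.WORD
                 %ignore " "
              ''')

tree = parser.parse("Hello, World!")
print(tree.pretty())                 # the parse-tree was built automatically

token = tree.children[0]             # Token(WORD, 'Hello')
print("%s is at line %s, column %s" % (token, token.line, token.column))
```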
# Extra features

- Import rules and tokens from other Lark grammars, for code reuse and modularity.
- Import grammars from Nearley.js
- CYK parser

### Experimental features

- Automatic reconstruction of input from parse-tree (see examples)

### Planned features (not implemented yet)

- Generate code in other languages than Python
- Grammar composition
- LALR(k) parser
- Full regexp-collision support using NFAs

lark-0.8.1/docs/grammar.md
# Grammar Reference

Table of contents:

1. [Definitions](#defs)
1. [Terminals](#terms)
1. [Rules](#rules)
1. [Directives](#dirs)

## Definitions

**A grammar** is a list of rules and terminals, that together define a language.

Terminals define the alphabet of the language, while rules define its structure.

In Lark, a terminal may be a string, a regular expression, or a concatenation of these and other terminals.

Each rule is a list of terminals and rules, whose location and nesting define the structure of the resulting parse-tree.

A **parsing algorithm** is an algorithm that takes a grammar definition and a sequence of symbols (members of the alphabet), and matches the entirety of the sequence by searching for a structure that is allowed by the grammar.

## General Syntax and notes

Grammars in Lark are based on [EBNF](https://en.wikipedia.org/wiki/Extended_Backus–Naur_form) syntax, with several enhancements.

Lark grammars are composed of a list of definitions and directives, each on its own line.

A definition is either a named rule, or a named terminal.

**Comments** start with `//` and last to the end of the line (C++ style)

Lark begins the parse with the rule 'start', unless specified otherwise in the options.

Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects on the shape of the generated parse-tree, and on the automatic construction of the lexer (aka tokenizer, or scanner).

## Terminals

Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.

**Syntax:**

```html
<NAME> [. <priority>] : <literals-and-or-terminals>
```

Terminal names must be uppercase.

Literals can be one of:

* `"string"`
* `/regular expression+/`
* `"case-insensitive string"i`
* `/re with flags/imulx`
* Literal range: `"a".."z"`, `"1".."9"`, etc.

Terminals also support grammar operators, such as `|`, `+`, `*` and `?`.

Terminals are a linear construct, and therefore may not contain themselves (recursion isn't allowed).

### Priority

Terminals can be assigned priority only when using a lexer (future versions may support Earley's dynamic lexing).

Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).

#### Notes for when using a lexer:

When using a lexer (standard or contextual), it is the grammar-author's responsibility to make sure the literals don't collide, or that if they do, they are matched in the desired order. Literals are matched in order according to the following criteria:

1. Highest priority first (priority is specified as: TERM.number: ...)
2. Length of match (for regexps, the longest theoretical match is used)
3. Length of literal / pattern definition
4. Name
**Examples:**

```perl
IF: "if"
INTEGER : /[0-9]+/
INTEGER2 : ("0".."9")+ //# Same as INTEGER

DECIMAL.2: INTEGER? "." INTEGER //# Will be matched before INTEGER

WHITESPACE: (" " | /\t/ )+
SQL_SELECT: "select"i
```

### Regular expressions & Ambiguity

Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.

For example, in the following grammar, `A1` and `A2`, are equivalent:

```perl
A1: "a" | "b"
A2: /a|b/
```

This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.

For example, for this grammar:

```perl
start : (A | B)+
A     : "a" | "ab"
B     : "b"
```

We get this behavior:

```bash
>>> p.parse("ab")
Tree(start, [Token(A, 'a'), Token(B, 'b')])
```

This is happening because Python's regex engine always returns the first matching option.

If you find yourself in this situation, the recommended solution is to use rules instead.

Example:

```python
>>> p = Lark("""start: (a | b)+
...             !a: "a" | "ab"
...             !b: "b"
...             """, ambiguity="explicit")
>>> print(p.parse("ab").pretty())
_ambig
  start
    a   ab
  start
    a   a
    b   b
```

## Rules

**Syntax:**

```html
<name> : <items-to-match>  [-> <alias>]
       | ...
```

Names of rules and aliases are always in lowercase.

Rule definitions can be extended to the next line by using the OR operator (signified by a pipe: `|` ).

An alias is a name for the specific rule alternative. It affects tree construction.

Each item is one of:

* `rule`
* `TERMINAL`
* `"string literal"` or `/regexp literal/`
* `(item item ..)` - Group items
* `[item item ..]` - Maybe. Same as `(item item ..)?`, but generates `None` if there is no match
* `item?` - Zero or one instances of item ("maybe")
* `item*` - Zero or more instances of item
* `item+` - One or more instances of item
* `item ~ n` - Exactly *n* instances of item
* `item ~ n..m` - Between *n* and *m* instances of item (not recommended for wide ranges, due to performance issues)

**Examples:**

```perl
hello_world: "hello" "world"
mul: (mul "*")? number   //# Left-recursion is allowed and encouraged!
expr: expr operator expr
    | value              //# Multi-line, belongs to expr

four_words: word ~ 4
```

### Priority

Rules can be assigned priority only when using Earley (future versions may support LALR as well).

Priority can be either positive or negative. If not specified for a rule, it's assumed to be 1 (i.e. the default).

## Directives

### %ignore

All occurrences of the terminal will be ignored, and won't be part of the parse.

Using the `%ignore` directive results in a cleaner grammar.

It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.

**Syntax:**

```html
%ignore <TERMINAL>
```

**Examples:**

```perl
%ignore " "

COMMENT: "#" /[^\n]/*
%ignore COMMENT
```

### %import

Allows importing terminals and rules from lark grammars.

When importing rules, all their dependencies will be imported into a namespace, to avoid collisions. It's not possible to override their dependencies (e.g. like you would when inheriting a class).

**Syntax:**

```html
%import <module>.<TERMINAL>
%import <module>.<rule>
%import <module>.<TERMINAL> -> <NEWTERMINAL>
%import <module>.<rule> -> <newrule>
%import <module> (<TERM1> <TERM2> <rule1> <rule2>)
```

If the module path is absolute, Lark will attempt to load it from the built-in directory (currently, only `common.lark` is available).

If the module path is relative, such as `.path.to.file`, Lark will attempt to load it from the current working directory. Grammars must have the `.lark` extension.

The rule or terminal can be imported under another name with the `->` syntax.
**Example:** ```perl %import common.NUMBER %import .terminals_file (A B C) %import .rules_file.rulea -> ruleb ``` Note that `%ignore` directives cannot be imported. Imported rules will abide by the `%ignore` directives declared in the main grammar. ### %declare Declare a terminal without defining it. Useful for plugins. lark-0.8.1/docs/how_to_develop.md000066400000000000000000000027451361215331400167470ustar00rootroot00000000000000# How to develop Lark - Guide There are many ways you can help the project: * Help solve issues * Improve the documentation * Write new grammars for Lark's library * Write a blog post introducing Lark to your audience * Port Lark to another language * Help me with code development If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process. ## Unit Tests Lark comes with an extensive set of tests. Many of the tests will run several times, once for each parser configuration. To run the tests, just go to the lark project root, and run the command: ```bash python -m tests ``` or ```bash pypy -m tests ``` For a list of supported interpreters, you can consult the `tox.ini` file. You can also run a single unittest using its class and method name, for example: ```bash ## test_package test_class_name.test_function_name python -m tests TestLalrStandard.test_lexer_error_recovering ``` ### tox To run all Unit Tests with tox, install tox and Python 2.7 up to the latest python interpreter supported (consult the file tox.ini). Then, run the command `tox` on the root of this project (where the main setup.py file is on). And, for example, if you would like to only run the Unit Tests for Python version 2.7, you can run the command `tox -e py27` ### pytest You can also run the tests using pytest: ```bash pytest tests ``` ### Using setup.py Another way to run the tests is using setup.py: ```bash python setup.py test ``` lark-0.8.1/docs/how_to_use.md000066400000000000000000000036261361215331400161040ustar00rootroot00000000000000# How To Use Lark - Guide ## Work process This is the recommended process for working with Lark: 1. Collect or create input samples, that demonstrate key features or behaviors in the language you're trying to parse. 2. Write a grammar. Try to aim for a structure that is intuitive, and in a way that imitates how you would explain your language to a fellow human. 3. Try your grammar in Lark against each input sample. Make sure the resulting parse-trees make sense. 4. Use Lark's grammar features to [shape the tree](tree_construction.md): Get rid of superfluous rules by inlining them, and use aliases when specific cases need clarification. - You can perform steps 1-4 repeatedly, gradually growing your grammar to include more sentences. 5. Create a transformer to evaluate the parse-tree into a structure you'll be comfortable to work with. This may include evaluating literals, merging branches, or even converting the entire tree into your own set of AST classes. Of course, some specific use-cases may deviate from this process. Feel free to suggest these cases, and I'll add them to this page. ## Getting started Browse the [Examples](https://github.com/lark-parser/lark/tree/master/examples) to find a template that suits your purposes. Read the tutorials to get a better understanding of how everything works. (links in the [main page](/)) Use the [Cheatsheet (PDF)](lark_cheatsheet.pdf) for quick reference. Use the reference pages for more in-depth explanations. 
(links in the [main page](/)] ## LALR usage By default Lark silently resolves Shift/Reduce conflicts as Shift. To enable warnings pass `debug=True`. To get the messages printed you have to configure `logging` framework beforehand. For example: ```python from lark import Lark import logging logging.basicConfig(level=logging.DEBUG) collision_grammar = ''' start: as as as: a* a: "a" ''' p = Lark(collision_grammar, parser='lalr', debug=True) ``` lark-0.8.1/docs/index.md000066400000000000000000000033261361215331400150350ustar00rootroot00000000000000# Lark A modern parsing library for Python ## Overview Lark can parse any context-free grammar. Lark provides: - Advanced grammar language, based on EBNF - Three parsing algorithms to choose from: Earley, LALR(1) and CYK - Automatic tree construction, inferred from your grammar - Fast unicode lexer with regexp support, and automatic line-counting Lark's code is hosted on Github: [https://github.com/lark-parser/lark](https://github.com/lark-parser/lark) ### Install ```bash $ pip install lark-parser ``` #### Syntax Highlighting - [Sublime Text & TextMate](https://github.com/lark-parser/lark_syntax) - [Visual Studio Code](https://github.com/lark-parser/vscode-lark) (Or install through the vscode plugin system) ----- ## Documentation Index * [Philosophy & Design Choices](philosophy.md) * [Full List of Features](features.md) * [Examples](https://github.com/lark-parser/lark/tree/master/examples) * Tutorials * [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter * [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark * Unofficial * [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3 * Guides * [How to use Lark](how_to_use.md) * [How to develop Lark](how_to_develop.md) * Reference * [Grammar](grammar.md) * [Tree Construction](tree_construction.md) * [Visitors & Transformers](visitors.md) * [Classes](classes.md) * [Cheatsheet (PDF)](lark_cheatsheet.pdf) * Discussion * [Gitter](https://gitter.im/lark-parser/Lobby) * [Forum (Google Groups)](https://groups.google.com/forum/#!forum/lark-parser) lark-0.8.1/docs/json_tutorial.md000066400000000000000000000351371361215331400166270ustar00rootroot00000000000000# Lark Tutorial - JSON parser Lark is a parser - a program that accepts a grammar and text, and produces a structured tree that represents that text. In this tutorial we will write a JSON parser in Lark, and explore Lark's various features in the process. It has 5 parts. 1. Writing the grammar 2. Creating the parser 3. Shaping the tree 4. Evaluating the tree 5. Optimizing Knowledge assumed: - Using Python - A basic understanding of how to use regular expressions ## Part 1 - The Grammar Lark accepts its grammars in a format called [EBNF](https://www.wikiwand.com/en/Extended_Backus%E2%80%93Naur_form). It basically looks like this: rule_name : list of rules and TERMINALS to match | another possible list of items | etc. TERMINAL: "some text to match" (*a terminal is a string or a regular expression*) The parser will try to match each rule (left-part) by matching its items (right-part) sequentially, trying each alternative (In practice, the parser is predictive so we don't have to try every alternative). How to structure those rules is beyond the scope of this tutorial, but often it's enough to follow one's intuition. 
In the case of JSON, the structure is simple: A json document is either a list, or a dictionary, or a string/number/etc. The dictionaries and lists are recursive, and contain other json documents (or "values"). Let's write this structure in EBNF form: value: dict | list | STRING | NUMBER | "true" | "false" | "null" list : "[" [value ("," value)*] "]" dict : "{" [pair ("," pair)*] "}" pair : STRING ":" value A quick explanation of the syntax: - Parenthesis let us group rules together. - rule\* means *any amount*. That means, zero or more instances of that rule. - [rule] means *optional*. That means zero or one instance of that rule. Lark also supports the rule+ operator, meaning one or more instances. It also supports the rule? operator which is another way to say *optional*. Of course, we still haven't defined "STRING" and "NUMBER". Luckily, both these literals are already defined in Lark's common library: %import common.ESCAPED_STRING -> STRING %import common.SIGNED_NUMBER -> NUMBER The arrow (->) renames the terminals. But that only adds obscurity in this case, so going forward we'll just use their original names. We'll also take care of the white-space, which is part of the text. %import common.WS %ignore WS We tell our parser to ignore whitespace. Otherwise, we'd have to fill our grammar with WS terminals. By the way, if you're curious what these terminals signify, they are roughly equivalent to this: NUMBER : /-?\d+(\.\d+)?([eE][+-]?\d+)?/ STRING : /".*?(?>> text = '{"key": ["item0", "item1", 3.14]}' >>> json_parser.parse(text) Tree(value, [Tree(dict, [Tree(pair, [Token(STRING, "key"), Tree(value, [Tree(list, [Tree(value, [Token(STRING, "item0")]), Tree(value, [Token(STRING, "item1")]), Tree(value, [Token(NUMBER, 3.14)])])])])])]) >>> print( _.pretty() ) value dict pair "key" value list value "item0" value "item1" value 3.14 ``` As promised, Lark automagically creates a tree that represents the parsed text. But something is suspiciously missing from the tree. Where are the curly braces, the commas and all the other punctuation literals? Lark automatically filters out literals from the tree, based on the following criteria: - Filter out string literals without a name, or with a name that starts with an underscore. - Keep regexps, even unnamed ones, unless their name starts with an underscore. Unfortunately, this means that it will also filter out literals like "true" and "false", and we will lose that information. The next section, "Shaping the tree" deals with this issue, and others. ## Part 3 - Shaping the Tree We now have a parser that can create a parse tree (or: AST), but the tree has some issues: 1. "true", "false" and "null" are filtered out (test it out yourself!) 2. Is has useless branches, like *value*, that clutter-up our view. I'll present the solution, and then explain it: ?value: dict | list | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null ... string : ESCAPED_STRING 1. Those little arrows signify *aliases*. An alias is a name for a specific part of the rule. In this case, we will name the *true/false/null* matches, and this way we won't lose the information. We also alias *SIGNED_NUMBER* to mark it for later processing. 2. The question-mark prefixing *value* ("?value") tells the tree-builder to inline this branch if it has only one member. In this case, *value* will always have only one member, and will always be inlined. 3. We turned the *ESCAPED_STRING* terminal into a rule. This way it will appear in the tree as a branch. 
This is equivalent to aliasing (like we did for the number), but now *string* can also be used elsewhere in the grammar (namely, in the *pair* rule). Here is the new grammar: ```python from lark import Lark json_parser = Lark(r""" ?value: dict | list | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null list : "[" [value ("," value)*] "]" dict : "{" [pair ("," pair)*] "}" pair : string ":" value string : ESCAPED_STRING %import common.ESCAPED_STRING %import common.SIGNED_NUMBER %import common.WS %ignore WS """, start='value') ``` And let's test it out: ```python >>> text = '{"key": ["item0", "item1", 3.14, true]}' >>> print( json_parser.parse(text).pretty() ) dict pair string "key" list string "item0" string "item1" number 3.14 true ``` Ah! That is much much nicer. ## Part 4 - Evaluating the tree It's nice to have a tree, but what we really want is a JSON object. The way to do it is to evaluate the tree, using a Transformer. A transformer is a class with methods corresponding to branch names. For each branch, the appropriate method will be called with the children of the branch as its argument, and its return value will replace the branch in the tree. So let's write a partial transformer, that handles lists and dictionaries: ```python from lark import Transformer class MyTransformer(Transformer): def list(self, items): return list(items) def pair(self, key_value): k, v = key_value return k, v def dict(self, items): return dict(items) ``` And when we run it, we get this: ```python >>> tree = json_parser.parse(text) >>> MyTransformer().transform(tree) {Tree(string, [Token(ANONRE_1, "key")]): [Tree(string, [Token(ANONRE_1, "item0")]), Tree(string, [Token(ANONRE_1, "item1")]), Tree(number, [Token(ANONRE_0, 3.14)]), Tree(true, [])]} ``` This is pretty close. Let's write a full transformer that can handle the terminals too. Also, our definitions of list and dict are a bit verbose. We can do better: ```python from lark import Transformer class TreeToJson(Transformer): def string(self, s): (s,) = s return s[1:-1] def number(self, n): (n,) = n return float(n) list = list pair = tuple dict = dict null = lambda self, _: None true = lambda self, _: True false = lambda self, _: False ``` And when we run it: ```python >>> tree = json_parser.parse(text) >>> TreeToJson().transform(tree) {u'key': [u'item0', u'item1', 3.14, True]} ``` Magic! ## Part 5 - Optimizing ### Step 1 - Benchmark By now, we have a fully working JSON parser, that can accept a string of JSON, and return its Pythonic representation. But how fast is it? Now, of course there are JSON libraries for Python written in C, and we can never compete with them. But since this is applicable to any parser you would write in Lark, let's see how far we can take this. The first step for optimizing is to have a benchmark. For this benchmark I'm going to take data from [json-generator.com/](http://www.json-generator.com/). I took their default suggestion and changed it to 5000 objects. The result is a 6.6MB sparse JSON file. 
Our first program is going to be just a concatenation of everything we've done so far: ```python import sys from lark import Lark, Transformer json_grammar = r""" ?value: dict | list | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null list : "[" [value ("," value)*] "]" dict : "{" [pair ("," pair)*] "}" pair : string ":" value string : ESCAPED_STRING %import common.ESCAPED_STRING %import common.SIGNED_NUMBER %import common.WS %ignore WS """ class TreeToJson(Transformer): def string(self, s): (s,) = s return s[1:-1] def number(self, n): (n,) = n return float(n) list = list pair = tuple dict = dict null = lambda self, _: None true = lambda self, _: True false = lambda self, _: False json_parser = Lark(json_grammar, start='value', lexer='standard') if __name__ == '__main__': with open(sys.argv[1]) as f: tree = json_parser.parse(f.read()) print(TreeToJson().transform(tree)) ``` We run it and get this: $ time python tutorial_json.py json_data > /dev/null real 0m36.257s user 0m34.735s sys 0m1.361s That's unsatisfactory time for a 6MB file. Maybe if we were parsing configuration or a small DSL, but we're trying to handle large amount of data here. Well, turns out there's quite a bit we can do about it! ### Step 2 - LALR(1) So far we've been using the Earley algorithm, which is the default in Lark. Earley is powerful but slow. But it just so happens that our grammar is LR-compatible, and specifically LALR(1) compatible. So let's switch to LALR(1) and see what happens: ```python json_parser = Lark(json_grammar, start='value', parser='lalr') ``` $ time python tutorial_json.py json_data > /dev/null real 0m7.554s user 0m7.352s sys 0m0.148s Ah, that's much better. The resulting JSON is of course exactly the same. You can run it for yourself and see. It's important to note that not all grammars are LR-compatible, and so you can't always switch to LALR(1). But there's no harm in trying! If Lark lets you build the grammar, it means you're good to go. ### Step 3 - Tree-less LALR(1) So far, we've built a full parse tree for our JSON, and then transformed it. It's a convenient method, but it's not the most efficient in terms of speed and memory. Luckily, Lark lets us avoid building the tree when parsing with LALR(1). Here's the way to do it: ```python json_parser = Lark(json_grammar, start='value', parser='lalr', transformer=TreeToJson()) if __name__ == '__main__': with open(sys.argv[1]) as f: print( json_parser.parse(f.read()) ) ``` We've used the transformer we've already written, but this time we plug it straight into the parser. Now it can avoid building the parse tree, and just send the data straight into our transformer. The *parse()* method now returns the transformed JSON, instead of a tree. Let's benchmark it: real 0m4.866s user 0m4.722s sys 0m0.121s That's a measurable improvement! Also, this way is more memory efficient. Check out the benchmark table at the end to see just how much. As a general practice, it's recommended to work with parse trees, and only skip the tree-builder when your transformer is already working. ### Step 4 - PyPy PyPy is a JIT engine for running Python, and it's designed to be a drop-in replacement. Lark is written purely in Python, which makes it very suitable for PyPy. Let's get some free performance: $ time pypy tutorial_json.py json_data > /dev/null real 0m1.397s user 0m1.296s sys 0m0.083s PyPy is awesome! ### Conclusion We've brought the run-time down from 36 seconds to 1.1 seconds, in a series of small and simple steps. 
Now let's compare the benchmarks in a nicely organized table.

I measured memory consumption using a little script called [memusg](https://gist.github.com/netj/526585).

| Code | CPython Time | PyPy Time | CPython Mem | PyPy Mem
|:-----|:-------------|:------------|:----------|:---------
| Lark - Earley *(with lexer)* | 42s | 4s | 1167M | 608M |
| Lark - LALR(1) | 8s | 1.53s | 453M | 266M |
| Lark - LALR(1) tree-less | 4.76s | 1.23s | 70M | 134M |
| PyParsing ([Parser](http://pyparsing.wikispaces.com/file/view/jsonParser.py)) | 32s | 3.53s | 443M | 225M |
| funcparserlib ([Parser](https://github.com/vlasovskikh/funcparserlib/blob/master/funcparserlib/tests/json.py)) | 8.5s | 1.3s | 483M | 293M |
| Parsimonious ([Parser](https://gist.githubusercontent.com/reclosedev/5222560/raw/5e97cf7eb62c3a3671885ec170577285e891f7d5/parsimonious_json.py)) | ? | 5.7s | ? | 1545M |

I added a few other parsers for comparison. PyParsing and funcparserlib fare pretty well in their memory usage (they don't build a tree), but they can't compete with the run-time speed of LALR(1).

These benchmarks are for Lark's alpha version. I already have several optimizations planned that will significantly improve run-time speed.

Once again, shout-out to PyPy for being so effective.

## Afterword

This is the end of the tutorial. I hope you liked it and learned a little about Lark.

To see what else you can do with Lark, check out the [examples](/examples).

For questions or any other subject, feel free to email me at erezshin at gmail dot com.

lark-0.8.1/docs/lark_cheatsheet.pdf

[binary PDF omitted: "Lark Cheat Sheet by erezsh - Cheatography.com", generated with wkhtmltopdf, 2018]
lark-0.8.1/docs/parsers.md

Lark implements the following parsing algorithms: Earley, LALR(1), and CYK

# Earley

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed in linear time.

Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`.

**SPPF & Ambiguity resolution**

Lark implements the Shared Packed Parse Forest data-structure for the Earley parser, in order to reduce the space and computation required to handle ambiguous grammars.

You can read more about SPPF [here](http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/)

As a result, Lark can efficiently parse and store every ambiguity in the grammar, when using Earley.

Lark provides the following options to combat ambiguity:

1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.

2) Users may choose to receive the set of all possible parse-trees (using `ambiguity='explicit'`), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.

**dynamic_complete**

**TODO: Add documentation on dynamic_complete**

# LALR(1)

[LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, tried-and-tested parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (for example: Python and Java).
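To make the choice concrete, here is a minimal sketch of how each of these algorithms is selected when constructing a parser. The toy grammar is an illustrative assumption; the option names (`parser`, `lexer`, `ambiguity`) are the ones described above:

```python
from lark import Lark

grammar = '''
    start: WORD ("," WORD)*
    %import common.WORD
    %ignore " "
'''

# Earley is the default; ambiguity='explicit' keeps every possible derivation in the tree
earley_parser = Lark(grammar, ambiguity='explicit')

# Earley with a traditional (non-dynamic) lexer: faster, but no ambiguity-aware lexing
earley_standard = Lark(grammar, lexer='standard')

# LALR(1): the fastest option, but it only accepts LALR(1)-compatible grammars
lalr_parser = Lark(grammar, parser='lalr')

print(lalr_parser.parse("fruit, flies").pretty())
```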
Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY).

Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.

The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows it to parse languages that LALR(1) was previously incapable of parsing.

This is an improvement to LALR(1) that is unique to Lark.

# CYK Parser

A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).

It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars.

lark-0.8.1/docs/philosophy.md

# Philosophy

Parsers are innately complicated and confusing. They're difficult to understand, difficult to write, and difficult to use. Even experts on the subject can become baffled by the nuances of these complicated state-machines.

Lark's mission is to make the process of writing them as simple and abstract as possible, by following these design principles:

### Design Principles

1. Readability matters
2. Keep the grammar clean and simple
3. Don't force the user to decide on things that the parser can figure out on its own
4. Usability is more important than performance
5. Performance is still very important
6. Follow the Zen Of Python, whenever possible and applicable

In accordance with these principles, I arrived at the following design choices:

-----------

# Design Choices

### 1. Separation of code and grammar

Grammars are the de-facto reference for your language, and for the structure of your parse-tree. For any non-trivial language, the conflation of code and grammar always turns out convoluted and difficult to read.

The grammars in Lark are EBNF-inspired, so they are especially easy to read & work with.

### 2. Always build a parse-tree (unless told not to)

Trees are always simpler to work with than state-machines.

1. Trees allow you to see the "state-machine" visually
2. Trees allow your computation to be aware of previous and future states
3. Trees allow you to process the parse in steps, instead of forcing you to do it all at once.

And anyway, every parse-tree can be replayed as a state-machine, so there is no loss of information.

See this answer in more detail [here](https://github.com/erezsh/lark/issues/4).

To improve performance, you can skip building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)).

### 3. Earley is the default

The Earley algorithm can accept *any* context-free grammar you throw at it (i.e. it can parse any grammar you can write in EBNF). That makes it extremely friendly to beginners, who are not aware of the strange and arbitrary restrictions that LALR(1) places on its grammars.

As users grow to understand the structure of their grammar, the scope of their target language, and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.
In short, "Premature optimization is the root of all evil."

### Other design features

- Automatically resolve terminal collisions whenever possible
- Automatically keep track of line & column numbers

lark-0.8.1/docs/recipes.md

# Recipes

A collection of recipes to use Lark and its various features

## lexer_callbacks

Use it to interface with the lexer as it generates tokens.

Accepts a dictionary of the form {TOKEN_TYPE: callback}

Where callback is of type `f(Token) -> Token`

It only works with the standard and contextual lexers.

### Example 1: Replace string values with ints for INT tokens

```python
from lark import Lark, Transformer

class T(Transformer):
    def INT(self, tok):
        "Convert the value of `tok` from string to int, while maintaining line number & column."
        return tok.update(value=int(tok))

parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""", parser="lalr", transformer=T())

print(parser.parse('3 14 159'))
```

Prints out:

```python
Tree(start, [Token(INT, 3), Token(INT, 14), Token(INT, 159)])
```

### Example 2: Collect all comments

```python
from lark import Lark

comments = []

parser = Lark("""
    start: INT*

    COMMENT: /#.*/

    %import common (INT, WS)
    %ignore COMMENT
    %ignore WS
""", parser="lalr", lexer_callbacks={'COMMENT': comments.append})

parser.parse("""
1 2 3  # hello
# world
4 5 6
""")

print(comments)
```

Prints out:

```python
[Token(COMMENT, '# hello'), Token(COMMENT, '# world')]
```

*Note: We don't have to return a token, because comments are ignored*

lark-0.8.1/docs/tree_construction.md

# Automatic Tree Construction - Reference

Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.

For example, the rule `node: child1 child2` will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn't the root), the new rule's tree node will become its parent.

Using `item+` or `item*` will result in a list of items, equivalent to writing `item item item ..`.

Using `item?` will return the item if it matched, or nothing.

Using `[item]` will return the item if it matched, or the value `None`, if it didn't.

It's possible to force `[]` to behave like `()?`, by using the `maybe_placeholders=False` option when initializing Lark.

### Terminals

Terminals are always values in the tree, never branches.

Lark filters out certain types of terminals by default, considering them punctuation:

- Terminals that won't appear in the tree are:
    - Unnamed literals (like `"keyword"` or `"+"`)
    - Terminals whose name starts with an underscore (like `_DIGIT`)
- Terminals that *will* appear in the tree are:
    - Unnamed regular expressions (like `/[0-9]/`)
    - Named terminals whose name starts with a letter (like `DIGIT`)

Note: Terminals composed of literals and other terminals always include the entire match without filtering any part.

**Example:**
```
start: PNAME pname

PNAME: "(" NAME ")"
pname: "(" NAME ")"

NAME: /\w+/
%ignore /\s+/
```
Lark will parse "(Hello) (World)" as:

    start
      (Hello)
      pname
        World

Rules prefixed with `!` will retain all their literals regardless.

**Example:**
```perl
expr: "(" expr ")"
    | NAME+

NAME: /\w+/

%ignore " "
```

Lark will parse "((hello world))" as:

    expr
      expr
        expr
          "hello"
          "world"

The brackets do not appear in the tree by design.
The words appear because they are matched by a named terminal.

# Shaping the tree

Users can alter the automatic construction of the tree using a collection of grammar features.

* Rules whose name begins with an underscore will be inlined into their containing rule.

**Example:**

```perl
start: "(" _greet ")"
_greet: /\w+/ /\w+/
```

Lark will parse "(hello world)" as:

    start
      "hello"
      "world"

* Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child, after filtering.

**Example:**

```ruby
start: greet greet
?greet: "(" /\w+/ ")"
      | /\w+/ /\w+/
```

Lark will parse "hello world (planet)" as:

    start
      greet
        "hello"
        "world"
      "planet"

* Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).

```perl
!expr: "(" expr ")"
     | NAME+

NAME: /\w+/

%ignore " "
```

Will parse "((hello world))" as:

    expr
      (
      expr
        (
        expr
          hello
          world
        )
      )

Using the `!` prefix is usually a "code smell", and may point to a flaw in your grammar design.

* Aliases - options in a rule can receive an alias. It will then be used as the branch name for the option, instead of the rule name.

**Example:**

```ruby
start: greet greet
greet: "hello"
     | "world" -> planet
```

Lark will parse "hello world" as:

    start
      greet
      planet

lark-0.8.1/docs/visitors.md

## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which allows you to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.

See: visitors.py

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example**

```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)
* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns. Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.

**Example:**

```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))
# Prints: Tree(a, [3])
```

All these classes implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances

### visit_tokens

By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).

Example:

```python
class T(Transformer):
    INT = int
    NUMBER = float
    def NAME(self, name):
        return lookup_dict.get(name, name)

T(visit_tokens=True).transform(tree)
```

### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children.

`v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

Examples:

```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```

### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
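A minimal sketch of how that looks in practice (the grammar and transformer here are illustrative assumptions, not taken verbatim from Lark's docs):

```python
from lark import Lark, Transformer
from lark.visitors import Discard   # import path assumed for this sketch

class DropBananas(Transformer):
    def word(self, children):
        (tok,) = children
        if tok == "bananas":    # discard this node: it won't appear in the parent
            raise Discard()
        return tok              # otherwise, replace the 'word' branch with its token

parser = Lark('''
    start: word+
    word: WORD
    %import common.WORD
    %ignore " "
''', parser='lalr')

tree = parser.parse("fruit flies like bananas")
print(DropBananas().transform(tree))
# Expected (assumption): Tree(start, [Token(WORD, 'fruit'), Token(WORD, 'flies'), Token(WORD, 'like')])
```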
lark-0.8.1/examples/
lark-0.8.1/examples/README.md

# Examples for Lark

#### How to run the examples

After cloning the repo, open the terminal into the root directory of the project, and run the following:

```bash
[lark]$ python -m examples.<name_of_example>
```

For example, the following will parse all the Python files in the standard library of your local installation:

```bash
[lark]$ python -m examples.python_parser
```

### Beginners

- [calc.py](calc.py) - A simple example of a REPL calculator
- [json\_parser.py](json_parser.py) - A simple JSON parser (comes with a tutorial, see docs)
- [indented\_tree.py](indented\_tree.py) - A demonstration of parsing indentation ("whitespace significant" language)
- [fruitflies.py](fruitflies.py) - A demonstration of ambiguity
- [turtle\_dsl.py](turtle_dsl.py) - Implements a LOGO-like toy language for Python's turtle, with interpreter.
- [lark\_grammar.py](lark_grammar.py) + [lark.lark](lark.lark) - A reference implementation of the Lark grammar (using LALR(1) + standard lexer)

### Advanced

- [error\_reporting\_lalr.py](error_reporting_lalr.py) - A demonstration of example-driven error reporting with the LALR parser
- [python\_parser.py](python_parser.py) - A fully-working Python 2 & 3 parser (but not production ready yet!)
- [python\_bytecode.py](python_bytecode.py) - A toy example showing how to compile Python directly to bytecode
- [conf\_lalr.py](conf_lalr.py) - Demonstrates the power of LALR's contextual lexer on a toy configuration language
- [conf\_earley.py](conf_earley.py) - Demonstrates the power of Earley's dynamic lexer on a toy configuration language
- [custom\_lexer.py](custom_lexer.py) - Demonstrates using a custom lexer to parse a non-textual stream of data
- [reconstruct\_json.py](reconstruct_json.py) - Demonstrates the experimental text-reconstruction feature

lark-0.8.1/examples/__init__.py
lark-0.8.1/examples/calc.py

#
# This example shows how to write a basic calculator with variables.
#

from lark import Lark, Transformer, v_args


try:
    input = raw_input   # For Python2 compatibility
except NameError:
    pass


calc_grammar = """
    ?start: sum
          | NAME "=" sum    -> assign_var

    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub

    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div

    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | NAME             -> var
         | "(" sum ")"

    %import common.CNAME -> NAME
    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
"""


@v_args(inline=True)    # Affects the signatures of the methods
class CalculateTree(Transformer):
    from operator import add, sub, mul, truediv as div, neg
    number = float

    def __init__(self):
        self.vars = {}

    def assign_var(self, name, value):
        self.vars[name] = value
        return value

    def var(self, name):
        return self.vars[name]


calc_parser = Lark(calc_grammar, parser='lalr', transformer=CalculateTree())
calc = calc_parser.parse


def main():
    while True:
        try:
            s = input('> ')
        except EOFError:
            break
        print(calc(s))


def test():
    print(calc("a = 1+2"))
    print(calc("1+a*-3"))


if __name__ == '__main__':
    # test()
    main()

lark-0.8.1/examples/conf_earley.py

#
# This example demonstrates parsing using the dynamic-lexer earley frontend
#
# Using a lexer for configuration files is tricky, because values don't
# have to be surrounded by delimiters. Using a standard lexer for this just won't work.
#
# In this example we use a dynamic lexer and let the Earley parser resolve the ambiguity.
#
# Another approach is to use the contextual lexer with LALR. It is less powerful than Earley,
# but it can handle some ambiguity when lexing and it's much faster.
# See examples/conf_lalr.py for an example of that approach.
#

from lark import Lark

parser = Lark(r"""
        start: _NL? section+
        section: "[" NAME "]" _NL item+
        item: NAME "=" VALUE? _NL
        VALUE: /./+

        %import common.CNAME -> NAME
        %import common.NEWLINE -> _NL
        %import common.WS_INLINE
        %ignore WS_INLINE
    """, parser="earley")

def test():
    sample_conf = """
[bla]
a=Hello
this="that",4
empty=
"""

    r = parser.parse(sample_conf)
    print (r.pretty())

if __name__ == '__main__':
    test()

lark-0.8.1/examples/conf_lalr.py

#
# This example demonstrates the power of the contextual lexer, by parsing a config file.
#
# The tokens NAME and VALUE match the same input. A standard lexer would arbitrarily
# choose one over the other, which would lead to a (confusing) parse error.
# However, due to the unambiguous structure of the grammar, Lark's LALR(1) algorithm knows
# which one of them to expect at each point during the parse.
# The lexer then only matches the tokens that the parser expects.
# The result is a correct parse, something that is impossible with a regular lexer.
#
# Another approach is to discard a lexer altogether and use the Earley algorithm.
# It will handle more cases than the contextual lexer, but at the cost of performance.
# See examples/conf_earley.py for an example of that approach.
#

from lark import Lark

parser = Lark(r"""
        start: _NL? section+
        section: "[" NAME "]" _NL item+
        item: NAME "=" VALUE? _NL
        VALUE: /./+

        %import common.CNAME -> NAME
        %import common.NEWLINE -> _NL
        %import common.WS_INLINE
        %ignore WS_INLINE
    """, parser="lalr")


sample_conf = """
[bla]
a=Hello
this="that",4
empty=
"""

print(parser.parse(sample_conf).pretty())

lark-0.8.1/examples/custom_lexer.py

#
# This example demonstrates using Lark with a custom lexer.
#
# You can use a custom lexer to tokenize text when the lexers offered by Lark
# are too slow, or not flexible enough.
#
# You can also use it (as shown in this example) to tokenize streams of objects.
#

from lark import Lark, Transformer, v_args
from lark.lexer import Lexer, Token


class TypeLexer(Lexer):
    def __init__(self, lexer_conf):
        pass

    def lex(self, data):
        for obj in data:
            if isinstance(obj, int):
                yield Token('INT', obj)
            elif isinstance(obj, (type(''), type(u''))):
                yield Token('STR', obj)
            else:
                raise TypeError(obj)


parser = Lark("""
        start: data_item+
        data_item: STR INT*

        %declare STR INT
        """, parser='lalr', lexer=TypeLexer)


class ParseToDict(Transformer):
    @v_args(inline=True)
    def data_item(self, name, *numbers):
        return name.value, [n.value for n in numbers]

    start = dict


def test():
    data = ['alice', 1, 27, 3, 'bob', 4, 'carrie', 'dan', 8, 6]
    print(data)

    tree = parser.parse(data)

    res = ParseToDict().transform(tree)

    print('-->')
    print(res)  # prints {'alice': [1, 27, 3], 'bob': [4], 'carrie': [], 'dan': [8, 6]}


if __name__ == '__main__':
    test()

lark-0.8.1/examples/error_reporting_lalr.py

#
# This demonstrates example-driven error reporting with the LALR parser
#

from lark import Lark, UnexpectedInput

from .json_parser import json_grammar   # Using the grammar from the json_parser example

json_parser = Lark(json_grammar, parser='lalr')


class JsonSyntaxError(SyntaxError):
    def __str__(self):
        context, line, column = self.args
        return '%s at line %s, column %s.\n\n%s' % (self.label, line, column, context)

class JsonMissingValue(JsonSyntaxError):
    label = 'Missing Value'

class JsonMissingOpening(JsonSyntaxError):
    label = 'Missing Opening'

class JsonMissingClosing(JsonSyntaxError):
    label = 'Missing Closing'

class JsonMissingComma(JsonSyntaxError):
    label = 'Missing Comma'

class JsonTrailingComma(JsonSyntaxError):
    label = 'Trailing Comma'


def parse(json_text):
    try:
        j = json_parser.parse(json_text)
    except UnexpectedInput as u:
        exc_class = u.match_examples(json_parser.parse, {
            JsonMissingOpening: ['{"foo": ]}',
                                 '{"foor": }}',
                                 '{"foo": }'],
            JsonMissingClosing: ['{"foo": [}',
                                 '{',
                                 '{"a": 1',
                                 '[1'],
            JsonMissingComma: ['[1 2]',
                               '[false 1]',
                               '["b" 1]',
                               '{"a":true 1:4}',
                               '{"a":1 1:4}',
                               '{"a":"b" 1:4}'],
            JsonTrailingComma: ['[,]',
                                '[1,]',
                                '[1,2,]',
                                '{"foo":1,}',
                                '{"foo":false,"bar":true,}']
        })
        if not exc_class:
            raise
        raise exc_class(u.get_context(json_text), u.line, u.column)


def test():
    try:
        parse('{"example1": "value"')
    except JsonMissingClosing as e:
        print(e)

    try:
        parse('{"example2": ] ')
    except JsonMissingOpening as e:
        print(e)
if __name__ == '__main__':
    test()

lark-0.8.1/examples/fruitflies.png

[binary PNG omitted: fruitflies.png, the "fruit flies like bananas" ambiguity parse-tree image referenced by the examples README]
QB)gϞeРADEExbFt$!B!u!DŴif͚\zUt!BQ,IԻr  "((s2fT*ұ\LL QQQǓAZZallXYYaeeU&FB!!!ܹ#Gpq(o`@m۪ԱL-*XYbeaNe3̌ˣ.zdIDCBJ*qD>'&>nFw$ɩiT(_-Ӷm;u놻҇/ʐ "## 22pbccy < 66|NNi?r_}}}#Ottt055\ri71'%%%ߌxDIJJBVLVVvܓ O=x\XZZbiiIJ%+Wlll~_B!MR !JFpqq_QX֭[u cafBG{<kѤ=ulREts7Bmb㰫YbDْNXXoDDDp='**X* +++maiiiI*U濋Nss͚6?O>rWDDAAAy \+V֖jժQjUlmmzXYYiřZܻwO5**J&<<86ڢN}zUhDB]Q7;;;~W5jt֭[/  s]zkFGttpk_<鏩cǎc̘1/_^hj޽Khh(׮]#00P._Ltt4hjժaoobŊ /Ӊ 44GHHv=j׮3NNN8;;cooBGPBl >___ƍ7|Szs<}DHHox3G++?L`şGr*Ug\t,QH>|ŋwooVZ%סCrA<==e˖2 Y(Çɓ'9wiiiTR-[ҡCtB:u*B u!DĉYl `ѢE(صkӆC֍T85hBHeZZZǎc߾}ݻ`LLLh׮[e˖4nܸT J,ĉ9rԩ]v]vrB]Q۷aÆ֭ґ Tjj*]v҅ 4vj)ЄG3g&+q1T$&&}v6nȑ#GHKKՕ.]ХK<==gDSo>G@@kx{{ocnntL!x)ԅWbb"ǏgٲeՋ%K^ج,z-N:)#k[gÎ~YKXņǎajjtR---ݻw~zCNN]tGt# QheǎݻFC.]۷/o !+)ԅ:t(,Y~[Hba_UGРvu{1qݏґJW`6nHJJ m۶o߾;rrQ&%$$m66lC044W^|'*O! qiiiL4 ///6mիWKm~zVZ-SE:me 'Sjh4CΝiذ!'N`ƌܻwO{KtQV1x`Gxx8gܹsѾ}{vANN1)ԅƙ3gpsscҥ,^͛7Srɹw˸{.F=/O5rR:"Uc\|3mqJ4F_>ݺuCR{n3f UK*xUTa\|S|yyXzBI.P\zz:&MUVԪU+W0bc)'ca\q}P:GڌcJܹsjՊӢE \/]vJ *N:iOlk׎ÇӼys'(äPB(˴hтŋ?w^lmmUΟ?Ϻw t\IֶJb|9w[n-F<`4o]]]Ο?/^Iff&'OT:߿ϦM9sQ˖-ŋѪU+GttфeBEj̙C&M022ߟ#F͛Ktj@(OG"ZPU߮^M] u?ɩSpwwlذcǎ^9>JԔ ҬY3T*4k  QTDEE=8&OLŊiݺu#aQTTXEg]*U k֬~J;w[ntЁ5jR{.GSNT*T*ڵCxzzҷo_]g_cƌJ.ݺu ؼyL͚5ٻwvFaL8vQR%{=l߬Y3Ə}VZĉw+\\\8p۶m^uc !)ԅEڵkhтoӧsqԩt";pML::Х7]jnߌΝ˅/EFF0m۶ѡC W^夥Ѷm["##|2gϞΎgrUéSl‚YfaddTѵ=ʴi3f }aT*2220`fffyϩ[.c?@ٿ?vb:t0x h۶- jȑ#:t___4 ޽[ pUmۻvחSNQ~}zܹs_w^ɂ 2e 3gd۶miӆ*kժKY9tuu8q"cƌ!44.=zŋl///֭[t$!D"0|7n./^dĉ+Wvm۶ &-ܔ\zXɾ:զUe6mT$SҌ3V&L ॶݷo{fСl۶ BJbҘ0a3kKKK>Ҟ۞J*c d,\}}}Qoq~HKKŃݻv5@,W'OѿK{r`llw}Gvv6 ,ȳ}D400~А{{{/^Ljܜ?6m 1cKy9,,,޽;ɯ~A033c͌3}v# !ʈX۷oӮ];ƏϤI8ydeʼn'hX&FJG)VT*MJG)ݝ:u0c n޼m_~O믿NNއ~Hݺu ыɡXZZy-wX… rJ Kv_;VVVcbbP lid9\]]]v /(* FAMtSy!DhXl 6Ç={iӦT;[3e 3>p?MŊ?0u&zMϐY;~~IOfqag,$蝨X>YE_ U`ǐ˽ϟGVxōYYYܺuӧSn]3gOl3ydTҥK\*T@W"OBB&L`ҤI;Ν;3vX>|m}||000`ܸqIҘ3gCÃ;r4 ;v`ĈC %...?^Ν;x"]tybxyyqe8]ͯ{999ܿ?7Љ͛7Ð!C^h=z4ڵQFJTTFbԨQhgggi& i\ΝYf=/]]]m9˹uμ `ggt!DPBXwaС;vq1}<=ˢ;w2m {j֐$fNzvm9q!80k{* !s3qŸȞuՊN}'Cg-epa]`bWiih''BBB/s<<<Ƞ}E͞=F]VJNʠA5k?6hh(3gd޼yT\YٳX+V<'NЭ[7UFppQƍ3'ӧ133COO/߼ڵq"~ 2rrryWYl} *Qbj *G[4aKھL&̙) +V`͚5w<7?sT  СVVV^?<^^^j5wvrkwzߑ2E1vXxgΎ_ƀŅiӦ1|sZZZA~dܹX[[o0n^](),fVT\RU=[yvKĖCyNj=k@x\vj_U nѢK.͛={y橊ijj`֮]KTT:ý^ժiwZx1^^^|^_nݻwgΜ9T *JVXA޽?~<.]`Ḻ{1uTm[okƔ)SBmPᠠ@UW^^ƎyͣHT~bϟ$&&IvCCC~G0aB{\|QQQIKK㥗^B[[ڰ<==ٱc}NVuSu>Osd0# @|=Off&~~~} u֭[|ǼkM0 O;UAA^^c 0ҡCuUEGGstP<){ϜX-M ש-Y_&M,LY69tظ/^ @)fG[/W/" ~,?iZ1cj́T]\\3gׯwߥwLt¡Co;v,[qUUH|2bddD́'|2Ǐ̌7RXXH@@ lSSSvf̘1rQ?#֭‚u֩T---:t駟d+++6oL޽iѢ=K[[B1cptt'OoInn.vbܹ@,Xӧ7opeIKK[[[qrrb׮]OϻKDD:t={W_~zڴiömV?paBCCY`իW裏xj wsssx9wyyy'|BPPСC^x{zG_-6m?̡CX~}iɡo߾w=Բ L85AL6R֯_ψ#ZܼyMMMw X]]yq 5K/;'#''qѥKG755???8|* AѺuk֮]PjUTTfСٱqFu@bb"$//* <E.CS(XիW3j(&;*~L%NNN䄋 NNN8::쌝2 RIs1} Ə'$ܳM}66mbJKKG>}^·/r9K,a֬YkN<3g0i$Ο?uСCL4 :$tAn1nGʤI 003g;${$Iff&.]BKK L CpqttܜG4uC74J%~eԨQ"IvvvdF!CXv}o 5eϞ=|7n:|P/Gʕ+9vOJJ0a uiu(oУGZhkDnݺ8OZ^ZJJ ϟիWTp#1'HLb K|Sݡ4zzz|g?~x<<<7oڵkʕ+YnCڵS{_߳m6Zj*ڴiChh(vb֭"IΉ ӹs>TIKK#33 RIOO;)''Kjjb!&&&XYYamc]X[[coo3hkkx|r|||ؼy3[~P_UTTIpp0AAA\xp455qwwSN닞-[(J.\Ȋ+խ\{a̘xuT?fmB.F>xC,7'N`ryƍȑ#E>Ic޽ܹcǎǰaØ4i}E&;DA;D]ETTw۷㉿@qqj[c}lM43k Cl06X#] u04Hc#]45512A[KSu]]-I+T+'_@^"E)R %#_XBfv!r2̑]HfVU555ifo ][₫+iӆ\&OywXp!4={V1۷W%~~~5&7VVVdeed2wƍxyWX4~k̛WVw8MBر"I={d 0OOOu(.**ÇsaN:$I 0c2l0{.B}&u)//^DEFRDKKG+G{3;Ydo96& Jn'咘\s|ng /lAWG닏;v|C)66Vs%J%Ւroo?bۇ|gL8zh֮]˂ 8E㇠H{wU3gΫ]VbA^^dggB0`>l.8qBߺu 3331bI"BC!uq$pΝ;ŋx]BQ.mZ6v -[{%I"!9 ._O$>)qلH%+ [StԉΝ;ӥK{ᮯ S%N"##mmmڷoJ{:G@@/fĉ|'XZZܹɓ&tۖ?,LUKXW~VbѢEIJ%.]رc;vLjѣUUӔΟ?Oii) :>}ЫWNAD.4>;N$#3 m-MZe#mީI$EJz>%rz'sJ"yhiiҡ}{퇯/zw=oɪ<(( .PRR}pu222ݻc+44cPRT^)VENfW;*P >\! 
ɓ' V6)//yKΝi׮zzzWhBJJJv.\ $$`bbbԤC<ntAGD.4|YYYI)y6cA=EN֭Z0t_?ܹs!$$"LLLڵ*)ٳgH~W*9Lԃ 0ԫSJs_?I$㣏V1m4_#T*y&/_&,,LH憗 VyBMrss"<<("##vH%ު)޴jժI;p~~'NanfHCX T %W"9t2Ct5Gf<7j4SLcǎUEߪz#""$ 777r___<==mQ"V^'AC {^_/T|{r 1s&˗/s#))IITTEEEڪ6m憫+͛7\aa!qqqܺuXUBIJJ *)#pAhD.o$qjFG[};̛gDS g£Sٹ/MNL>{.#˹t*) %++ CCC:vJʻw$\֮]gE.ӷK;ݕnhZl?<ǁ(%>cK,Aq %IӪ=224vVVV4o\W}uvv 5^ 999$''JFXSi1F: B`֭"t+^оm14QwxCR*%Dz}o? }}fΚ͜9sdŋjw.֩S'v튎U vb}` gϝܔxзK;:WTpm_α$eѦ _z ̞x BæP(uVǝ]NNj[===5kFfpppGGGpttKKK,--ҪKg6deeIff&HJJ III B)]|nhXF S"Qꗢ"֮]˧1:$?Z۩;41evv***֦s;;ڱU5899E P{D.|wm 3ٗcmiЄZVZV{qI2.]$צDN:ӧ9)n܌AT-f&Xacn:hkiaCqi%eeQ(!3<2 NaQe={gϞٳΫ $I"--ZqWyg򙗗Rxzzz`ll2L5:*z LLL6W0_^^^RYEEHDnn.jAvUוJSuWpgƝVVV6: H a0ul7HЛ u֝LbK/?k\.'""W];HJJ2ERPz`kg3vvvjՊmҶmFQI_r9T=U e2 :ߝ[Uo)?w6ܫ---UCÝ_F Tfd$AhD.OII ˗/' `5}z!ୡ:^զF^T4ͦ6AAA]D.GLL #G '..Ĕ;$A.\I`I,[1bCAAu-ŋ4ZoH:w"Wyn'G"00P!  ZuL:u F F7o'}XjMF_O{'3^yxw   ) u͛zn$z_@[K; /%-?lywquueҤIIAA. B Ϗє>-',1 .69]?7gŋGAA(&'ԉ  e.NN͌E;}UT(6[2ջ!AAFH֭Iߑ-Ww(BSSW$ vءpAAN4qBuV u8KS^Tߝƭ M °^<LF~A1GCCFiYѩx㍗`f\Q8|*>xc 9yElZ3+ C>Hȅ[XquD$]WUN+~mLxkn?o͸ 9%)'f;#hiKȅ[?vߎ]Η2ٜ:95ݞ~@4L|sTT(U$.1_CWx=)Xҳz6n e{:2f\W#S05c՛jmWgEs+⛍_/yAAA]w߯/L IDAT|zl7ho׫Ơ!c ٽ~2=wgWUHq_QQ$x5bE-'/_Tۘ+/*k\-I'!9W^’>} >hkNghhG뎑K[΅+G#o 1 Ѷ(  4~;BhZlɖߑɠy>3Y/nҵ$_6ښщ"q+) >_?m-3 h1]M.eq{($QXTJl?vCZYcÆ*yQ)oيHq\La޸:YrDŤh8ؙ_bJ.#'t{4MK[RV bE%]v:3}r̛ }=m稥>^%ds<8S7p3wF5oI5o޾+  jr],&ԉ]v1n8gh/uy_͍[FPY/Rƒ{e+עu~LLL    uFGGֆgz1N!:yQ)9DRj Kja+bMGqwwWwH  Pu) ƍ#Gx fNxLz b3>r֭[;$AAAkbPٳ-~%gM$;,AKyg΄9+tAA=Z?IINJ`̞[m0;pD[=իpqqQwX ۷p\~xcxzzұcGQ7@W 999䐛KAAP((,,$//BAQQP(TߗP\\Bx`]U%zzz뫒xSSS100\)cbb1FFFcnn,AH$Bʊ+曍hn+3naoD$IdM>X{{{u&CVVׯ_'""É"11R E-bccU2 ڴi*z٦MH,A4 W||<ʩ?N]ѵ9]ީ9Fz>H+]M$|,Aot."E)3dp BΝXooEEp455qwwWYzD$&e<7k<<: gY ~Sy,] e(77Ud**iӦMd\ E|UCTTj%^^^xr>>>XXX9zA8sNqh455ogqc[G\Ok]Spz"Mȅx΄Q(/Ɗz}2dj1%% .TKދsΪϯZU\\Lrr2nnnrx&OLPP0}L`ԬW041cSiI1~M_Ӣ~۫;F/77k׮qE#""I077 OOOT_7o$[NN1kUR`ooONTΝ;? 
7"Q4N>ӧ9u8TTT`an#^xu=.Nh5HI'F%r9<דu={ѫWoz쉇2\BPPETs{>OPPf5wΝ̚5 3;f WƷYzR"-!>ϟbΟ?ϟٳg #11ggg|||Tooo5kV@fk׮ٓ]AP MCaa!/_g9oRQQ-lij5mZҲddRV]^(vR61YDǦNdL1iT`O]N͑?dUEUC;uDn:Yt)$Ѻuk6mO?HqIoʕ+?n"-B Jo]_~ĉzy>}4ϟwNN+++u+Vvvv=$$tttҥ ~~~___1C$uDDD¯_#&eefe2pq4ff45kK#l,0BOWk~WT(̖#'#2 O!1%$$琕]Ɂ6m>>={ƪᅅNff&Onn.r F.SZZJii)ry8ee՞hcxG!2 333126SSS166 ;;;U1񥥥JHH!!!"uִk׎'NԸ$ѣG6nX:gf-,lm=p$f4+gӭKg*--%,,PoNXXjhkk#?.]i&OOѹw ޺~wc\RԹr9Ͼ}`ȑ9}bbbA\޽{ٳg 2I&1p*2*B&ux"[la˖-AEE]ikk3uT֯_(blڴFhhhη~ t8]P6 'o[Ɓ0`éIII|l޼4zI=zH*,,f˖-:u KKKyWqqqQwx 4,"Q|zjh# NttCo_5}3 Vl/Nʍ"mĕoܸ{Ǯ]d֬YA%of֭lذy-[V6^-=^u&|"I+//7n}mmmtuuW$I#e%%uj2a';'"==YfɅ ذaqqq,_Q%饥;&///O!4iμİyf]Fv2e )))O@$B0e/^\-q7---<==9sfF&4DgϞ߃ iժ `ԩ,['OrMiJym weTVP\\M6Cظq#W^ej_Cĉd2LLLh߾=ݺuC&Gnh۶-zzzd2RSS{lssszQGWP)::vEdxyyQTTTmcǎѿd2;wf׮]OKKcΜ91ѣG3j(ΝKzzz=dd2 WiL&gϞݻcB\\\ؿ?}~~~:Dyy9K,!11QaE[[ &pelw?"p'1]h2J%#FPo2SNѳgO5D'4$Ν㯿Q읱} חW:fN|7L8QᐑNr93f`̛7+Vԫe8@@@@Bwww"##ez)9u%I֖:KBN:_͛!//333f̘_]m8\]]VEԩS;pBU?vQb155%;;5kFRR͚5ϱ=HII ޽A@NNNNQF xϩ:f͛EM Aj" M< wUajI0vʌ38p ^^^Ln݊H܂Ne-={`ccÌ38u}[^^}ѣJ#G@HHh͠A۷{iժUDJ%/"SLq]cƌƪ^jsLJJ -\ӧO'##XdME#%I˖-yܡCZhqXуrss:tC "B"u+((`ȑ|{߿ d2{Pӹym;$I©cl|-^y~3WLl5նElcv|ր`|}z*$믿2sLaҤIXZZҶmj $ҥK\ޞݻwSTTĘ1csik|RS9=Ga+^*7n`СӥKNRw8 7 4b111doo/;w{wd2ԲeKTM M3g$@Ii~l%;rZח9K_:)rH-ڶv\6_윛KfwÉPιd(}ihI굇=Wm>_۷IzUV˥Lrpp,Y@rww|uVIKKZj%J999$ITEVV/JWcM>]P=۷dcc#J H'mݺU];v?o> ,XP$IVVV}?KK˻gu{0o<ѣ%ICCCu/Zl)I$I%)ʇOUZj%KJR$Iz{^{XXH|cuwߕ$BPA?v=]FgΜٹ?Ǫ5d䌗 \߬gkQzaUd2f=zz{W߼y+WҾ}{8ʐ%2%"bQ%vbDUjTV[7UAkV]e$i3VDvd' I|<Σzs]:'|wDDRKٳy&_|r9SLaȐ!xxx/( c̙̟?333sαzjV^9Μ9C^Q7n~,u5qD𥿿ǍɩEPi(J7O?)'*`ܸq̛7?QFQzuISSS_VVVE JHHH`Ō3S>sis9s0h ~Giݺ!)Em6 FΝٺusgT DIJKKCMM -mb= 555 Hz"SGDaM[p#]_v⦣Y~}U!OP(W`ƌ\¬pBӦMZjpMfll\`he"jjjYF$Eb&MpY?-;;Ү]555;`NNKM?O9Y`{\t4vm;&RdM ÜEjrrUʟAL<7o?7o}w:T`pvv.Ņɓ'3yd>޸qcdrOz*K.-rիE|&MXvm}7f͚Bϵ|r,,,o l'&&;w<ɓ'i߾JNz`dggӣGeπW;__>lٲLOlhaaQ 6o<5kFƍUEQH#*H:uĩSdС$RF!JH7+/ܥup꿃֍$>Uk&6mN|ezr-(}ϐs;Q(4mڴX[s Fŋy&ӦM+LGeѢEZr=ܤIaɒ%DEE)/[vW_ <)ONN8p Sn]f̘~͛2e ƍcذaP-%%y:vHJJr{nՅu}ׯƍ{f;v,]TY,`=͛5O?G},8x cǎ}fd;w.lݺ'r||<}\`nnnˋ͛7+=Ο?σ1Ν,틁rwwtt,JWfϞ=2 QNӥKrss9v-ZXBMMMxz}.rLn[9w9XlM''8vFFF臘2w\OiРl۶-[O@߿?aaaxyyq1Xl&&&,[LYjjjҴiSVX]'UVetIexϞ=L6[nquj֬Iڵ \g.]8~8~iuaРA矬Y]vk.ؾ};ϼg|G\tYfpB,Y£GXzuƦͶi&9'N,g٨Q#ظq#x{{ӤI~WeO3Ξ=Lz<|{3011)cHHH`ժUTVkbll\ٳ+V`bR<x3e8B劚B(&wf888cǎB',9s&s,`_]YQ)}[NQزe 0Ν;J{xx0e>3-Z^ %};ٓ pBUG)Qׯ_g;wNQݻwgϞS'(bcc9rȑ#<~ݻ3x`QuD!D%(rss?~5kFʕUW2#;;7np UF˖-qtt֭[SR%UGB]Pe֭[Oٳ'6l/ɧ~Q AX&//mٶx.666l޴ƍ:Vst򑘘5ޞ͛cggG&MTud!J\tt4W^ʿ`hhHfh޼9͛7u4h@ՑB uz ӧ3}tƎ˂ PWWWu,!JB`Ŋ|$u律l7[L{L"wJQXXXҥK`ddDƍqиqcԩ#EP(w׮]ʕ+\vWJBBʂ6GKNKqW/кM~>nݺ:($ׯ(Рf͚½~XYYQV-,--k ""H߿Oxxxa *rccc_B1)ԅj憎aÆ$D9sصk::֓n{n?=ĸ8ww}GTMgb?"""RjժXZZRfMjԨ%jբzXZZbff*"QZrss%66(e!IDD< 22R9$@KK KKgzt?LMMUxEBQPo <v؁# Q.İi&VYCի^N4떭ѭ\zkffs38KxSN СC]veeCtt/ Wfiii/̨V昙annfffcll*:񴌌 HLL$>>X>|HLL ,=z_7Y1Ғ5j`aaQ!D1B]E1aҥKeݷK/P띆Ӵ9mŪ;X֩n1A0"nv 99X777Zj%_ %&&* G,bbbfe===tuu144rhiiU/;;TIKK#--DIOO'!!TxBBB?ߌVzꘙQjU,,,05]!^N uQ:233ټy3gر$D/gϞח 2"mVݒ-145 St*SI[JZZh萝IVf&99GR#bcÈpЬT ;:Ү];Y^TT* +$ }*U(.:::){}顭imxzriiidff\NN)))CVVrⴤ$TXȟǤy5e "Oo333{k?H.J^\\}!00-[ЫW/UGBΝ;rUß "&&222$#=-mmtuuB___꟱3&ׯ__zr###Cy8qrr2=sg9//$IEnn.w_&XT0FFFzԤJ*TRMM200@CC@O}}}tuuR RreeV ed]|7778p֪$B!eَ1]pttzI.B!E (VŅ=zp1UI!B!)Eeر9~[XB!BQnh:8RRR0`ǎc 0@ՑB!ܑB]۷oJRRgΜE$B!t}oLJmۢB!BPodڵtܙ:KZTI!B!5)kQ(L6>#GcTK!B!=.^Ǐ4hׯgȐ!$B!DFFݻwcǎ$B!ȂpssCWW___6lHB!BQuQ$v&Mp9)҅B!H.^HP0gϠA8x FFF%B!t}ϕ͛Y`cǎUu$!B!Pz~!۷UGB!B/_uuuVu$!B!xkuQ'XZZ'EB!B2)ԅҪUի={رc:B!BuP0fF?-[Qu,!B!x+\BB}޽>@ՑB!&[۸ɓ'iѢ# !B![O |5jߟ5k:BQadddI\\III( 044D]]LLLQ*N,(deeBNNwlaQ(lFOOZZZT\]]]ttt\2ZZZQW$B ڵk5jlذпB!xΝ;˗|2WB_8FFԬY;찱uXXXPr!DIѣGKBBIII$''T#11,RSS$--M՗ddd:FFFhiiahh|chh3033jժrEMQXS ӧOg; . 
!EƑ#G8z(OPԨYՊFѸZ#j¨: ̩Z}*ak@rF2yy R3SK#:!I<\ν( 5hHΝpvvG BC"""ÇEyLLLw +f]ο[OOmmmT@??y[ZZ^Jvv6撔D^^ deeh`HHHx"==9455Z*fffSZ5eoeeE5VZʻoR%RSS4hGaժU Skkklllޞ{ B"۵kC ё;v`hhHB!DɓYz j3ces)+LF-[X`cƌQu$!̊ggٿ?]I:ky_=8;;oÄK-Cpp0x{{Czz:իW^h׮]E,JPdd$/^T>|||HHHܜ;#͛7WUɣGӧAAAlݺUGB!ʬӷO_rsY>p UubqEFnMz6;vQՑx-7od߾}xyyCZZjբSN899DڵUST\t'Or Μ9Cjj*899B=022*HRW/_uuu|}qi޼9 ?.K nժU=O>+VHB!DuazfDLs8#ﵐ;wЧOUx2dsͬ_ 5jĠApwwNxLN8ݻٱc鸸0l0zY\xK^^0n8/_Ώ?ȴiTI!(.]D{nƂުYѿockNq[,88ٳggׯÆ Aфxeiii޽sI9r$cƌM-zy@߾}9w6mwު$Bi7B۾\a?O"kq=AHB̙39t.Ǐo߾eyk!^ݻwYf ˗/';;QF1aU:ۡ^Eɺ}6ڵڵk:uJt!&@8\j,L㈷HHHδmۖ8O`` C "]T(uaƌܽ{Sa֭˄ HNN~I^={mۢ?$By.]b53 ׺Q!T6aGuVΞ=8KOOgۓʱcW^oհR | w_eƍذwW:Ě5kҥ ;vLJ5k:BQ.|=a"-<#)ﲼ g.tl܁o&~~%ԩSٱ|r,X/;wVuז%[0zhBCCҥ }> &&HK^) Mƈ#?~<۷oGOOOձBrҥK8u=+3w23Ypt=huޟ´3J{~y|||JQz聚jjj899D-psscdff>/cƌQ7p@:T̩Ԯ]KMPxb[h߾=ׯ_/1 Ӻuk"o%<<&VEt?)˰T>fϞ={vB!ʻkRߢ>mQu%Jڌcn+\-ӚhjzZv]D#ʟ7ƌ?3fg,--)YѥөS'9wds|24hЀRKIMرc }iӦBڵ+W\'8qϟgĈ̚5 [[[\R`vڱdj׮ upqq)rJ*^}]x1'Of̙矘TcnݺBshjj2vXosШQ#8 _/Qq=CTI!(W ̪EuJqΣ=bn͊ױ:fE|B<*U*󋲩jժԫWӯ_?͋ڵk>|8+V?//k.*WL=hԨ׮]Sn?>ݻwlܸ1ׯ_/"W^'00SSB_FPP\|7 nx coo/_FWWor'c)7"wa9s9G__IJEff&~!\p:u2,:w-Z ++ .H.Bk׮KF迵oHTUE!999\pqQzu?^:c[=z4Jt={ҵkח/w)DEǠA6ls`4hĉ ȑ#|KKKf̘AXX{X[[ӠA?xDӫW/bcc_iӦԯ_]K6;vF 4 "R1;ws4mڔg>E!/燮.֖MJ\ct~:3_n\ U>Ώ{1q7:4zCsr9u4K?,7 %~.Q( rssɓ|gҳgO6lǏgڴiԫW~I CSK0蠥ERR&Mc„ t֍ &}Ν6'NTN2Μ9shѢ΄P(طo9VVV$$$0dLMM… /]?˗/gܸq|gvݷo_zgpAFIZZь9#FpAƍG:ucǎԪU9s($''3o޼rssٱcC Q\~=W^U9ֽ{w~wn߾]:Xn~~~۷H^F( ̙ 144Tu,!ܺwujSIv\9:Ey/6m۶h"썣B!ʛ8K\CPW6v̓6l@PygN:6EX*iT¡ARɚDτPvYe׋$ټy3&&&899N.]J+f={67o/Pn377gʔ) 2~saaa̜9cff~9V^igΜW^ԨQ7n?PV-&NHPP~~~>wވUVƌüy?>7n|,0i׮5kƍ| Pu{ۛ,Q]ړgΜ]2BB]=zD>}f߾}SՑB !-- M3:'INޓ䵓2)S]`^>NXJI݊:V~ݻؽ{7&L(l%!*U؞߭CC\\\hڴ)UV-= ._s﷚ԎzŲe˔wm6f͚s_>|,KS{abb###>|(nƅR!!!hтHHB!1Io PQ]H9mZSYsJzVsyUDP( H]p'j*SMlw-=뿇Ν;۷3gΜ +t~nnnhhh<;woy3۴i999,Y'ggg?'DyPZo~n*rϏ&MJ~!mbjjJ\j\McͦK'J w=yURRT?zB3jԨQfBw:T`{ɓ}::ubIݩ[.3fO?eL2q1l0222g~a @ǎIIIy޲e Ջc7.]D׮]IHH`=zUVǀ 7w\ٺu3'>>xiRSS loܸ1?37odѢEӓ\{'**_ѣG Um͚5ܾ}QF(''3e֬Y%'Bնm[ȯ^QE ϡimw/,^}r=8=&+'vJw;Q>ʕ+᯿b>n8,OOOƌrHN<|^OO???Ȑ!C8q"&MԔǏŒ3]gϞÇ:t(IIItؑYfquue޽L86oތ˖-S0sLXp!L:t # xa&ׯ_$0aƏl75b@߾}9w6mwު$BTxڽKs9/R}BM޵k=NyDҳgOuF~|}CCCqppm۶޽5Eٓ pBUGyԨQGҲ B!xqk撗|XXXPfM022"-+{ǏnS01P%-Z[[[MƘ1cVwf^ʕ+|RewDFF2sLU?11,] ̄c^V^͗_~;zzz$BKyyyyVcx2B@WWM4ETyl֜YGe )w^befgf#nX|yC=<MCC<>c-ZIȱ#kj,eJTR4=¡ĺ <999l߾YfJn6lhkk:o$227vZnݺ|ɓi޼lw}ܻw{qݿGDD$%&9T+++ԮCڵ&MPv"ߟ>UrԩلBINN8q'NCzz:uɉ;_̾XXXn: =~rr2w$%&_Tߴ/LHLK}UQϙg0(M j*9!dذaoRQ겲ؿ?ׯ/_ҤI7=|)o޼/AAA7!!!CVXֲIJխ,053#CM@Bl`TE甤?Hb| ?JaCEq/$%&`hh-vv{k444wɐ!CС۷oа4:!\#449z(TV:쌃666}wrÇg8s@/2Fl¬Y^%۴]q믿(BQPXX+ljjժiGGGi޼sϙ3S 4`Æ jժܹ'+)oYŜAlxyҨQ#UG|}}ٱcΝ;XZZ憻;NNN=^Ç9pۛLڶm; (KPOOOӓÇsDG`V*hO IDAT-Ӣ]3n]s*iߘ▗ǍE@.r*F=իt)B!ٳgÄO6mpvvٙf͚yU???ڵk&?'OFKKs=z^={q# ϣ|x l/5XqaWHBٕ+W8x 7olk߾ĤlT DKK GGGzG}eI(B=//___6nȶm[IM}{8viCgNط}ψ~%7''''>.xB!DE#e7Gqtt '''7o=в֭/5kP~tSղruۊߨR/DQɓ'9~8~k׎͛ꨢ& SNq Q(4mTVΝKgd (/_aC~.>UU܉[RSwqzz  cǎqƪ'B,|}}ѣ\t uuuڴiC׮]ҥ Z*UVbjsoMՑ^ˉk')ĤưdHB7npIN8ɓ'FSSZlIVhٲ%hjj:(cp?sWg;1}89DJe,ى 3c`,1e c_& җdMJЦ}_OBu*:/uI빯=xԘ6 BQPP[2 GGGyΝt9bƸ1#ێ`GSWlؔT8oM`}Z;fׯXJ^"&&ww!88gK2׮]+++5jDƍ | hР ~﮸8"## Ç<|PGFFRΚbooOӦMqppyUמaod2mY3d֏_gp/q^c̷ܿ~ܹsEJAA^UޣGtuufy&7ofdegѾq;6MO{WUNyytJ gq)CʐO0qZN%~]4 )}}} D>rrrHHHӧDGG"##DEEINNJJJӨQ#5jT ~ð͘/G͂/SW m^V}333<{ҼysE% #p'NԩSDDDCnٳ'X[DFFNП8u֪ Y=6F6h]L>^T ~ӻwo D߾}E'wA)))ԯ__~_WWmm7--->˷WPP@jj*)))*%''@ll,$$$ӧOIOO/64hsssLMM1773334h.Mn^~e> ZuXk9Z9Tdph6og}Aɓ'ǓT*-1D"AGGQQQAGG%%%}MMMjժk.6{Sҗ}'HII)q\SRRd$''SXXHjj*OFFd<##צnݺ```!FFFׯW\۷_ЭogVm_VYvpssc:3ArM?Ή'quԡ}׏A@&.ɓ'D>$6:(,,$%ه_]m]$ abj)fff888дiSD @fSSSIKK#99Y~_*KR)呙INN|&(ly󴴴JUVV.6_NN:uJT<A[[F1eK7mɓ}2_͟, r|.&LdݺuuA^[RR7Νرcbee+gϞR  TG*c֭Lk~E i۶-ӧOG899):DAA^Kg}}}ҥ ggwS*3wCgl߾1c(:Aۛsqקk׮׏0AAx3/.}e4iј~qI]u;.w_ïkwK- 
`;vFpٹAwԋɓ's!nE[z/PD*-`;En^ͣ/`!e3+)WPPGS[ޕ~|AA񂃃9|0GڵkJ>}ӧ&&&QAW5!!!l۶%Ԙ$@EE1_`rs+3j)Ꮂ2Wf`!=z*$Arݻwp ӣO>ѧO  'ruH\OC_BP^io]Ut( B(((חp!"##`ߟ.]ޯ  %g 8|0=F&'?bt<~ (:AqY<ѣGINNёq$ +Oz("pڜ1qDGh|4%+3#8"QѸԅ$kz4g}n˚]hԴ1FecЂ9Khץ >t /$, da/F]Cϝ˗W3AK[G_ B-Ҷm[Rt  T%J7m\܌QTLti>2.=B/ Ʀ}BEҥhhiv ̭pmS F~1a>qX$}[M7EcA>yɢK?i̖1ok+; FWO ˶} 㦏}Ŭ {-]+6lPaA_nn.gΜĉ:t1b 6Tt  T=%Kߣ04`lҧ72@IY z:>{v_I ksЭOg MDطhB=z*, - rrrYOl]U;~ܨ |ȧ{,w_+7ddfHdddCA(yyyxyyq;FZZڵc޼y 4sssE( BWbմ44Qvtו]OC*Nc5-H&=5c%t $ У=2.;·uk$%5B! 9Lƕ+W2e &&& 8G`?~ӧOI  eRbFȈ ~w2~;F~1#{OT 0i`}FMZ̚+7 Oeʷ+4xLZ UMDDc턄`ggԩSEY  oDnjjJltRi**oJty#gdvX} y}X5,^$*"% /))k.\1fǎtA PH;vHNv._}6 x6>$'omGW_xKLDsssg-7W$SOL5ڿ+G~  #=C>9wl84 -ubne^a'5 }~AWڵ ///TTTׯGwbsAAUkԭhޢ9^G)"Y?NGCKy`Мs&d|"#ۗKgz kHNLa߉xm$%$`\ď1glZ$ Yt q1l[ Zu$333uNW>gމӇTuիBAJWXXș3g9r$ >2Fbb"_$  ˳XE?.b_3LƁ̤fU^hr=fAA[~HVmسgO- .**={yf°gȑ=CCCE' BWry6Sփ7`ͼr=e-Ww Zq֦X6o57cȷ)++clnD02)%vZj):9pHARJJ 888իWYd 111xzzңG PtFHPPݺwXblnTS23f.ȑ*:$Aj|∋#55TJVVD"AGG ]~clܸ?%%%θqpvvVthB5T*%33<5#*qEFUUuuRWN]6u֥VZhhh,.A^^2]4 vqlּ4E' {NN|>?? HKKC&T*%== '%233KOMMXn``>zzzcll,PN2o yDȦM={cɆڦ"т0w0c ,X{.{c &MZ9ӸyKl[:aДj!@*%=ܾ[7OJbVV ߟ!Cо}r?ni 9<v,G9&&]va"##֭&L?DEP6yyy# 'Œp###LLL011m+J ׯ/ֹdRb'Op!**cI%VVVXZZbmmM!P|8bڵܻw'''MСC V?#~t$= fţ8{Q9%cllZn:nܸA6m6m )Xnn.ܺuK~w&`gg'q뺅wSNN!!!LPP봴4L8::ҲeK6m*/(DHaa!'Oc>UcK>ޟCʼS%%qi9-;8tdڗfڵ>|}}}FԩS133SP (L&c۶m̝7|)MJ׏>FvZXXNrd0w\^Yoƺu눏g|WTR233~:HRi֬4o[[[lmm155UtB5#O޽˭[s`kk+O[j{gA囨?޽{l޼Sۖ7t̲OK⟿.q_xZm>#ƏOǎ_hlu-ѣG%D/BUѣupMV靤N;а5CKWZy8qYooo_Nnn.$5ieeeE,pE?E*ɉ:₽;_)BPqzTʹsu(>zt5ܫNZRn)K3>\8}7QV-\]{0lpF}EY+WвeK&Mg}&۷o'ND-NKUH6ͣ͟;8zw^V8q"#F?U08лwow^%XlٲݻwʨQ>}h$5L&c:a_ͤV ,+,ojvݻwINNfРAL6v):/## .p ACCΝ;Ӿ}{\\\hݺ89,T+\~+W?Czz:Ջ={ҽ{wѬPr~^;ϮyP]C4}CX7&hT/"#&2"H$6ť ...o[[ H\\+7nɓ'>#Qr(՘T*eȑy c@+:Js>ΝqƊF?x{{-fpYbb"ӓ .F~߿?ν TU ={'Or1ر#C aРA(:DAUDE '"""##ILL$11HMM%''l23JH$u5cRUUE]] ի^==ի>&&&XZZbaaUӃرc7o&''?3fڠIgҤI7fxliw,?69uꔨzK23g/兊 }O>_~d sӧٿ?'N 77WWWLB>}PRRRt #Q^,==]p@@X];v0n8Y.be 77/^p vt֍c2`qM Dff&'N`ǎ={ 2eƌS'mAD^5wέ[kH>p m|N<￯p.]ʦM>/iӦM ֭[Ǯ]H$7y桧AxD&*Z}Æ 5 ܹ 1)|s(u&&>];ߗTʖ-[9s&ƍJ믿l2rss?>SL AL"Ą Ctt4jՊ-[鉷eF͞/R wاϊJ;w͛7oa̘10k֬j0R<}OOwmmmONHH'Of޼y4mڔSN):4A!"Qj׮7ׯ_uL>SSS+BCC S#.ӰhXz 3r,+V$''GT9̜9={bggG`` ˗/GGGc9<---7oN6mH$ѦM6mؗĜ9sեcǎ y+WdhDYYY;wzB"}ӧL:>3h MF\\\Μ9ðaÐH$H$FI``.]bH$:uđ#G:‚ӧO˷d2֮]ٳڵ+ >!C{boӦ 3g,T*eDFFoKCC`ׯ_~%Mw(}Lj5A1\ ?' ):*+-9:}6FpMll,nҙ|'\x3jԨrSf̘]+WЩS'999ԩSmmmJL`bbBTT&&&o۫Rn]<==4h̛7222;v,ӧOs_O?ƍhѢ2,q$%%1n8V\ʟʴ~&L3FKKK! PswM={6xyyƐ!ChҤ ˖-#))ՃvލH_AKN]JZb&&&Z?Ϗ .(}&OLƍ+!+,,>c̘1v޿?FFF?%bEK>Plļ{ܸqdzr2WTVZVV4jH~ɓ'H$m7-Zаar)/O-Zٳ0}tE#B &u.]Ǐ͛8::̮]WtP0$&-e aǒdgf2o TaW31cSƥ !Yit4fqkwbrdhրZjVcW%EIÇYd 5M6xxxțmݺkkkƎPJzz:[nݺ,]={p HII)񘤤$n߾VCL6}v_.KKKɉ{ņ Xp!ܻw)S.*wv~񉳎;b _ܹs__`` hjj}]]]T9QVez @~شi,ӦMlڴ QVVm۶ڵXü?Ç'$$_~E1h ys8׮];IfȐ!4oޜ͛7+:AjWI:tCd_zX]@BB:s D"· W-%">~Lpz TXzz|8~2ȶM zV3Ez$H̙3ebbb^=L&O WWWBBBݻBJҥK aĉmݝQFd/_.^hh(/@իl۶m۶8˗ׯ̈́A,$ kM$ ZZ%%|YZG휞]vK_uV.;wdΝ2~ekҤId H믿H$TmI(1\T2}ʕbK-/^|Ei?f}ccc166~e۷s]F],/bgg7))).ӧOi߾}***z\TTe.qFƌٽ{7K;O>-˙h+BBlM'[EիL&#RQ:>e+6ЪQznU̦Md5֧OW>gD2JJJiӆ͛7өS'6mZmtbۋfK:)Z|ٲeŶ'&&Jfffc9ee2ﯩɡC>+.]p~%׮]C*RleNKJJ*qMbley̭[ٳ't'z%IPPCJ$ksrrb̈́3gpppՕ&Ax&W4wmOYg0oc*Ya!Y6x5GFZZYfZ(|}}0a}Gv/9)9?ydO<(q)U߾};w.sԩSM47+{^`` Kc566.]fggǎ;Jm7sLׯK=ֆ 022bŶ;99Wj&U?} 6ˋ{"JqwwE*.70`C^&-˖-#1rAlmmF轻5X?$e;U 86ݤ$96-ZXZpxbszoΕX)~2-ZUI2Z5bܹpmbIѣQWWMQI,jQc͚5 ֭[WZӾ}{N _~.]0l0n޼ bѢE;wwwOΘ1c߼sΤ˷)*/O>)u).---<==9u򋼜իWsY~U .WĉL6oʕ+?():!nEk 5+rJy=x`kGQ3dܹ2aE"B %uwMl2ĚKGï{_D"a.O\zg,]({hش9vʎ%?вcW6C߁ds]W---6o̮]^k-ŗ_~ <%4iR޺u˰a5jnnn̚5 ===Ο?Ozz:-/]O2zhRSSܹ3?999?s܈GKKŋIMMe͚5DGG0|9r$\C0a._\y._D ;yp*nUקOlllXfCP9W*:7ҿغuCUjOx*ى^S$Bfڵ>|}}}FŔ)SJ,"Ell,ݺuAXXX`ff昙ann^bɡXX0iR\zWPUû?c|}}(vI޽;;w7c^ӧOر#.]?LfϞĉi֬ym5j~~~/ _Y9y$L2E! 
PD]<ϯɞ*d5CCCPRRw~~%]]]iذ!VVVsA>e_ghAմm*qdׯ3tPbccqsso}:‹ݽ{%Km۶ݻwI늎fĉlذA'_qwwGMM={$ T /77W&vvvL8qKB4x`9U$ jB&ɛYXZ3s&XVBǽk #믿իéXn/FWWE1bQBCC9x fRt(¿Yj'OVIs_&M… K4A P&{ǎ3g|MG+UICL&7'#%by9YY\I+ԭ[ٳgHN=z4[~wNZ$ULZow^ڶm+vvv!tA*H*Z=::~3gдi7Zɓ'%Һuk (+?+oOxȐufօdazERm{n_ &LAZeXXX0j(͹r XYY):c3֯᯿=+BLL 6mb$%%ѫW/ €^B,UQFFǏg>}---Ə_|kAxkԅ+$$۷eqssE%MOOА4h籶xR)'((GÐG&zhkhAMYukoytZֳYviA!9$gArFD'&qSrPRRuCZj:tu֨Tsiӹz*MJJ 4oޜ:BՕ?nΜ9f% m۶eٽ`Сt)fs]?֬`ٲe̜9SHIII>|OOOΟ?*ݺugϞ[[[E(Ç9s3gMΝ2d}QD.T//ZGP㬬*.T?~؛5 ,62'-*1O"~c::t[n &44TpM 166.l ;|L&VZzjFUr?#Fbʉ/fy=*K#ZJ!9r7XZZʓݻ0᭥qy8s hjjҭ[7}>C  ˈD]?~̦Mضmiiiv:VRRBIIݻw駟q?9xPKEƶnd{Mhfi]iIyY k_p)ѠA jm\{W{ǞU?s./!TM\~___|||Gll,8::ҢE qttR 5Ǐuo֭[ܺu0d2}8;;SNE,P^D.TosaժUeZZIIaÆcǎbIBrr2WfZ |ک+v^=Ék8q|swlZZ׮][><)ԩڊEEE1g\ލMsG] n̍Kݻ7?EƢ%LnѣGCfhҤ 6664i҄&M`ii)[ /UPP@xx8Lpp0w!99DP "| 5Hԅ+77###RRReeezŁPUUe|7>0+û@Z!QL0 VroE3G&ɯ-*eoժUKbϏ///,{X|U򋂂n^ݿGmYSѡ  ==;wpmIVTT4jԈ&M`kkKƍsssQYJDFFNxx8!!!\]]jeY,K-+54-wnMj88sG7zu3OQ&mK]jԮ3U(bqb(Bu)B}|ۧ{yr x#66abb}kXYYaoo_"y KKK,--.yG999dggEFF333_$''7ZGGGjԨ%B2!Ξ=K=171(H<2?W -^̨QڄN8Qb<i֬YmdgѣG?zsΑ5w6XUCcc 0210?|TEE(HM!#5DRIAVc`h/ҺukeA'DKM2SRRHMM-Zffv. 666XYYaaa `llXXXbXZZZ5zLSh4\c%;;DCDFFj,ɡRI^^撟M‹)VVV6hiJ:/%3]Nέ/)NEB!FuQ$&&2{lJ_tyaS Zmm{埣/N#űcذa W!BDuQ,X3##l{?){_Њuk0/{5pqtLUB!9D]T*,YbbdB^0&{Q.s B!De%T>LfV&=u(GM<]kc]"B!D#KaJe׮]wu}z=~_o-wYk;'.ЈΜP9bmT={CMIfʀt4r,,, ڴW.ckiɼcF|5>ɞ3Xt>]TޯҵIƘWX{ODMLh2W;`e^B!UJ%2"oR?-zB^ }ѵISѨN]9Gc;Щ5ҳYsp?SYc+C:ueoq;%w3W<0ٿFFM=t?%;OYwZ\J*B!ʤG]T*qoıs);37V7;v᝞}ĝl8WOա:66 Sѭ^nT$ h߰Nոn/ɗہ4q_|磯>Cکˑoبމ;mPT$&&\e !BQI.*lk>gG+~⋵> ;u)/혹)^ks Jwg{wo—p=!.^^l071 33Su!B!ʐ }Ce[@3m} q7ϥTZR_O&&) e~}6j)wg''2/[!BLuQԨYR[6P;#ǡRύϥҤd˧89WXK~5.y< ii`mm]e !BQI.*f͚q҃]dwz-LLq '/lm+,@hJ_XRocaxf`w/((q}FMpw`-L'?7n䋵kxKr:~9͛yB!BTuJgϞffz=+,`؂,۽)HS,|@RF:͍ټؔds9^n$]eg}EJV&6# DЦuNI}{/,׉Sߘ}g0gfe4bv.}YQk4\H=˴\!B!(4ww Q x{yѠ_PZS#bٯTOIztB!:QGS#H,}utl4 ?΀%IB!9D]T::۴VסWbcd4]"B!D$t e ;džC`z\q.ꪎ[II|a-'OǧB!F樋Jk,;VM[m]S18 29s&m۶}}}]&B!DeNuQiЫW/Μ8>C \Nۛ .%͛7uִjՊVZaccpB!$Q[NN;t &:k: G{?.(aÆDEE‘#G !<<FC:uhӦ iooo B$%%Ozz:(J044CCClllpttQ>7!FR,<LMK>Bm->V(ҒD]T~.^cM޺HHg$de}7os3228y6y?rJ+++7oM[n ERʙ3gt.rd$ /~$==LJM˓w,-- KK666cggB!D)$QUC^^ÆeӦM5uy=-|L^vعkuy닊>q٢+WuV8@pa23056j8^:լpf%`dh!E*d䒚M\J:I\I\Gv3SSZjIի7*xbbb#66dIHH %%E}rr2*%%`ؠP(FOO+++:oq}ʢㅅdggV6xLOO ޞիk׬Yggg\\\pvv.B9IEաh;w.ӧM#Ó/ĹZ5]U͟Xo}.aoԩSL@@6yo֬eRgeu5~W6m@إpY[ҬA6pW8yc$NGFs2"'R^[o'JELL QQQDEEq-bcc#&&x+ qppΎիk'888`oomѓKzz!#11DFJJ %?jzGGG{͚5quuN:* !ijD]T= ~ ]]^f˽0 =zs/773gh! 
6oKEVٺu+K,f߾TSSo:7iw^܎X e0chѼ9yAICx\"""vڤ<::7nPXXj&w:99ꊓLx ILL֭[kpMCCCܴ{cݺuiР:#!$QURdܹ+m_ܨI]ͥKӳ\5\||M4QFҠAjժUahy&q9Μ9CDD* mޤIZn^BI.ݮ]… Yj9ٴo؈>-[㇉aQk4FG1͙;w.ƍux,>>'OHprriӦ=00]H;wd!(Vo7uHO|b 1bCOx:J׽ cP)G˗/si݋ViPB*Eu!Jƍq2CB064oC:7 /,-u#pe=ߡHHMţ^=1Ν;3e:ĉ5kVZXXȅ šCHLLڴiC@@ڵ+ޙݻwfk잺Zɓ?>0mX_M*lE*߮8` ?/_.s˱(v޽{9z(XXXвeKy饗hѢ, &^(Rɉ'&$$GEjhݺ5;wGxxx:T!( (lٲM6q Fzi酿{]:Lrt:##8q%QW/(aC^׏}ҰaC'ίZľ}v'NPXXXw& -Z<*ӦM?ޞŋӿ'///7f|UewGC/c۶T˅|ٹs';w$22kkk:vH۶miӦ 7@ס R8F+:vcǎtWWׇ9z(SNeڴiUblΝ;Mޏ=Jjj*k֭[?]vP(PtڕKRvNJ%77_3|P<˧D-֞됪}H!DE""==3gp;:kW)}-Lͨ@5 [bca%FFX`ld!*eʂ| H"-;RBJF: ך.4j܈Ff(Zc4jԈ+VРA2z*{ↇhSv>佋ԩT*,--Q*r 㣏>bԩVPXXH߾}9v$SGR(JLdK^aeU Z~jzj6mDvv6۷_~CJLL ;wdݻ333oйs* -$QT*nݺETT׮]͛$&&DRR")呖~'V*}0;{{wpWWW֭Kݺuqww/!}.]bС1c &MTILLرc=zGr)pppUVn{{{FYzzzxzz|rZlY9F5Yh>|Ceʛ׸ {%4?G,Z~۷o8p 5juxB}:͚5cŊԫWOa EEEDFFj{CBBR100@R1rHJ <F&/6ʍ뱼b>6O>DT:|׬Z F[oU%GQիZK믿ᅬCBI.(݅ :t(׮]#((QF^ȥӧ;v@R=<lll?>C ͛һ?S-~WsQ7op*0LŽ;cС2?Wo_s%vܹsuhB !ٳgӹsg~'\\\tVBllcP(h4tc##Ν׺ʋlT*Ou}FFz"<7Ag}V&ͽj|Կ_ݝ;wPx!ċ&C߅O&003gзo_^{5 @jj҉?sαe.\ĉ8p [ť-6oތF{F:ɘbgeBj]WG֭[B+._#}eƍ=BKTTiwIzi7nJ)]t!CwaddE~7P*'%%hoUϾ}w~(‚/Jŷ~[z'''m9Ō7o&&&,^ؘ:u -?>3l>{V3f̠wcǎOϞ=S!ʂ$B'feeҥKٹs'GLJm۶: !88janbPBAs/wu(/T~~>۷o1d}ǜ>}ݻwzA.]y;# j|M]&Mw}w_?ȑ#K$[ 1<<ƍx,--j $33>N༷^֭}CddVBg%⩽\x޽{ӻwo BVV*׎=BeVޞh9z:ރ'pnY?a?m;'2.ߊCVs"_J _ϖdF|2-Ũ 㗮=k\6NzdZ٨T*4 ڤ}߾}%W^o!((lffw/011Ȉ >CLO׮]yIKK{AAA3qDBfJs2b6mJΝ EѰeF iii :;;;|}}9uꔶܭ[rY^~[n\p{jT*~W {V$$$籒kL^J-xZ@pp0}3F^v ݻwرcEQH.(3իWgӦM_cDz~~Ry Q54筗 Z[ )8WFݢ#[>xeWk R_/N:5U[Pf`frgQ- zBJلh{/+#G]xhK?y$>>>\xu;x{㵋=I&1h ϸqCMOO'''{RZj1b}6n- U^^^tooo}]2ec6Y>wʛjժ=tjB5.(sŋxyyѱcGLB~~9J.+o:S[g2Tjeg}'r}K}!#22FSN>XK5kdt҅-[V$jK/Mw ֮]ܹsKOII!**PT#ׯ]?ѣG?4-ͫYYY7|p Ÿٳ-[|9ܼ|jy|pqƒx:9rVծw1+9򡸇+{5 :L&>m[[OX_Ih  ֭[c…%z v'x>C|||߂-u֌?@; GV駟Ҿ}{x Ξ= @>}pww>f|'eڽNw(=Ԯ];cEm ֶ$%b[^^>&&&I?ܜcjj^_|nnnIHH`ܸq7Cnş{/wX_MihSbll 4B4gΜ9SA!*7fbKU8[yVd(޳&F kG_bS8zcaWqfʹabdh[f;eT+j;;dki]Ѱ+ԭȤ7z>pea<|3`2-cٲeꫯ2g-[kgﱑ|GѼyCٳyq9233‚ڵkw$z-8wƆlؼy3ԫWkkk֬Y@ƍ8p QQQ_߿-ZDjXh6I500ߟ~ 6wkӦ XSnޗ!Jr`DDDڵkٶmK,aĉ>| &PTTDPPΝ#//pttWWW֮]ˆ ƍ̚5p233ٵk7nd4hЀիWӸqcm<wرcL8[TT9'N ##\\\X`!!!dggSTTĮ]O<ȦMصk?Ci[|9ӧOg:!DW?{ҭ[7v}H-P 9y9/`ܸq O>_wޘ߃̙3S`>(=zP~}]R"##2dǏ߿Cy+ď?PXd Ǐ?f֬YGQD]1|ONӦMYbPԮUM߁uNq\qqV,7|&MĐ!CXhlx饗8|prrr\;&y:*ER~֬Y͛!00ݻӣG Br%vɮ]ؘ}oХK9BT?dٲeߟKVm&N[/2^UUzoFztHT}vmݻIJJUw ])#kׯcggGnݴ+:L!x !*;v0j(X|9;vuHe?`СPͪaWP_ ΝˤItZi߷oDI`` ;w}888:T!㄄pۛW^yΝ;Ӯ]; uB<-IԅKRRcǎe͌5W^cǎ1h@srZx:gy6AJ_VO>I|04 ^^^H@@:\Qj._ӧIyXX޴mۖ@:v숓B2#bZ~=ƍښ_~RmΨ#ٸiiRa=|mE:,ӵIRHH'OsssiҤ[z2S)**"<<3ghΝ;Gvv6K/D6md!De&⊏gرW_all̖-[x?!=-{n+(,b,vB9s2bdʢ/H.\R___A.x ΢\tpùt/^+W~ !#H.V\qss_% @!\͛ǂhrtlU.g+r4?wppVVVXZZjW&YYYdff}Fff&$''MʓGGG\]]qvv6ԨQU:[!t$QBTN*u IDAT f̘A&MXbuXMLL >t+WV116½#6V8X[P 3L040؈B /($[OrF $ed};\%\iۮ=m۶m۶F!!!Ad&''T",>MRVcaa!affVVV6 ZZZ7ZRy߹jDj ɡlʢ@=ޞիk55wrrB}B!$B-,,!CΌ34iRX}<''pBCC|ȘnINΝ?sJL111+K+pus |}}Z׷%DHOO/\u>>;v3f0}tڶmիWx5kزe P!BT$ FuBQY\pCży5j Bzll,^^^( BCCqssaB!Z'=BQ6lȱc7nݻw'&&FoM^^J^{"G,B!QBaС$&&2w\ƎK]}}}>C ])B!ʗu !sɓYx1x]P{nvB!D9#BFFF?} !(=!eo%44Ej5 0CLuݼySNAXXÉ!9)˲f͚x{y具75b,)JHKK#==,R$;; J%R>׮<+NMLL055&֘bff5昚bee%bkk fff-!G]!H6lxߐ{3c MDGGGsN  vl,zzz8tٽ5a\jêVadb"rs$#5ҒIIvtEEӺukھݻw!DRRIxq"~yyygii} jV=i^[[[,,,044|}y|LHdD$Y-%X*(JiRnۅRJ7[ZMU$P"bK/ЙkX9 q̜33~ΝBh4Q\\w.mYYY5NCAAEEE766օvmڢVGVV122u !#.JEEݺuѣU ػw/={N>}c--[7.^xĴ.C+gp2 Q?q|1СC֭[RRRHJJ"11k׮322HMM%++K8###j5[G ,,,-??jEnPIzz:%%%z粲Z.7mڔ͛䄓vvvzKX !$ !ăܹsٱcNvB9-lll8{,zj*,]J99٫7=h*+tQ{9/\@[WW^3QFѴiZIԮt\B||< $$$H||<$%%8]`iӦҴiSu=vvvXYY)xe^ꍄHMM50JFF8###h֬8::ꂼ3...Q$A]!jC^^ᄆr PTzk7lؐ^zsNT*,\y~KQQ=Ҿչ^9o@QaC amV}*++#>>*ۥKt8::Rf͚aiiՈׯL\\IIIU~vFcccpqqdzBu!P•+WسgwfdeeaddDYY3g 037@̢ҒbHOJdD` gA-\~gKll,111?*++6(|W$&&VHWT899榛 jW!MB('OΎ;F /`dlr1Ds)fڴiL4I}SPvv6QQQ7Vѭe(Krssu˜?^ؤacc\ၧ' W/w$A]!?QFq!<{ƭ sCUiI1+e?Ҫ 
lark-0.8.1/examples/fruitflies.py000066400000000000000000000016441361215331400170210ustar00rootroot00000000000000# # This example shows how to get explicit ambiguity from Lark's Earley parser. # import sys from lark import Lark, tree grammar = """ sentence: noun verb noun -> simple | noun verb "like" noun -> comparative noun: adj? NOUN verb: VERB adj: ADJ NOUN: "flies" | "bananas" | "fruit" VERB: "like" | "flies" ADJ: "fruit" %import common.WS %ignore WS """ parser = Lark(grammar, start='sentence', ambiguity='explicit') sentence = 'fruit flies like bananas' def make_png(filename): tree.pydot__tree_to_png( parser.parse(sentence), filename) if __name__ == '__main__': print(parser.parse(sentence).pretty()) # make_png(sys.argv[1]) # Output: # # _ambig # comparative # noun fruit # verb flies # noun bananas # simple # noun # fruit # flies # verb like # noun bananas # # (or view a nicer version at "./fruitflies.png") lark-0.8.1/examples/indented_tree.py000066400000000000000000000017231361215331400174540ustar00rootroot00000000000000# # This example demonstrates usage of the Indenter class. # # Since indentation is context-sensitive, a postlex stage is introduced to # manufacture INDENT/DEDENT tokens.
# # It is crucial for the indenter that the NL_type matches # the spaces (and tabs) after the newline. # from lark import Lark from lark.indenter import Indenter tree_grammar = r""" ?start: _NL* tree tree: NAME _NL [_INDENT tree+ _DEDENT] %import common.CNAME -> NAME %import common.WS_INLINE %declare _INDENT _DEDENT %ignore WS_INLINE _NL: /(\r?\n[\t ]*)+/ """ class TreeIndenter(Indenter): NL_type = '_NL' OPEN_PAREN_types = [] CLOSE_PAREN_types = [] INDENT_type = '_INDENT' DEDENT_type = '_DEDENT' tab_len = 8 parser = Lark(tree_grammar, parser='lalr', postlex=TreeIndenter()) test_tree = """ a b c d e f g """ def test(): print(parser.parse(test_tree).pretty()) if __name__ == '__main__': test() lark-0.8.1/examples/json_parser.py000066400000000000000000000047671361215331400172030ustar00rootroot00000000000000# # This example shows how to write a basic JSON parser # # The code is short and clear, and outperforms every other parser (that's written in Python). # For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md # import sys from lark import Lark, Transformer, v_args json_grammar = r""" ?start: value ?value: object | array | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null array : "[" [value ("," value)*] "]" object : "{" [pair ("," pair)*] "}" pair : string ":" value string : ESCAPED_STRING %import common.ESCAPED_STRING %import common.SIGNED_NUMBER %import common.WS %ignore WS """ class TreeToJson(Transformer): @v_args(inline=True) def string(self, s): return s[1:-1].replace('\\"', '"') array = list pair = tuple object = dict number = v_args(inline=True)(float) null = lambda self, _: None true = lambda self, _: True false = lambda self, _: False ### Create the JSON parser with Lark, using the Earley algorithm # json_parser = Lark(json_grammar, parser='earley', lexer='standard') # def parse(x): # return TreeToJson().transform(json_parser.parse(x)) ### Create the JSON parser with Lark, using the LALR algorithm json_parser = Lark(json_grammar, parser='lalr', # Using the standard lexer isn't required, and isn't usually recommended. # But, it's good enough for JSON, and it's slightly faster. lexer='standard', # Disabling propagate_positions and placeholders slightly improves speed propagate_positions=False, maybe_placeholders=False, # Using an internal transformer is faster and more memory efficient transformer=TreeToJson()) parse = json_parser.parse def test(): test_json = ''' { "empty_object" : {}, "empty_array" : [], "booleans" : { "YES" : true, "NO" : false }, "numbers" : [ 0, 1, -2, 3.3, 4.4e5, 6.6e-7 ], "strings" : [ "This", [ "And" , "That", "And a \\"b" ] ], "nothing" : null } ''' j = parse(test_json) print(j) import json assert j == json.loads(test_json) if __name__ == '__main__': # test() with open(sys.argv[1]) as f: print(parse(f.read())) lark-0.8.1/examples/lark.lark000066400000000000000000000022011361215331400160650ustar00rootroot00000000000000start: (_item | _NL)* _item: rule | token | statement rule: RULE priority? ":" expansions _NL token: TOKEN priority? ":" expansions _NL priority: "." NUMBER statement: "%ignore" expansions _NL -> ignore | "%import" import_path ["->" name] _NL -> import | "%import" import_path name_list _NL -> multi_import | "%declare" name+ -> declare !import_path: "."? name ("." name)* name_list: "(" name ("," name)* ")" ?expansions: alias (_VBAR alias)* ?alias: expansion ["->" RULE] ?expansion: expr* ?expr: atom [OP | "~" NUMBER [".." 
NUMBER]] ?atom: "(" expansions ")" | "[" expansions "]" -> maybe | STRING ".." STRING -> literal_range | name | (REGEXP | STRING) -> literal name: RULE | TOKEN _VBAR: _NL? "|" OP: /[+*]|[?](?![a-z])/ RULE: /!?[_?]?[a-z][_a-z0-9]*/ TOKEN: /_?[A-Z][_A-Z0-9]*/ STRING: _STRING "i"? REGEXP: /\/(?!\/)(\\\/|\\\\|[^\/\n])*?\/[imslux]*/ _NL: /(\r?\n)+\s*/ %import common.ESCAPED_STRING -> _STRING %import common.INT -> NUMBER %import common.WS_INLINE COMMENT: /\s*/ "//" /[^\n]/* %ignore WS_INLINE %ignore COMMENT lark-0.8.1/examples/lark_grammar.py000066400000000000000000000010521361215331400172750ustar00rootroot00000000000000from lark import Lark parser = Lark(open('examples/lark.lark'), parser="lalr") grammar_files = [ 'examples/python2.lark', 'examples/python3.lark', 'examples/lark.lark', 'examples/relative-imports/multiples.lark', 'examples/relative-imports/multiple2.lark', 'examples/relative-imports/multiple3.lark', 'lark/grammars/common.lark', ] def test(): for grammar_file in grammar_files: tree = parser.parse(open(grammar_file).read()) print("All grammars parsed successfully") if __name__ == '__main__': test() lark-0.8.1/examples/python2.lark000066400000000000000000000144511361215331400165510ustar00rootroot00000000000000// Python 2 grammar for Lark // NOTE: Work in progress!!! (XXX TODO) // This grammar should parse all python 2.x code successfully, // but the resulting parse-tree is still not well-organized. // Adapted from: https://docs.python.org/2/reference/grammar.html // Adapted by: Erez Shinan // Start symbols for the grammar: // single_input is a single interactive statement; // file_input is a module or sequence of commands read from an input file; // eval_input is the input for the eval() and input() functions. // NB: compound_stmt in single_input is followed by extra _NEWLINE! single_input: _NEWLINE | simple_stmt | compound_stmt _NEWLINE ?file_input: (_NEWLINE | stmt)* eval_input: testlist _NEWLINE? 
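// A minimal loading sketch for the start symbols described above (assuming the
// standard Lark API; the indenter name is hypothetical, not taken from this file):
//
//     from lark import Lark
//     python2_parser = Lark.open('python2.lark', parser='earley',
//                                start='file_input',          // pick one of the start symbols
//                                postlex=SomeIndenter())      // a postlex that emits _INDENT/_DEDENT
//
// There is no rule named "start" in this grammar, so a start symbol has to be
// selected explicitly with the `start` argument.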
decorator: "@" dotted_name [ "(" [arglist] ")" ] _NEWLINE decorators: decorator+ decorated: decorators (classdef | funcdef) funcdef: "def" NAME "(" parameters ")" ":" suite parameters: [paramlist] paramlist: param ("," param)* ["," [star_params ["," kw_params] | kw_params]] | star_params ["," kw_params] | kw_params star_params: "*" NAME kw_params: "**" NAME param: fpdef ["=" test] fpdef: NAME | "(" fplist ")" fplist: fpdef ("," fpdef)* [","] ?stmt: simple_stmt | compound_stmt ?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE ?small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt) expr_stmt: testlist augassign (yield_expr|testlist) -> augassign2 | testlist ("=" (yield_expr|testlist))+ -> assign | testlist augassign: ("+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=") // For normal assignments, additional restrictions enforced by the interpreter print_stmt: "print" ( [ test ("," test)* [","] ] | ">>" test [ ("," test)+ [","] ] ) del_stmt: "del" exprlist pass_stmt: "pass" ?flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt break_stmt: "break" continue_stmt: "continue" return_stmt: "return" [testlist] yield_stmt: yield_expr raise_stmt: "raise" [test ["," test ["," test]]] import_stmt: import_name | import_from import_name: "import" dotted_as_names import_from: "from" ("."* dotted_name | "."+) "import" ("*" | "(" import_as_names ")" | import_as_names) ?import_as_name: NAME ["as" NAME] ?dotted_as_name: dotted_name ["as" NAME] import_as_names: import_as_name ("," import_as_name)* [","] dotted_as_names: dotted_as_name ("," dotted_as_name)* dotted_name: NAME ("." NAME)* global_stmt: "global" NAME ("," NAME)* exec_stmt: "exec" expr ["in" test ["," test]] assert_stmt: "assert" test ["," test] ?compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated if_stmt: "if" test ":" suite ("elif" test ":" suite)* ["else" ":" suite] while_stmt: "while" test ":" suite ["else" ":" suite] for_stmt: "for" exprlist "in" testlist ":" suite ["else" ":" suite] try_stmt: ("try" ":" suite ((except_clause ":" suite)+ ["else" ":" suite] ["finally" ":" suite] | "finally" ":" suite)) with_stmt: "with" with_item ("," with_item)* ":" suite with_item: test ["as" expr] // NB compile.c makes sure that the default except clause is last except_clause: "except" [test [("as" | ",") test]] suite: simple_stmt | _NEWLINE _INDENT _NEWLINE? stmt+ _DEDENT _NEWLINE? // Backward compatibility cruft to support: // [ x for x in lambda: True, lambda: False if x() ] // even while also allowing: // lambda x: 5 if x else 2 // (But not a mix of the two) testlist_safe: old_test [("," old_test)+ [","]] old_test: or_test | old_lambdef old_lambdef: "lambda" [paramlist] ":" old_test ?test: or_test ["if" or_test "else" test] | lambdef ?or_test: and_test ("or" and_test)* ?and_test: not_test ("and" not_test)* ?not_test: "not" not_test | comparison ?comparison: expr (comp_op expr)* comp_op: "<"|">"|"=="|">="|"<="|"<>"|"!="|"in"|"not" "in"|"is"|"is" "not" ?expr: xor_expr ("|" xor_expr)* ?xor_expr: and_expr ("^" and_expr)* ?and_expr: shift_expr ("&" shift_expr)* ?shift_expr: arith_expr (("<<"|">>") arith_expr)* ?arith_expr: term (("+"|"-") term)* ?term: factor (("*"|"/"|"%"|"//") factor)* ?factor: ("+"|"-"|"~") factor | power ?power: molecule ["**" factor] // _trailer: "(" [arglist] ")" | "[" subscriptlist "]" | "." 
NAME ?molecule: molecule "(" [arglist] ")" -> func_call | molecule "[" [subscriptlist] "]" -> getitem | molecule "." NAME -> getattr | atom ?atom: "(" [yield_expr|testlist_comp] ")" -> tuple | "[" [listmaker] "]" | "{" [dictorsetmaker] "}" | "`" testlist1 "`" | "(" test ")" | NAME | number | string+ listmaker: test ( list_for | ("," test)* [","] ) ?testlist_comp: test ( comp_for | ("," test)+ [","] | ",") lambdef: "lambda" [paramlist] ":" test ?subscriptlist: subscript ("," subscript)* [","] subscript: "." "." "." | test | [test] ":" [test] [sliceop] sliceop: ":" [test] ?exprlist: expr ("," expr)* [","] ?testlist: test ("," test)* [","] dictorsetmaker: ( (test ":" test (comp_for | ("," test ":" test)* [","])) | (test (comp_for | ("," test)* [","])) ) classdef: "class" NAME ["(" [testlist] ")"] ":" suite arglist: (argument ",")* (argument [","] | star_args ["," kw_args] | kw_args) star_args: "*" test kw_args: "**" test // The reason that keywords are test nodes instead of NAME is that using NAME // results in an ambiguity. ast.c makes sure it's a NAME. argument: test [comp_for] | test "=" test list_iter: list_for | list_if list_for: "for" exprlist "in" testlist_safe [list_iter] list_if: "if" old_test [list_iter] comp_iter: comp_for | comp_if comp_for: "for" exprlist "in" or_test [comp_iter] comp_if: "if" old_test [comp_iter] testlist1: test ("," test)* yield_expr: "yield" [testlist] number: DEC_NUMBER | HEX_NUMBER | OCT_NUMBER | FLOAT | IMAG_NUMBER string: STRING | LONG_STRING // Tokens COMMENT: /#[^\n]*/ _NEWLINE: ( /\r?\n[\t ]*/ | COMMENT )+ STRING : /[ubf]?r?("(?!"").*?(? FLOAT %import common.INT -> _INT %import common.CNAME -> NAME IMAG_NUMBER: (_INT | FLOAT) ("j"|"J") %ignore /[\t \f]+/ // WS %ignore /\\[\t \f]*\r?\n/ // LINE_CONT %ignore COMMENT %declare _INDENT _DEDENT lark-0.8.1/examples/python3.lark000066400000000000000000000156341361215331400165560ustar00rootroot00000000000000// Python 3 grammar for Lark // NOTE: Work in progress!!! (XXX TODO) // This grammar should parse all python 3.x code successfully, // but the resulting parse-tree is still not well-organized. // Adapted from: https://docs.python.org/3/reference/grammar.html // Adapted by: Erez Shinan // Start symbols for the grammar: // single_input is a single interactive statement; // file_input is a module or sequence of commands read from an input file; // eval_input is the input for the eval() functions. // NB: compound_stmt in single_input is followed by extra NEWLINE! single_input: _NEWLINE | simple_stmt | compound_stmt _NEWLINE file_input: (_NEWLINE | stmt)* eval_input: testlist _NEWLINE* decorator: "@" dotted_name [ "(" [arguments] ")" ] _NEWLINE decorators: decorator+ decorated: decorators (classdef | funcdef | async_funcdef) async_funcdef: "async" funcdef funcdef: "def" NAME "(" parameters? ")" ["->" test] ":" suite parameters: paramvalue ("," paramvalue)* ["," [ starparams | kwparams]] | starparams | kwparams starparams: "*" typedparam? 
("," paramvalue)* ["," kwparams] kwparams: "**" typedparam ?paramvalue: typedparam ["=" test] ?typedparam: NAME [":" test] varargslist: (vfpdef ["=" test] ("," vfpdef ["=" test])* ["," [ "*" [vfpdef] ("," vfpdef ["=" test])* ["," ["**" vfpdef [","]]] | "**" vfpdef [","]]] | "*" [vfpdef] ("," vfpdef ["=" test])* ["," ["**" vfpdef [","]]] | "**" vfpdef [","]) vfpdef: NAME ?stmt: simple_stmt | compound_stmt ?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE ?small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt) ?expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) | ("=" (yield_expr|testlist_star_expr))*) annassign: ":" test ["=" test] ?testlist_star_expr: (test|star_expr) ("," (test|star_expr))* [","] !augassign: ("+=" | "-=" | "*=" | "@=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=") // For normal and annotated assignments, additional restrictions enforced by the interpreter del_stmt: "del" exprlist pass_stmt: "pass" ?flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt break_stmt: "break" continue_stmt: "continue" return_stmt: "return" [testlist] yield_stmt: yield_expr raise_stmt: "raise" [test ["from" test]] import_stmt: import_name | import_from import_name: "import" dotted_as_names // note below: the ("." | "...") is necessary because "..." is tokenized as ELLIPSIS import_from: "from" (dots? dotted_name | dots) "import" ("*" | "(" import_as_names ")" | import_as_names) !dots: "."+ import_as_name: NAME ["as" NAME] dotted_as_name: dotted_name ["as" NAME] import_as_names: import_as_name ("," import_as_name)* [","] dotted_as_names: dotted_as_name ("," dotted_as_name)* dotted_name: NAME ("." NAME)* global_stmt: "global" NAME ("," NAME)* nonlocal_stmt: "nonlocal" NAME ("," NAME)* assert_stmt: "assert" test ["," test] compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt async_stmt: "async" (funcdef | with_stmt | for_stmt) if_stmt: "if" test ":" suite ("elif" test ":" suite)* ["else" ":" suite] while_stmt: "while" test ":" suite ["else" ":" suite] for_stmt: "for" exprlist "in" testlist ":" suite ["else" ":" suite] try_stmt: ("try" ":" suite ((except_clause ":" suite)+ ["else" ":" suite] ["finally" ":" suite] | "finally" ":" suite)) with_stmt: "with" with_item ("," with_item)* ":" suite with_item: test ["as" expr] // NB compile.c makes sure that the default except clause is last except_clause: "except" [test ["as" NAME]] suite: simple_stmt | _NEWLINE _INDENT stmt+ _DEDENT ?test: or_test ("if" or_test "else" test)? | lambdef ?test_nocond: or_test | lambdef_nocond lambdef: "lambda" [varargslist] ":" test lambdef_nocond: "lambda" [varargslist] ":" test_nocond ?or_test: and_test ("or" and_test)* ?and_test: not_test ("and" not_test)* ?not_test: "not" not_test -> not | comparison ?comparison: expr (_comp_op expr)* star_expr: "*" expr ?expr: xor_expr ("|" xor_expr)* ?xor_expr: and_expr ("^" and_expr)* ?and_expr: shift_expr ("&" shift_expr)* ?shift_expr: arith_expr (_shift_op arith_expr)* ?arith_expr: term (_add_op term)* ?term: factor (_mul_op factor)* ?factor: _factor_op factor | power !_factor_op: "+"|"-"|"~" !_add_op: "+"|"-" !_shift_op: "<<"|">>" !_mul_op: "*"|"@"|"/"|"%"|"//" // <> isn't actually a valid comparison operator in Python. 
It's here for the // sake of a __future__ import described in PEP 401 (which really works :-) !_comp_op: "<"|">"|"=="|">="|"<="|"<>"|"!="|"in"|"not" "in"|"is"|"is" "not" ?power: await_expr ("**" factor)? ?await_expr: AWAIT? atom_expr AWAIT: "await" ?atom_expr: atom_expr "(" [arguments] ")" -> funccall | atom_expr "[" subscriptlist "]" -> getitem | atom_expr "." NAME -> getattr | atom ?atom: "(" [yield_expr|testlist_comp] ")" -> tuple | "[" [testlist_comp] "]" -> list | "{" [dictorsetmaker] "}" -> dict | NAME -> var | number | string+ | "(" test ")" | "..." -> ellipsis | "None" -> const_none | "True" -> const_true | "False" -> const_false ?testlist_comp: (test|star_expr) [comp_for | ("," (test|star_expr))+ [","] | ","] subscriptlist: subscript ("," subscript)* [","] subscript: test | [test] ":" [test] [sliceop] sliceop: ":" [test] exprlist: (expr|star_expr) ("," (expr|star_expr))* [","] testlist: test ("," test)* [","] dictorsetmaker: ( ((test ":" test | "**" expr) (comp_for | ("," (test ":" test | "**" expr))* [","])) | ((test | star_expr) (comp_for | ("," (test | star_expr))* [","])) ) classdef: "class" NAME ["(" [arguments] ")"] ":" suite arguments: argvalue ("," argvalue)* ("," [ starargs | kwargs])? | starargs | kwargs | test comp_for starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs] kwargs: "**" test ?argvalue: test ("=" test)? comp_iter: comp_for | comp_if | async_for async_for: "async" "for" exprlist "in" or_test [comp_iter] comp_for: "for" exprlist "in" or_test [comp_iter] comp_if: "if" test_nocond [comp_iter] // not used in grammar, but may appear in "node" passed from Parser to Compiler encoding_decl: NAME yield_expr: "yield" [yield_arg] yield_arg: "from" test | testlist number: DEC_NUMBER | HEX_NUMBER | BIN_NUMBER | OCT_NUMBER | FLOAT_NUMBER | IMAG_NUMBER string: STRING | LONG_STRING // Tokens NAME: /[a-zA-Z_]\w*/ COMMENT: /#[^\n]*/ _NEWLINE: ( /\r?\n[\t ]*/ | COMMENT )+ STRING : /[ubf]?r?("(?!"").*?(? 
STRING %import common.SIGNED_NUMBER -> NUMBER %import common.WS %ignore WS ''' self.lark = Lark(grammar, parser=None, lexer='standard') # All tokens: print([t.name for t in self.lark.parser.lexer.tokens]) def defaultPaper(self, style): return QColor(39, 40, 34) def language(self): return "Json" def description(self, style): return {v: k for k, v in self.token_styles.items()}.get(style, "") def styleText(self, start, end): self.startStyling(start) text = self.parent().text()[start:end] last_pos = 0 try: for token in self.lark.lex(text): ws_len = token.pos_in_stream - last_pos if ws_len: self.setStyling(ws_len, 0) # whitespace token_len = len(bytearray(token, "utf-8")) self.setStyling( token_len, self.token_styles.get(token.type, 0)) last_pos = token.pos_in_stream + token_len except Exception as e: print(e) class EditorAll(QsciScintilla): def __init__(self, parent=None): super().__init__(parent) # Set font defaults font = QFont() font.setFamily('Consolas') font.setFixedPitch(True) font.setPointSize(8) font.setBold(True) self.setFont(font) # Set margin defaults fontmetrics = QFontMetrics(font) self.setMarginsFont(font) self.setMarginWidth(0, fontmetrics.width("000") + 6) self.setMarginLineNumbers(0, True) self.setMarginsForegroundColor(QColor(128, 128, 128)) self.setMarginsBackgroundColor(QColor(39, 40, 34)) self.setMarginType(1, self.SymbolMargin) self.setMarginWidth(1, 12) # Set indentation defaults self.setIndentationsUseTabs(False) self.setIndentationWidth(4) self.setBackspaceUnindents(True) self.setIndentationGuides(True) # self.setFolding(QsciScintilla.CircledFoldStyle) # Set caret defaults self.setCaretForegroundColor(QColor(247, 247, 241)) self.setCaretWidth(2) # Set selection color defaults self.setSelectionBackgroundColor(QColor(61, 61, 52)) self.resetSelectionForegroundColor() # Set multiselection defaults self.SendScintilla(QsciScintilla.SCI_SETMULTIPLESELECTION, True) self.SendScintilla(QsciScintilla.SCI_SETMULTIPASTE, 1) self.SendScintilla( QsciScintilla.SCI_SETADDITIONALSELECTIONTYPING, True) lexer = LexerJson(self) self.setLexer(lexer) EXAMPLE_TEXT = textwrap.dedent("""\ { "_id": "5b05ffcbcf8e597939b3f5ca", "about": "Excepteur consequat commodo esse voluptate aute aliquip ad sint deserunt commodo eiusmod irure. Sint aliquip sit magna duis eu est culpa aliqua excepteur ut tempor nulla. Aliqua ex pariatur id labore sit. Quis sit ex aliqua veniam exercitation laboris anim adipisicing. 
Lorem nisi reprehenderit ullamco labore qui sit ut aliqua tempor consequat pariatur proident.", "address": "665 Malbone Street, Thornport, Louisiana, 243", "age": 23, "balance": "$3,216.91", "company": "BULLJUICE", "email": "elisekelley@bulljuice.com", "eyeColor": "brown", "gender": "female", "guid": "d3a6d865-0f64-4042-8a78-4f53de9b0707", "index": 0, "isActive": false, "isActive2": true, "latitude": -18.660714, "longitude": -85.378048, "name": "Elise Kelley", "phone": "+1 (808) 543-3966", "picture": "http://placehold.it/32x32", "registered": "2017-09-30T03:47:40 -02:00", "tags": [ "et", "nostrud", "in", "fugiat", "incididunt", "labore", "nostrud" ] }\ """) def main(): app = QApplication(sys.argv) ex = EditorAll() ex.setWindowTitle(__file__) ex.setText(EXAMPLE_TEXT) ex.resize(800, 600) ex.show() sys.exit(app.exec_()) if __name__ == "__main__": main() lark-0.8.1/examples/reconstruct_json.py000066400000000000000000000025361361215331400202520ustar00rootroot00000000000000# # This example demonstrates an experimental feature: Text reconstruction # The Reconstructor takes a parse tree (already filtered from punctuation, of course), # and reconstructs it into correct text, that can be parsed correctly. # It can be useful for creating "hooks" to alter data before handing it to other parsers. You can also use it to generate samples from scratch. # import json from lark import Lark from lark.reconstruct import Reconstructor from .json_parser import json_grammar test_json = ''' { "empty_object" : {}, "empty_array" : [], "booleans" : { "YES" : true, "NO" : false }, "numbers" : [ 0, 1, -2, 3.3, 4.4e5, 6.6e-7 ], "strings" : [ "This", [ "And" , "That", "And a \\"b" ] ], "nothing" : null } ''' def test_earley(): json_parser = Lark(json_grammar, maybe_placeholders=False) tree = json_parser.parse(test_json) new_json = Reconstructor(json_parser).reconstruct(tree) print (new_json) print (json.loads(new_json) == json.loads(test_json)) def test_lalr(): json_parser = Lark(json_grammar, parser='lalr', maybe_placeholders=False) tree = json_parser.parse(test_json) new_json = Reconstructor(json_parser).reconstruct(tree) print (new_json) print (json.loads(new_json) == json.loads(test_json)) test_earley() test_lalr() lark-0.8.1/examples/relative-imports/000077500000000000000000000000001361215331400175745ustar00rootroot00000000000000lark-0.8.1/examples/relative-imports/multiple2.lark000066400000000000000000000000301361215331400223550ustar00rootroot00000000000000start: ("0" | "1")* "0" lark-0.8.1/examples/relative-imports/multiple3.lark000066400000000000000000000001621361215331400223640ustar00rootroot00000000000000start: mod0mod0+ mod0mod0: "0" | "1" mod1mod0 mod1mod0: "1" | "0" mod2mod1 mod1mod0 mod2mod1: "0" | "1" mod2mod1 lark-0.8.1/examples/relative-imports/multiples.lark000066400000000000000000000001711361215331400224640ustar00rootroot00000000000000start: "2:" multiple2 | "3:" multiple3 %import .multiple2.start -> multiple2 %import .multiple3.start -> multiple3 lark-0.8.1/examples/relative-imports/multiples.py000066400000000000000000000012751361215331400221710ustar00rootroot00000000000000# # This example demonstrates relative imports with rule rewrite # see multiples.lark # # # if b is a number written in binary, and m is either 2 or 3, # the grammar aims to recognise m:b iif b is a multiple of m # # for example, 3:1001 is recognised # because 9 (0b1001) is a multiple of 3 # from lark import Lark, UnexpectedInput parser = Lark.open('multiples.lark', parser='lalr') def is_in_grammar(data): try: parser.parse(data) 
except UnexpectedInput: return False return True for n_dec in range(100): n_bin = bin(n_dec)[2:] assert is_in_grammar('2:{}'.format(n_bin)) == (n_dec % 2 == 0) assert is_in_grammar('3:{}'.format(n_bin)) == (n_dec % 3 == 0) lark-0.8.1/examples/standalone/000077500000000000000000000000001361215331400164165ustar00rootroot00000000000000lark-0.8.1/examples/standalone/create_standalone.sh000077500000000000000000000001261361215331400224270ustar00rootroot00000000000000#!/bin/sh PYTHONPATH=../.. python -m lark.tools.standalone json.lark > json_parser.py lark-0.8.1/examples/standalone/json.lark000066400000000000000000000006561361215331400202510ustar00rootroot00000000000000?start: value ?value: object | array | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null array : "[" [value ("," value)*] "]" object : "{" [pair ("," pair)*] "}" pair : string ":" value string : ESCAPED_STRING %import common.ESCAPED_STRING %import common.SIGNED_NUMBER %import common.WS %ignore WS lark-0.8.1/examples/standalone/json_parser.py000066400000000000000000002412631361215331400213250ustar00rootroot00000000000000# The file was automatically generated by Lark v0.8.0 # # # Lark Stand-alone Generator Tool # ---------------------------------- # Generates a stand-alone LALR(1) parser with a standard lexer # # Git: https://github.com/erezsh/lark # Author: Erez Shinan (erezshin@gmail.com) # # # >>> LICENSE # # This tool and its generated code use a separate license from Lark. # # It is licensed under GPLv2 or above. # # If you wish to purchase a commercial license for this tool and its # generated code, contact me via email. # # If GPL is incompatible with your free or open-source project, # contact me and we'll work it out (for free). # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # See . # # import os from io import open class LarkError(Exception): pass class GrammarError(LarkError): pass class ParseError(LarkError): pass class LexError(LarkError): pass class UnexpectedEOF(ParseError): def __init__(self, expected): self.expected = expected message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected)) super(UnexpectedEOF, self).__init__(message) class UnexpectedInput(LarkError): pos_in_stream = None def get_context(self, text, span=40): pos = self.pos_in_stream start = max(pos - span, 0) end = pos + span before = text[start:pos].rsplit('\n', 1)[-1] after = text[pos:end].split('\n', 1)[0] return before + after + '\n' + ' ' * len(before) + '^\n' def match_examples(self, parse_fn, examples): """ Given a parser instance and a dictionary mapping some label with some malformed syntax examples, it'll return the label for the example that bests matches the current error. 
""" assert self.state is not None, "Not supported for this exception" candidate = None for label, example in examples.items(): assert not isinstance(example, STRING_TYPE) for malformed in example: try: parse_fn(malformed) except UnexpectedInput as ut: if ut.state == self.state: try: if ut.token == self.token: # Try exact match first return label except AttributeError: pass if not candidate: candidate = label return candidate class UnexpectedCharacters(LexError, UnexpectedInput): def __init__(self, seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None): message = "No terminal defined for '%s' at line %d col %d" % (seq[lex_pos], line, column) self.line = line self.column = column self.allowed = allowed self.considered_tokens = considered_tokens self.pos_in_stream = lex_pos self.state = state message += '\n\n' + self.get_context(seq) if allowed: message += '\nExpecting: %s\n' % allowed if token_history: message += '\nPrevious tokens: %s\n' % ', '.join(repr(t) for t in token_history) super(UnexpectedCharacters, self).__init__(message) class UnexpectedToken(ParseError, UnexpectedInput): def __init__(self, token, expected, considered_rules=None, state=None): self.token = token self.expected = expected # XXX str shouldn't necessary self.line = getattr(token, 'line', '?') self.column = getattr(token, 'column', '?') self.considered_rules = considered_rules self.state = state self.pos_in_stream = getattr(token, 'pos_in_stream', None) message = ("Unexpected token %r at line %s, column %s.\n" "Expected one of: \n\t* %s\n" % (token, self.line, self.column, '\n\t* '.join(self.expected))) super(UnexpectedToken, self).__init__(message) class VisitError(LarkError): def __init__(self, rule, obj, orig_exc): self.obj = obj self.orig_exc = orig_exc message = 'Error trying to process rule "%s":\n\n%s' % (rule, orig_exc) super(VisitError, self).__init__(message) def classify(seq, key=None, value=None): d = {} for item in seq: k = key(item) if (key is not None) else item v = value(item) if (value is not None) else item if k in d: d[k].append(v) else: d[k] = [v] return d def _deserialize(data, namespace, memo): if isinstance(data, dict): if '__type__' in data: # Object class_ = namespace[data['__type__']] return class_.deserialize(data, memo) elif '@' in data: return memo[data['@']] return {key:_deserialize(value, namespace, memo) for key, value in data.items()} elif isinstance(data, list): return [_deserialize(value, namespace, memo) for value in data] return data class Serialize(object): def memo_serialize(self, types_to_memoize): memo = SerializeMemoizer(types_to_memoize) return self.serialize(memo), memo.serialize() def serialize(self, memo=None): if memo and memo.in_types(self): return {'@': memo.memoized.get(self)} fields = getattr(self, '__serialize_fields__') res = {f: _serialize(getattr(self, f), memo) for f in fields} res['__type__'] = type(self).__name__ postprocess = getattr(self, '_serialize', None) if postprocess: postprocess(res, memo) return res @classmethod def deserialize(cls, data, memo): namespace = getattr(cls, '__serialize_namespace__', {}) namespace = {c.__name__:c for c in namespace} fields = getattr(cls, '__serialize_fields__') if '@' in data: return memo[data['@']] inst = cls.__new__(cls) for f in fields: try: setattr(inst, f, _deserialize(data[f], namespace, memo)) except KeyError as e: raise KeyError("Cannot find key for class", cls, e) postprocess = getattr(inst, '_deserialize', None) if postprocess: postprocess() return inst class 
SerializeMemoizer(Serialize): __serialize_fields__ = 'memoized', def __init__(self, types_to_memoize): self.types_to_memoize = tuple(types_to_memoize) self.memoized = Enumerator() def in_types(self, value): return isinstance(value, self.types_to_memoize) def serialize(self): return _serialize(self.memoized.reversed(), None) @classmethod def deserialize(cls, data, namespace, memo): return _deserialize(data, namespace, memo) try: STRING_TYPE = basestring except NameError: # Python 3 STRING_TYPE = str import types from functools import wraps, partial from contextlib import contextmanager Str = type(u'') try: classtype = types.ClassType # Python2 except AttributeError: classtype = type # Python3 def smart_decorator(f, create_decorator): if isinstance(f, types.FunctionType): return wraps(f)(create_decorator(f, True)) elif isinstance(f, (classtype, type, types.BuiltinFunctionType)): return wraps(f)(create_decorator(f, False)) elif isinstance(f, types.MethodType): return wraps(f)(create_decorator(f.__func__, True)) elif isinstance(f, partial): # wraps does not work for partials in 2.7: https://bugs.python.org/issue3445 return wraps(f.func)(create_decorator(lambda *args, **kw: f(*args[1:], **kw), True)) else: return create_decorator(f.__func__.__call__, True) import sys, re Py36 = (sys.version_info[:2] >= (3, 6)) import sre_parse import sre_constants def get_regexp_width(regexp): try: return [int(x) for x in sre_parse.parse(regexp).getwidth()] except sre_constants.error: raise ValueError(regexp) class Meta: def __init__(self): self.empty = True class Tree(object): def __init__(self, data, children, meta=None): self.data = data self.children = children self._meta = meta @property def meta(self): if self._meta is None: self._meta = Meta() return self._meta def __repr__(self): return 'Tree(%s, %s)' % (self.data, self.children) def _pretty_label(self): return self.data def _pretty(self, level, indent_str): if len(self.children) == 1 and not isinstance(self.children[0], Tree): return [ indent_str*level, self._pretty_label(), '\t', '%s' % (self.children[0],), '\n'] l = [ indent_str*level, self._pretty_label(), '\n' ] for n in self.children: if isinstance(n, Tree): l += n._pretty(level+1, indent_str) else: l += [ indent_str*(level+1), '%s' % (n,), '\n' ] return l def pretty(self, indent_str=' '): return ''.join(self._pretty(0, indent_str)) def __eq__(self, other): try: return self.data == other.data and self.children == other.children except AttributeError: return False def __ne__(self, other): return not (self == other) def __hash__(self): return hash((self.data, tuple(self.children))) def iter_subtrees(self): # TODO: Re-write as a more efficient version visited = set() q = [self] l = [] while q: subtree = q.pop() l.append( subtree ) if id(subtree) in visited: continue # already been here from another branch visited.add(id(subtree)) q += [c for c in subtree.children if isinstance(c, Tree)] seen = set() for x in reversed(l): if id(x) not in seen: yield x seen.add(id(x)) def find_pred(self, pred): "Find all nodes where pred(tree) == True" return filter(pred, self.iter_subtrees()) def find_data(self, data): "Find all nodes where tree.data == data" return self.find_pred(lambda t: t.data == data) from inspect import getmembers, getmro class Discard(Exception): pass # Transformers class Transformer: """Visits the tree recursively, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data The returned value replaces the old one in the 
structure. Can be used to implement map or reduce. """ __visit_tokens__ = True # For backwards compatibility def __init__(self, visit_tokens=True): self.__visit_tokens__ = visit_tokens def _call_userfunc(self, tree, new_children=None): # Assumes tree is already transformed children = new_children if new_children is not None else tree.children try: f = getattr(self, tree.data) except AttributeError: return self.__default__(tree.data, children, tree.meta) else: try: wrapper = getattr(f, 'visit_wrapper', None) if wrapper is not None: return f.visit_wrapper(f, tree.data, children, tree.meta) else: return f(children) except (GrammarError, Discard): raise except Exception as e: raise VisitError(tree.data, tree, e) def _call_userfunc_token(self, token): try: f = getattr(self, token.type) except AttributeError: return self.__default_token__(token) else: try: return f(token) except (GrammarError, Discard): raise except Exception as e: raise VisitError(token.type, token, e) def _transform_children(self, children): for c in children: try: if isinstance(c, Tree): yield self._transform_tree(c) elif self.__visit_tokens__ and isinstance(c, Token): yield self._call_userfunc_token(c) else: yield c except Discard: pass def _transform_tree(self, tree): children = list(self._transform_children(tree.children)) return self._call_userfunc(tree, children) def transform(self, tree): return self._transform_tree(tree) def __mul__(self, other): return TransformerChain(self, other) def __default__(self, data, children, meta): "Default operation on tree (for override)" return Tree(data, children, meta) def __default_token__(self, token): "Default operation on token (for override)" return token @classmethod def _apply_decorator(cls, decorator, **kwargs): mro = getmro(cls) assert mro[0] is cls libmembers = {name for _cls in mro[1:] for name, _ in getmembers(_cls)} for name, value in getmembers(cls): # Make sure the function isn't inherited (unless it's overwritten) if name.startswith('_') or (name in libmembers and name not in cls.__dict__): continue if not callable(cls.__dict__[name]): continue # Skip if v_args already applied (at the function level) if hasattr(cls.__dict__[name], 'vargs_applied'): continue static = isinstance(cls.__dict__[name], (staticmethod, classmethod)) setattr(cls, name, decorator(value, static=static, **kwargs)) return cls class InlineTransformer(Transformer): # XXX Deprecated def _call_userfunc(self, tree, new_children=None): # Assumes tree is already transformed children = new_children if new_children is not None else tree.children try: f = getattr(self, tree.data) except AttributeError: return self.__default__(tree.data, children, tree.meta) else: return f(*children) class TransformerChain(object): def __init__(self, *transformers): self.transformers = transformers def transform(self, tree): for t in self.transformers: tree = t.transform(tree) return tree def __mul__(self, other): return TransformerChain(*self.transformers + (other,)) class Transformer_InPlace(Transformer): "Non-recursive. Changes the tree in-place instead of returning new instances" def _transform_tree(self, tree): # Cancel recursion return self._call_userfunc(tree) def transform(self, tree): for subtree in tree.iter_subtrees(): subtree.children = list(self._transform_children(subtree.children)) return self._transform_tree(tree) class Transformer_InPlaceRecursive(Transformer): "Recursive. 
Changes the tree in-place instead of returning new instances" def _transform_tree(self, tree): tree.children = list(self._transform_children(tree.children)) return self._call_userfunc(tree) # Visitors class VisitorBase: def _call_userfunc(self, tree): return getattr(self, tree.data, self.__default__)(tree) def __default__(self, tree): "Default operation on tree (for override)" return tree class Visitor(VisitorBase): """Bottom-up visitor, non-recursive Visits the tree, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data """ def visit(self, tree): for subtree in tree.iter_subtrees(): self._call_userfunc(subtree) return tree def visit_topdown(self,tree): for subtree in tree.iter_subtrees_topdown(): self._call_userfunc(subtree) return tree class Visitor_Recursive(VisitorBase): """Bottom-up visitor, recursive Visits the tree, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data """ def visit(self, tree): for child in tree.children: if isinstance(child, Tree): self.visit(child) self._call_userfunc(tree) return tree def visit_topdown(self,tree): self._call_userfunc(tree) for child in tree.children: if isinstance(child, Tree): self.visit_topdown(child) return tree def visit_children_decor(func): "See Interpreter" @wraps(func) def inner(cls, tree): values = cls.visit_children(tree) return func(cls, values) return inner class Interpreter: """Top-down visitor, recursive Visits the tree, starting with the root and finally the leaves (top-down) Calls its methods (provided by user via inheritance) according to tree.data Unlike Transformer and Visitor, the Interpreter doesn't automatically visit its sub-branches. The user has to explicitly call visit_children, or use the @visit_children_decor """ def visit(self, tree): return getattr(self, tree.data)(tree) def visit_children(self, tree): return [self.visit(child) if isinstance(child, Tree) else child for child in tree.children] def __getattr__(self, name): return self.__default__ def __default__(self, tree): return self.visit_children(tree) # Decorators def _apply_decorator(obj, decorator, **kwargs): try: _apply = obj._apply_decorator except AttributeError: return decorator(obj, **kwargs) else: return _apply(decorator, **kwargs) def _inline_args__func(func): @wraps(func) def create_decorator(_f, with_self): if with_self: def f(self, children): return _f(self, *children) else: def f(self, children): return _f(*children) return f return smart_decorator(func, create_decorator) def inline_args(obj): # XXX Deprecated return _apply_decorator(obj, _inline_args__func) def _visitor_args_func_dec(func, visit_wrapper=None, static=False): def create_decorator(_f, with_self): if with_self: def f(self, *args, **kwargs): return _f(self, *args, **kwargs) else: def f(self, *args, **kwargs): return _f(*args, **kwargs) return f if static: f = wraps(func)(create_decorator(func, False)) else: f = smart_decorator(func, create_decorator) f.vargs_applied = True f.visit_wrapper = visit_wrapper return f def _vargs_inline(f, data, children, meta): return f(*children) def _vargs_meta_inline(f, data, children, meta): return f(meta, *children) def _vargs_meta(f, data, children, meta): return f(children, meta) # TODO swap these for consistency? Backwards incompatible! 
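# These _vargs_* helpers are "visit_wrapper" functions: they adapt how a user
# callback receives (data, children, meta) when it is decorated with v_args().
# A rough usage sketch (the rule and method names are hypothetical, not taken
# from this file):
#
#     class CalcTransformer(Transformer):
#         @v_args(inline=True)    # children are unpacked as positional arguments
#         def add(self, left, right):
#             return left + right
#
#         @v_args(tree=True)      # the whole Tree node is passed as one argument
#         def block(self, tree):
#             return tree.children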
def _vargs_tree(f, data, children, meta): return f(Tree(data, children, meta)) def v_args(inline=False, meta=False, tree=False, wrapper=None): "A convenience decorator factory, for modifying the behavior of user-supplied visitor methods" if tree and (meta or inline): raise ValueError("Visitor functions cannot combine 'tree' with 'meta' or 'inline'.") func = None if meta: if inline: func = _vargs_meta_inline else: func = _vargs_meta elif inline: func = _vargs_inline elif tree: func = _vargs_tree if wrapper is not None: if func is not None: raise ValueError("Cannot use 'wrapper' along with 'tree', 'meta' or 'inline'.") func = wrapper def _visitor_args_dec(obj): return _apply_decorator(obj, _visitor_args_func_dec, visit_wrapper=func) return _visitor_args_dec class Indenter: def __init__(self): self.paren_level = None self.indent_level = None assert self.tab_len > 0 def handle_NL(self, token): if self.paren_level > 0: return yield token indent_str = token.rsplit('\n', 1)[1] # Tabs and spaces indent = indent_str.count(' ') + indent_str.count('\t') * self.tab_len if indent > self.indent_level[-1]: self.indent_level.append(indent) yield Token.new_borrow_pos(self.INDENT_type, indent_str, token) else: while indent < self.indent_level[-1]: self.indent_level.pop() yield Token.new_borrow_pos(self.DEDENT_type, indent_str, token) assert indent == self.indent_level[-1], '%s != %s' % (indent, self.indent_level[-1]) def _process(self, stream): for token in stream: if token.type == self.NL_type: for t in self.handle_NL(token): yield t else: yield token if token.type in self.OPEN_PAREN_types: self.paren_level += 1 elif token.type in self.CLOSE_PAREN_types: self.paren_level -= 1 assert self.paren_level >= 0 while len(self.indent_level) > 1: self.indent_level.pop() yield Token(self.DEDENT_type, '') assert self.indent_level == [0], self.indent_level def process(self, stream): self.paren_level = 0 self.indent_level = [0] return self._process(stream) # XXX Hack for ContextualLexer. Maybe there's a more elegant solution? 
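    # `always_accept` (below) is read when this Indenter is passed to Lark as a
    # `postlex`: it makes the lexer always accept the newline terminal, so every
    # NL token reaches handle_NL() and can be turned into INDENT/DEDENT tokens.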
@property def always_accept(self): return (self.NL_type,) class Symbol(Serialize): __slots__ = ('name',) is_term = NotImplemented def __init__(self, name): self.name = name def __eq__(self, other): assert isinstance(other, Symbol), other return self.is_term == other.is_term and self.name == other.name def __ne__(self, other): return not (self == other) def __hash__(self): return hash(self.name) def __repr__(self): return '%s(%r)' % (type(self).__name__, self.name) fullrepr = property(__repr__) class Terminal(Symbol): __serialize_fields__ = 'name', 'filter_out' is_term = True def __init__(self, name, filter_out=False): self.name = name self.filter_out = filter_out @property def fullrepr(self): return '%s(%r, %r)' % (type(self).__name__, self.name, self.filter_out) class NonTerminal(Symbol): __serialize_fields__ = 'name', is_term = False class RuleOptions(Serialize): __serialize_fields__ = 'keep_all_tokens', 'expand1', 'priority', 'empty_indices' def __init__(self, keep_all_tokens=False, expand1=False, priority=None, empty_indices=()): self.keep_all_tokens = keep_all_tokens self.expand1 = expand1 self.priority = priority self.empty_indices = empty_indices def __repr__(self): return 'RuleOptions(%r, %r, %r)' % ( self.keep_all_tokens, self.expand1, self.priority, ) class Rule(Serialize): """ origin : a symbol expansion : a list of symbols order : index of this expansion amongst all rules of the same name """ __slots__ = ('origin', 'expansion', 'alias', 'options', 'order', '_hash') __serialize_fields__ = 'origin', 'expansion', 'order', 'alias', 'options' __serialize_namespace__ = Terminal, NonTerminal, RuleOptions def __init__(self, origin, expansion, order=0, alias=None, options=None): self.origin = origin self.expansion = expansion self.alias = alias self.order = order self.options = options or RuleOptions() self._hash = hash((self.origin, tuple(self.expansion))) def _deserialize(self): self._hash = hash((self.origin, tuple(self.expansion))) def __str__(self): return '<%s : %s>' % (self.origin.name, ' '.join(x.name for x in self.expansion)) def __repr__(self): return 'Rule(%r, %r, %r, %r)' % (self.origin, self.expansion, self.alias, self.options) def __hash__(self): return self._hash def __eq__(self, other): if not isinstance(other, Rule): return False return self.origin == other.origin and self.expansion == other.expansion class Pattern(Serialize): def __init__(self, value, flags=()): self.value = value self.flags = frozenset(flags) def __repr__(self): return repr(self.to_regexp()) # Pattern Hashing assumes all subclasses have a different priority! 
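# --- Usage sketch (not part of lark-0.8.1) ---
# The Indenter above is an abstract postlexer: a subclass supplies the token
# names and tab length, and INDENT/DEDENT tokens are emitted from significant
# whitespace. The grammar and input below are illustrative assumptions, modelled
# on Python-like indentation; they are not taken from this file.
from lark import Lark
from lark.indenter import Indenter

class TreeIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = []
    CLOSE_PAREN_types = []
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

_indent_grammar = r"""
    ?start: _NL* node
    node: NAME _NL [_INDENT node+ _DEDENT]
    %import common.CNAME -> NAME
    %import common.WS_INLINE
    %ignore WS_INLINE
    %declare _INDENT _DEDENT
    _NL: /(\r?\n[\t ]*)+/
"""

_indent_parser = Lark(_indent_grammar, parser='lalr', postlex=TreeIndenter())
# 'a' becomes the parent of 'b' and 'c', based purely on indentation
print(_indent_parser.parse("a\n  b\n  c\n").pretty())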
def __hash__(self): return hash((type(self), self.value, self.flags)) def __eq__(self, other): return type(self) == type(other) and self.value == other.value and self.flags == other.flags def to_regexp(self): raise NotImplementedError() if Py36: # Python 3.6 changed syntax for flags in regular expression def _get_flags(self, value): for f in self.flags: value = ('(?%s:%s)' % (f, value)) return value else: def _get_flags(self, value): for f in self.flags: value = ('(?%s)' % f) + value return value class PatternStr(Pattern): __serialize_fields__ = 'value', 'flags' type = "str" def to_regexp(self): return self._get_flags(re.escape(self.value)) @property def min_width(self): return len(self.value) max_width = min_width class PatternRE(Pattern): __serialize_fields__ = 'value', 'flags', '_width' type = "re" def to_regexp(self): return self._get_flags(self.value) _width = None def _get_width(self): if self._width is None: self._width = get_regexp_width(self.to_regexp()) return self._width @property def min_width(self): return self._get_width()[0] @property def max_width(self): return self._get_width()[1] class TerminalDef(Serialize): __serialize_fields__ = 'name', 'pattern', 'priority' __serialize_namespace__ = PatternStr, PatternRE def __init__(self, name, pattern, priority=1): assert isinstance(pattern, Pattern), pattern self.name = name self.pattern = pattern self.priority = priority def __repr__(self): return '%s(%r, %r)' % (type(self).__name__, self.name, self.pattern) class Token(Str): __slots__ = ('type', 'pos_in_stream', 'value', 'line', 'column', 'end_line', 'end_column', 'end_pos') def __new__(cls, type_, value, pos_in_stream=None, line=None, column=None, end_line=None, end_column=None, end_pos=None): try: self = super(Token, cls).__new__(cls, value) except UnicodeDecodeError: value = value.decode('latin1') self = super(Token, cls).__new__(cls, value) self.type = type_ self.pos_in_stream = pos_in_stream self.value = value self.line = line self.column = column self.end_line = end_line self.end_column = end_column self.end_pos = end_pos return self def update(self, type_=None, value=None): return Token.new_borrow_pos( type_ if type_ is not None else self.type, value if value is not None else self.value, self ) @classmethod def new_borrow_pos(cls, type_, value, borrow_t): return cls(type_, value, borrow_t.pos_in_stream, borrow_t.line, borrow_t.column, borrow_t.end_line, borrow_t.end_column, borrow_t.end_pos) def __reduce__(self): return (self.__class__, (self.type, self.value, self.pos_in_stream, self.line, self.column, )) def __repr__(self): return 'Token(%s, %r)' % (self.type, self.value) def __deepcopy__(self, memo): return Token(self.type, self.value, self.pos_in_stream, self.line, self.column) def __eq__(self, other): if isinstance(other, Token) and self.type != other.type: return False return Str.__eq__(self, other) __hash__ = Str.__hash__ class LineCounter: def __init__(self): self.newline_char = '\n' self.char_pos = 0 self.line = 1 self.column = 1 self.line_start_pos = 0 def feed(self, token, test_newline=True): """Consume a token and calculate the new line & column. As an optional optimization, set test_newline=False is token doesn't contain a newline. 
""" if test_newline: newlines = token.count(self.newline_char) if newlines: self.line += newlines self.line_start_pos = self.char_pos + token.rindex(self.newline_char) + 1 self.char_pos += len(token) self.column = self.char_pos - self.line_start_pos + 1 class _Lex: "Built to serve both Lexer and ContextualLexer" def __init__(self, lexer, state=None): self.lexer = lexer self.state = state def lex(self, stream, newline_types, ignore_types): newline_types = frozenset(newline_types) ignore_types = frozenset(ignore_types) line_ctr = LineCounter() last_token = None while line_ctr.char_pos < len(stream): lexer = self.lexer res = lexer.match(stream, line_ctr.char_pos) if not res: allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types if not allowed: allowed = {""} raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token]) value, type_ = res if type_ not in ignore_types: t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column) line_ctr.feed(value, type_ in newline_types) t.end_line = line_ctr.line t.end_column = line_ctr.column t.end_pos = line_ctr.char_pos if t.type in lexer.callback: t = lexer.callback[t.type](t) if not isinstance(t, Token): raise ValueError("Callbacks must return a token (returned %r)" % t) yield t last_token = t else: if type_ in lexer.callback: t2 = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column) lexer.callback[type_](t2) line_ctr.feed(value, type_ in newline_types) class UnlessCallback: def __init__(self, mres): self.mres = mres def __call__(self, t): for mre, type_from_index in self.mres: m = mre.match(t.value) if m: t.type = type_from_index[m.lastindex] break return t class CallChain: def __init__(self, callback1, callback2, cond): self.callback1 = callback1 self.callback2 = callback2 self.cond = cond def __call__(self, t): t2 = self.callback1(t) return self.callback2(t) if self.cond(t2) else t2 def _create_unless(terminals): tokens_by_type = classify(terminals, lambda t: type(t.pattern)) assert len(tokens_by_type) <= 2, tokens_by_type.keys() embedded_strs = set() callback = {} for retok in tokens_by_type.get(PatternRE, []): unless = [] # {} for strtok in tokens_by_type.get(PatternStr, []): if strtok.priority > retok.priority: continue s = strtok.pattern.value m = re.match(retok.pattern.to_regexp(), s) if m and m.group(0) == s: unless.append(strtok) if strtok.pattern.flags <= retok.pattern.flags: embedded_strs.add(strtok) if unless: callback[retok.name] = UnlessCallback(build_mres(unless, match_whole=True)) terminals = [t for t in terminals if t not in embedded_strs] return terminals, callback def _build_mres(terminals, max_size, match_whole): # Python sets an unreasonable group limit (currently 100) in its re module # Worse, the only way to know we reached it is by catching an AssertionError! # This function recursively tries less and less groups until it's successful. postfix = '$' if match_whole else '' mres = [] while terminals: try: mre = re.compile(u'|'.join(u'(?P<%s>%s)'%(t.name, t.pattern.to_regexp()+postfix) for t in terminals[:max_size])) except AssertionError: # Yes, this is what Python provides us.. 
:/ return _build_mres(terminals, max_size//2, match_whole) # terms_from_name = {t.name: t for t in terminals[:max_size]} mres.append((mre, {i:n for n,i in mre.groupindex.items()} )) terminals = terminals[max_size:] return mres def build_mres(terminals, match_whole=False): return _build_mres(terminals, len(terminals), match_whole) def _regexp_has_newline(r): r"""Expressions that may indicate newlines in a regexp: - newlines (\n) - escaped newline (\\n) - anything but ([^...]) - any-char (.) when the flag (?s) exists - spaces (\s) """ return '\n' in r or '\\n' in r or '\\s' in r or '[^' in r or ('(?s' in r and '.' in r) class Lexer(object): """Lexer interface Method Signatures: lex(self, stream) -> Iterator[Token] """ lex = NotImplemented class TraditionalLexer(Lexer): def __init__(self, terminals, ignore=(), user_callbacks={}): assert all(isinstance(t, TerminalDef) for t in terminals), terminals terminals = list(terminals) # Sanitization for t in terminals: try: re.compile(t.pattern.to_regexp()) except re.error: raise LexError("Cannot compile token %s: %s" % (t.name, t.pattern)) if t.pattern.min_width == 0: raise LexError("Lexer does not allow zero-width terminals. (%s: %s)" % (t.name, t.pattern)) assert set(ignore) <= {t.name for t in terminals} # Init self.newline_types = [t.name for t in terminals if _regexp_has_newline(t.pattern.to_regexp())] self.ignore_types = list(ignore) terminals.sort(key=lambda x:(-x.priority, -x.pattern.max_width, -len(x.pattern.value), x.name)) self.terminals = terminals self.user_callbacks = user_callbacks self.build() def build(self): terminals, self.callback = _create_unless(self.terminals) assert all(self.callback.values()) for type_, f in self.user_callbacks.items(): if type_ in self.callback: # Already a callback there, probably UnlessCallback self.callback[type_] = CallChain(self.callback[type_], f, lambda t: t.type == type_) else: self.callback[type_] = f self.mres = build_mres(terminals) def match(self, stream, pos): for mre, type_from_index in self.mres: m = mre.match(stream, pos) if m: return m.group(0), type_from_index[m.lastindex] def lex(self, stream): return _Lex(self).lex(stream, self.newline_types, self.ignore_types) class ContextualLexer(Lexer): def __init__(self, terminals, states, ignore=(), always_accept=(), user_callbacks={}): tokens_by_name = {} for t in terminals: assert t.name not in tokens_by_name, t tokens_by_name[t.name] = t lexer_by_tokens = {} self.lexers = {} for state, accepts in states.items(): key = frozenset(accepts) try: lexer = lexer_by_tokens[key] except KeyError: accepts = set(accepts) | set(ignore) | set(always_accept) state_tokens = [tokens_by_name[n] for n in accepts if n and n in tokens_by_name] lexer = TraditionalLexer(state_tokens, ignore=ignore, user_callbacks=user_callbacks) lexer_by_tokens[key] = lexer self.lexers[state] = lexer self.root_lexer = TraditionalLexer(terminals, ignore=ignore, user_callbacks=user_callbacks) def lex(self, stream, get_parser_state): parser_state = get_parser_state() l = _Lex(self.lexers[parser_state], parser_state) try: for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types): yield x parser_state = get_parser_state() l.lexer = self.lexers[parser_state] l.state = parser_state # For debug only, no need to worry about multithreading except UnexpectedCharacters as e: # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined, # but not in the current context. # This tests the input against the global context, to provide a nicer error. 
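# --- Usage sketch (not part of lark-0.8.1) ---
# The TraditionalLexer/ContextualLexer classes above are normally driven through
# the Lark facade. This sketch only tokenizes, without parsing; the grammar and
# input are illustrative assumptions.
from lark import Lark

_lex_demo = Lark(r"""
    start: (NAME | NUMBER)*
    NAME: /[a-z]+/
    NUMBER: /\d+/
    %ignore " "
""", parser="lalr", lexer="standard")

for tok in _lex_demo.lex("abc 12 de"):
    print(tok.type, repr(tok.value), tok.line, tok.column)
# Expected (ignored spaces are dropped):
#   NAME 'abc' 1 1
#   NUMBER '12' 1 5
#   NAME 'de' 1 8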
root_match = self.root_lexer.match(stream, e.pos_in_stream) if not root_match: raise value, type_ = root_match t = Token(type_, value, e.pos_in_stream, e.line, e.column) raise UnexpectedToken(t, e.allowed, state=e.state) class LexerConf(Serialize): __serialize_fields__ = 'tokens', 'ignore' __serialize_namespace__ = TerminalDef, def __init__(self, tokens, ignore=(), postlex=None, callbacks=None): self.tokens = tokens self.ignore = ignore self.postlex = postlex self.callbacks = callbacks or {} def _deserialize(self): self.callbacks = {} # TODO from functools import partial, wraps from itertools import repeat, product class ExpandSingleChild: def __init__(self, node_builder): self.node_builder = node_builder def __call__(self, children): if len(children) == 1: return children[0] else: return self.node_builder(children) class PropagatePositions: def __init__(self, node_builder): self.node_builder = node_builder def __call__(self, children): res = self.node_builder(children) if isinstance(res, Tree): for c in children: if isinstance(c, Tree) and not c.meta.empty: res.meta.line = c.meta.line res.meta.column = c.meta.column res.meta.start_pos = c.meta.start_pos res.meta.empty = False break elif isinstance(c, Token): res.meta.line = c.line res.meta.column = c.column res.meta.start_pos = c.pos_in_stream res.meta.empty = False break for c in reversed(children): if isinstance(c, Tree) and not c.meta.empty: res.meta.end_line = c.meta.end_line res.meta.end_column = c.meta.end_column res.meta.end_pos = c.meta.end_pos res.meta.empty = False break elif isinstance(c, Token): res.meta.end_line = c.end_line res.meta.end_column = c.end_column res.meta.end_pos = c.end_pos res.meta.empty = False break return res class ChildFilter: def __init__(self, to_include, append_none, node_builder): self.node_builder = node_builder self.to_include = to_include self.append_none = append_none def __call__(self, children): filtered = [] for i, to_expand, add_none in self.to_include: if add_none: filtered += [None] * add_none if to_expand: filtered += children[i].children else: filtered.append(children[i]) if self.append_none: filtered += [None] * self.append_none return self.node_builder(filtered) class ChildFilterLALR(ChildFilter): "Optimized childfilter for LALR (assumes no duplication in parse tree, so it's safe to change it)" def __call__(self, children): filtered = [] for i, to_expand, add_none in self.to_include: if add_none: filtered += [None] * add_none if to_expand: if filtered: filtered += children[i].children else: # Optimize for left-recursion filtered = children[i].children else: filtered.append(children[i]) if self.append_none: filtered += [None] * self.append_none return self.node_builder(filtered) class ChildFilterLALR_NoPlaceholders(ChildFilter): "Optimized childfilter for LALR (assumes no duplication in parse tree, so it's safe to change it)" def __init__(self, to_include, node_builder): self.node_builder = node_builder self.to_include = to_include def __call__(self, children): filtered = [] for i, to_expand in self.to_include: if to_expand: if filtered: filtered += children[i].children else: # Optimize for left-recursion filtered = children[i].children else: filtered.append(children[i]) return self.node_builder(filtered) def _should_expand(sym): return not sym.is_term and sym.name.startswith('_') def maybe_create_child_filter(expansion, keep_all_tokens, ambiguous, _empty_indices): # Prepare empty_indices as: How many Nones to insert at each index? 
if _empty_indices: assert _empty_indices.count(False) == len(expansion) s = ''.join(str(int(b)) for b in _empty_indices) empty_indices = [len(ones) for ones in s.split('0')] assert len(empty_indices) == len(expansion)+1, (empty_indices, len(expansion)) else: empty_indices = [0] * (len(expansion)+1) to_include = [] nones_to_add = 0 for i, sym in enumerate(expansion): nones_to_add += empty_indices[i] if keep_all_tokens or not (sym.is_term and sym.filter_out): to_include.append((i, _should_expand(sym), nones_to_add)) nones_to_add = 0 nones_to_add += empty_indices[len(expansion)] if _empty_indices or len(to_include) < len(expansion) or any(to_expand for i, to_expand,_ in to_include): if _empty_indices or ambiguous: return partial(ChildFilter if ambiguous else ChildFilterLALR, to_include, nones_to_add) else: # LALR without placeholders return partial(ChildFilterLALR_NoPlaceholders, [(i, x) for i,x,_ in to_include]) class AmbiguousExpander: """Deal with the case where we're expanding children ('_rule') into a parent but the children are ambiguous. i.e. (parent->_ambig->_expand_this_rule). In this case, make the parent itself ambiguous with as many copies as their are ambiguous children, and then copy the ambiguous children into the right parents in the right places, essentially shifting the ambiguiuty up the tree.""" def __init__(self, to_expand, tree_class, node_builder): self.node_builder = node_builder self.tree_class = tree_class self.to_expand = to_expand def __call__(self, children): def _is_ambig_tree(child): return hasattr(child, 'data') and child.data == '_ambig' #### When we're repeatedly expanding ambiguities we can end up with nested ambiguities. # All children of an _ambig node should be a derivation of that ambig node, hence # it is safe to assume that if we see an _ambig node nested within an ambig node # it is safe to simply expand it into the parent _ambig node as an alternative derivation. ambiguous = [] for i, child in enumerate(children): if _is_ambig_tree(child): if i in self.to_expand: ambiguous.append(i) to_expand = [j for j, grandchild in enumerate(child.children) if _is_ambig_tree(grandchild)] child.expand_kids_by_index(*to_expand) if not ambiguous: return self.node_builder(children) expand = [ iter(child.children) if i in ambiguous else repeat(child) for i, child in enumerate(children) ] return self.tree_class('_ambig', [self.node_builder(list(f[0])) for f in product(zip(*expand))]) def maybe_create_ambiguous_expander(tree_class, expansion, keep_all_tokens): to_expand = [i for i, sym in enumerate(expansion) if keep_all_tokens or ((not (sym.is_term and sym.filter_out)) and _should_expand(sym))] if to_expand: return partial(AmbiguousExpander, to_expand, tree_class) def ptb_inline_args(func): @wraps(func) def f(children): return func(*children) return f def inplace_transformer(func): @wraps(func) def f(children): # function name in a Transformer is a rule name. 
tree = Tree(func.__name__, children) return func(tree) return f def apply_visit_wrapper(func, name, wrapper): if wrapper is _vargs_meta or wrapper is _vargs_meta_inline: raise NotImplementedError("Meta args not supported for internal transformer") @wraps(func) def f(children): return wrapper(func, name, children, None) return f class ParseTreeBuilder: def __init__(self, rules, tree_class, propagate_positions=False, keep_all_tokens=False, ambiguous=False, maybe_placeholders=False): self.tree_class = tree_class self.propagate_positions = propagate_positions self.always_keep_all_tokens = keep_all_tokens self.ambiguous = ambiguous self.maybe_placeholders = maybe_placeholders self.rule_builders = list(self._init_builders(rules)) def _init_builders(self, rules): for rule in rules: options = rule.options keep_all_tokens = self.always_keep_all_tokens or options.keep_all_tokens expand_single_child = options.expand1 wrapper_chain = list(filter(None, [ (expand_single_child and not rule.alias) and ExpandSingleChild, maybe_create_child_filter(rule.expansion, keep_all_tokens, self.ambiguous, options.empty_indices if self.maybe_placeholders else None), self.propagate_positions and PropagatePositions, self.ambiguous and maybe_create_ambiguous_expander(self.tree_class, rule.expansion, keep_all_tokens), ])) yield rule, wrapper_chain def create_callback(self, transformer=None): callbacks = {} for rule, wrapper_chain in self.rule_builders: user_callback_name = rule.alias or rule.origin.name try: f = getattr(transformer, user_callback_name) # XXX InlineTransformer is deprecated! wrapper = getattr(f, 'visit_wrapper', None) if wrapper is not None: f = apply_visit_wrapper(f, user_callback_name, wrapper) else: if isinstance(transformer, InlineTransformer): f = ptb_inline_args(f) elif isinstance(transformer, Transformer_InPlace): f = inplace_transformer(f) except AttributeError: f = partial(self.tree_class, user_callback_name) for w in wrapper_chain: f = w(f) if rule in callbacks: raise GrammarError("Rule '%s' already exists" % (rule,)) callbacks[rule] = f return callbacks class LALR_Parser(object): def __init__(self, parser_conf, debug=False): assert all(r.options.priority is None for r in parser_conf.rules), "LALR doesn't yet support prioritization" analysis = LALR_Analyzer(parser_conf, debug=debug) analysis.compute_lalr() callbacks = parser_conf.callbacks self._parse_table = analysis.parse_table self.parser_conf = parser_conf self.parser = _Parser(analysis.parse_table, callbacks) @classmethod def deserialize(cls, data, memo, callbacks): inst = cls.__new__(cls) inst._parse_table = IntParseTable.deserialize(data, memo) inst.parser = _Parser(inst._parse_table, callbacks) return inst def serialize(self, memo): return self._parse_table.serialize(memo) def parse(self, *args): return self.parser.parse(*args) class _Parser: def __init__(self, parse_table, callbacks): self.states = parse_table.states self.start_states = parse_table.start_states self.end_states = parse_table.end_states self.callbacks = callbacks def parse(self, seq, start, set_state=None): token = None stream = iter(seq) states = self.states start_state = self.start_states[start] end_state = self.end_states[start] state_stack = [start_state] value_stack = [] if set_state: set_state(start_state) def get_action(token): state = state_stack[-1] try: return states[state][token.type] except KeyError: expected = [s for s in states[state].keys() if s.isupper()] raise UnexpectedToken(token, expected, state=state) def reduce(rule): size = len(rule.expansion) if 
size: s = value_stack[-size:] del state_stack[-size:] del value_stack[-size:] else: s = [] value = self.callbacks[rule](s) _action, new_state = states[state_stack[-1]][rule.origin.name] assert _action is Shift state_stack.append(new_state) value_stack.append(value) # Main LALR-parser loop for token in stream: while True: action, arg = get_action(token) assert arg != end_state if action is Shift: state_stack.append(arg) value_stack.append(token) if set_state: set_state(arg) break # next token else: reduce(arg) token = Token.new_borrow_pos('$END', '', token) if token else Token('$END', '', 0, 1, 1) while True: _action, arg = get_action(token) assert(_action is Reduce) reduce(arg) if state_stack[-1] == end_state: return value_stack[-1] class Action: def __init__(self, name): self.name = name def __str__(self): return self.name def __repr__(self): return str(self) Shift = Action('Shift') Reduce = Action('Reduce') class ParseTable: def __init__(self, states, start_states, end_states): self.states = states self.start_states = start_states self.end_states = end_states def serialize(self, memo): tokens = Enumerator() rules = Enumerator() states = { state: {tokens.get(token): ((1, arg.serialize(memo)) if action is Reduce else (0, arg)) for token, (action, arg) in actions.items()} for state, actions in self.states.items() } return { 'tokens': tokens.reversed(), 'states': states, 'start_states': self.start_states, 'end_states': self.end_states, } @classmethod def deserialize(cls, data, memo): tokens = data['tokens'] states = { state: {tokens[token]: ((Reduce, Rule.deserialize(arg, memo)) if action==1 else (Shift, arg)) for token, (action, arg) in actions.items()} for state, actions in data['states'].items() } return cls(states, data['start_states'], data['end_states']) class IntParseTable(ParseTable): @classmethod def from_ParseTable(cls, parse_table): enum = list(parse_table.states) state_to_idx = {s:i for i,s in enumerate(enum)} int_states = {} for s, la in parse_table.states.items(): la = {k:(v[0], state_to_idx[v[1]]) if v[0] is Shift else v for k,v in la.items()} int_states[ state_to_idx[s] ] = la start_states = {start:state_to_idx[s] for start, s in parse_table.start_states.items()} end_states = {start:state_to_idx[s] for start, s in parse_table.end_states.items()} return cls(int_states, start_states, end_states) def get_frontend(parser, lexer): if parser=='lalr': if lexer is None: raise ValueError('The LALR parser requires use of a lexer') elif lexer == 'standard': return LALR_TraditionalLexer elif lexer == 'contextual': return LALR_ContextualLexer elif issubclass(lexer, Lexer): return partial(LALR_CustomLexer, lexer) else: raise ValueError('Unknown lexer: %s' % lexer) elif parser=='earley': if lexer=='standard': return Earley elif lexer=='dynamic': return XEarley elif lexer=='dynamic_complete': return XEarley_CompleteLex elif lexer=='contextual': raise ValueError('The Earley parser does not support the contextual parser') else: raise ValueError('Unknown lexer: %s' % lexer) elif parser == 'cyk': if lexer == 'standard': return CYK else: raise ValueError('CYK parser requires using standard parser.') else: raise ValueError('Unknown parser: %s' % parser) class _ParserFrontend(Serialize): def _parse(self, input, start, *args): if start is None: start = self.start if len(start) > 1: raise ValueError("Lark initialized with more than 1 possible start rule. 
Must specify which start rule to parse", start) start ,= start return self.parser.parse(input, start, *args) class WithLexer(_ParserFrontend): lexer = None parser = None lexer_conf = None start = None __serialize_fields__ = 'parser', 'lexer_conf', 'start' __serialize_namespace__ = LexerConf, def __init__(self, lexer_conf, parser_conf, options=None): self.lexer_conf = lexer_conf self.start = parser_conf.start self.postlex = lexer_conf.postlex @classmethod def deserialize(cls, data, memo, callbacks, postlex): inst = super(WithLexer, cls).deserialize(data, memo) inst.postlex = postlex inst.parser = LALR_Parser.deserialize(inst.parser, memo, callbacks) inst.init_lexer() return inst def _serialize(self, data, memo): data['parser'] = data['parser'].serialize(memo) def lex(self, *args): stream = self.lexer.lex(*args) return self.postlex.process(stream) if self.postlex else stream def parse(self, text, start=None): token_stream = self.lex(text) return self._parse(token_stream, start) def init_traditional_lexer(self): self.lexer = TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks) class LALR_WithLexer(WithLexer): def __init__(self, lexer_conf, parser_conf, options=None): debug = options.debug if options else False self.parser = LALR_Parser(parser_conf, debug=debug) WithLexer.__init__(self, lexer_conf, parser_conf, options) self.init_lexer() def init_lexer(self): raise NotImplementedError() class LALR_TraditionalLexer(LALR_WithLexer): def init_lexer(self): self.init_traditional_lexer() class LALR_ContextualLexer(LALR_WithLexer): def init_lexer(self): states = {idx:list(t.keys()) for idx, t in self.parser._parse_table.states.items()} always_accept = self.postlex.always_accept if self.postlex else () self.lexer = ContextualLexer(self.lexer_conf.tokens, states, ignore=self.lexer_conf.ignore, always_accept=always_accept, user_callbacks=self.lexer_conf.callbacks) def parse(self, text, start=None): parser_state = [None] def set_parser_state(s): parser_state[0] = s token_stream = self.lex(text, lambda: parser_state[0]) return self._parse(token_stream, start, set_parser_state) class LarkOptions(Serialize): """Specifies the options for Lark """ OPTIONS_DOC = """ parser - Decides which parser engine to use, "earley" or "lalr". (Default: "earley") Note: "lalr" requires a lexer lexer - Decides whether or not to use a lexer stage "standard": Use a standard lexer "contextual": Stronger lexer (only works with parser="lalr") "dynamic": Flexible and powerful (only with parser="earley") "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible. (only with parser="earley") "auto" (default): Choose for me based on grammar and parser ambiguity - Decides how to handle ambiguity in the parse. Only relevant if parser="earley" "resolve": The parser will automatically choose the simplest derivation (it chooses consistently: greedy for tokens, non-greedy for rules) "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest). transformer - Applies the transformer to every parse tree debug - Affects verbosity (default: False) keep_all_tokens - Don't automagically remove "punctuation" tokens (default: False) cache_grammar - Cache the Lark grammar (Default: False) postlex - Lexer post-processing (Default: None) Only works with the standard and contextual lexers. 
start - The start symbol, either a string, or a list of strings for multiple possible starts (Default: "start") priority - How priorities should be evaluated - auto, none, normal, invert (Default: auto) propagate_positions - Propagates [line, column, end_line, end_column] attributes into all tree branches. lexer_callbacks - Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution. maybe_placeholders - Experimental feature. Instead of omitting optional rules (i.e. rule?), replace them with None """ if __doc__: __doc__ += OPTIONS_DOC _defaults = { 'debug': False, 'keep_all_tokens': False, 'tree_class': None, 'cache_grammar': False, 'postlex': None, 'parser': 'earley', 'lexer': 'auto', 'transformer': None, 'start': 'start', 'priority': 'auto', 'ambiguity': 'auto', 'propagate_positions': False, 'lexer_callbacks': {}, 'maybe_placeholders': True, 'edit_terminals': None, } def __init__(self, options_dict): o = dict(options_dict) options = {} for name, default in self._defaults.items(): if name in o: value = o.pop(name) if isinstance(default, bool): value = bool(value) else: value = default options[name] = value if isinstance(options['start'], STRING_TYPE): options['start'] = [options['start']] self.__dict__['options'] = options assert self.parser in ('earley', 'lalr', 'cyk', None) if self.parser == 'earley' and self.transformer: raise ValueError('Cannot specify an embedded transformer when using the Earley algorithm.' 'Please use your transformer on the resulting parse tree, or use a different algorithm (i.e. LALR)') if o: raise ValueError("Unknown options: %s" % o.keys()) def __getattr__(self, name): try: return self.options[name] except KeyError as e: raise AttributeError(e) def __setattr__(self, name, value): assert name in self.options self.options[name] = value def serialize(self, memo): return self.options @classmethod def deserialize(cls, data, memo): return cls(data) class Lark(Serialize): def __init__(self, grammar, **options): """ grammar : a string or file-object containing the grammar spec (using Lark's ebnf syntax) options : a dictionary controlling various aspects of Lark. 
""" self.options = LarkOptions(options) # Some, but not all file-like objects have a 'name' attribute try: self.source = grammar.name except AttributeError: self.source = '' # Drain file-like objects to get their contents try: read = grammar.read except AttributeError: pass else: grammar = read() assert isinstance(grammar, STRING_TYPE) if self.options.cache_grammar: raise NotImplementedError("Not available yet") if self.options.lexer == 'auto': if self.options.parser == 'lalr': self.options.lexer = 'contextual' elif self.options.parser == 'earley': self.options.lexer = 'dynamic' elif self.options.parser == 'cyk': self.options.lexer = 'standard' else: assert False, self.options.parser lexer = self.options.lexer assert lexer in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer) if self.options.ambiguity == 'auto': if self.options.parser == 'earley': self.options.ambiguity = 'resolve' else: disambig_parsers = ['earley', 'cyk'] assert self.options.parser in disambig_parsers, ( 'Only %s supports disambiguation right now') % ', '.join(disambig_parsers) if self.options.priority == 'auto': if self.options.parser in ('earley', 'cyk', ): self.options.priority = 'normal' elif self.options.parser in ('lalr', ): self.options.priority = None elif self.options.priority in ('invert', 'normal'): assert self.options.parser in ('earley', 'cyk'), "priorities are not supported for LALR at this time" assert self.options.priority in ('auto', None, 'normal', 'invert'), 'invalid priority option specified: {}. options are auto, none, normal, invert.'.format(self.options.priority) assert self.options.ambiguity not in ('resolve__antiscore_sum', ), 'resolve__antiscore_sum has been replaced with the option priority="invert"' assert self.options.ambiguity in ('resolve', 'explicit', 'auto', ) # Parse the grammar file and compose the grammars (TODO) self.grammar = load_grammar(grammar, self.source) # Compile the EBNF grammar into BNF self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start) if self.options.edit_terminals: for t in self.terminals: self.options.edit_terminals(t) self._terminals_dict = {t.name:t for t in self.terminals} # If the user asked to invert the priorities, negate them all here. # This replaces the old 'resolve__antiscore_sum' option. if self.options.priority == 'invert': for rule in self.rules: if rule.options.priority is not None: rule.options.priority = -rule.options.priority # Else, if the user asked to disable priorities, strip them from the # rules. This allows the Earley parsers to skip an extra forest walk # for improved performance, if you don't need them (or didn't specify any). elif self.options.priority == None: for rule in self.rules: if rule.options.priority is not None: rule.options.priority = None # TODO Deprecate lexer_callbacks? 
lexer_callbacks = dict(self.options.lexer_callbacks) if self.options.transformer: t = self.options.transformer for term in self.terminals: if hasattr(t, term.name): lexer_callbacks[term.name] = getattr(t, term.name) self.lexer_conf = LexerConf(self.terminals, self.ignore_tokens, self.options.postlex, lexer_callbacks) if self.options.parser: self.parser = self._build_parser() elif lexer: self.lexer = self._build_lexer() if __init__.__doc__: __init__.__doc__ += "\nOPTIONS:" + LarkOptions.OPTIONS_DOC __serialize_fields__ = 'parser', 'rules', 'options' def _build_lexer(self): return TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks) def _prepare_callbacks(self): self.parser_class = get_frontend(self.options.parser, self.options.lexer) self._parse_tree_builder = ParseTreeBuilder(self.rules, self.options.tree_class or Tree, self.options.propagate_positions, self.options.keep_all_tokens, self.options.parser!='lalr' and self.options.ambiguity=='explicit', self.options.maybe_placeholders) self._callbacks = self._parse_tree_builder.create_callback(self.options.transformer) def _build_parser(self): self._prepare_callbacks() parser_conf = ParserConf(self.rules, self._callbacks, self.options.start) return self.parser_class(self.lexer_conf, parser_conf, options=self.options) @classmethod def deserialize(cls, data, namespace, memo, transformer=None, postlex=None): if memo: memo = SerializeMemoizer.deserialize(memo, namespace, {}) inst = cls.__new__(cls) options = dict(data['options']) options['transformer'] = transformer options['postlex'] = postlex inst.options = LarkOptions.deserialize(options, memo) inst.rules = [Rule.deserialize(r, memo) for r in data['rules']] inst.source = '' inst._prepare_callbacks() inst.parser = inst.parser_class.deserialize(data['parser'], memo, inst._callbacks, inst.options.postlex) return inst @classmethod def open(cls, grammar_filename, rel_to=None, **options): """Create an instance of Lark with the grammar given by its filename If rel_to is provided, the function will find the grammar filename in relation to it. Example: >>> Lark.open("grammar_file.lark", rel_to=__file__, parser="lalr") Lark(...) """ if rel_to: basepath = os.path.dirname(rel_to) grammar_filename = os.path.join(basepath, grammar_filename) with open(grammar_filename, encoding='utf8') as f: return cls(f, **options) def __repr__(self): return 'Lark(open(%r), parser=%r, lexer=%r, ...)' % (self.source, self.options.parser, self.options.lexer) def lex(self, text): "Only lex (and postlex) the text, without parsing it. Only relevant when lexer='standard'" if not hasattr(self, 'lexer'): self.lexer = self._build_lexer() stream = self.lexer.lex(text) if self.options.postlex: return self.options.postlex.process(stream) return stream def get_terminal(self, name): "Get information about a terminal" return self._terminals_dict[name] def parse(self, text, start=None): """Parse the given text, according to the options provided. The 'start' parameter is required if Lark was given multiple possible start symbols (using the start option). Returns a tree, unless specified otherwise. 
""" return self.parser.parse(text, start=start) DATA = ( {'rules': [{'@': 26}, {'@': 30}, {'@': 25}, {'@': 31}, {'@': 23}, {'@': 19}, {'@': 14}, {'@': 22}, {'@': 27}, {'@': 16}, {'@': 28}, {'@': 12}, {'@': 24}, {'@': 29}, {'@': 20}, {'@': 21}, {'@': 15}, {'@': 13}, {'@': 17}, {'@': 18}], 'parser': {'lexer_conf': {'tokens': [{'@': 0}, {'@': 1}, {'@': 2}, {'@': 3}, {'@': 4}, {'@': 5}, {'@': 6}, {'@': 7}, {'@': 8}, {'@': 9}, {'@': 10}, {'@': 11}], 'ignore': [u'WS'], '__type__': 'LexerConf'}, 'parser': {'tokens': {0: 'RSQB', 1: 'COMMA', 2: 'RBRACE', 3: '$END', 4: u'__array_star_0', 5: 'COLON', 6: u'pair', 7: u'ESCAPED_STRING', 8: u'string', 9: 'LBRACE', 10: u'FALSE', 11: u'object', 12: u'NULL', 13: u'SIGNED_NUMBER', 14: u'value', 15: u'array', 16: u'TRUE', 17: 'LSQB', 18: u'__object_star_1', 19: 'start'}, 'states': {0: {0: (1, {'@': 12}), 1: (1, {'@': 12}), 2: (1, {'@': 12}), 3: (1, {'@': 12})}, 1: {0: (0, 11), 1: (0, 20), 4: (0, 17)}, 2: {1: (0, 23), 2: (0, 0)}, 3: {5: (0, 12)}, 4: {8: (0, 3), 6: (0, 13), 7: (0, 21)}, 5: {8: (0, 3), 2: (0, 30), 6: (0, 19), 7: (0, 21)}, 6: {0: (0, 29), 7: (0, 21), 8: (0, 33), 9: (0, 5), 10: (0, 8), 11: (0, 31), 12: (0, 22), 13: (0, 24), 14: (0, 1), 15: (0, 26), 16: (0, 16), 17: (0, 6)}, 7: {0: (1, {'@': 13}), 1: (1, {'@': 13})}, 8: {0: (1, {'@': 14}), 1: (1, {'@': 14}), 2: (1, {'@': 14}), 3: (1, {'@': 14})}, 9: {0: (1, {'@': 15}), 1: (1, {'@': 15})}, 10: {7: (0, 21), 8: (0, 33), 9: (0, 5), 10: (0, 8), 11: (0, 31), 12: (0, 22), 13: (0, 24), 14: (0, 7), 15: (0, 26), 16: (0, 16), 17: (0, 6)}, 11: {0: (1, {'@': 16}), 1: (1, {'@': 16}), 2: (1, {'@': 16}), 3: (1, {'@': 16})}, 12: {7: (0, 21), 8: (0, 33), 9: (0, 5), 10: (0, 8), 11: (0, 31), 12: (0, 22), 13: (0, 24), 14: (0, 18), 15: (0, 26), 16: (0, 16), 17: (0, 6)}, 13: {1: (1, {'@': 17}), 2: (1, {'@': 17})}, 14: {}, 15: {1: (1, {'@': 18}), 2: (1, {'@': 18})}, 16: {0: (1, {'@': 19}), 1: (1, {'@': 19}), 2: (1, {'@': 19}), 3: (1, {'@': 19})}, 17: {0: (0, 28), 1: (0, 10)}, 18: {1: (1, {'@': 20}), 2: (1, {'@': 20})}, 19: {1: (0, 4), 18: (0, 2), 2: (0, 25)}, 20: {7: (0, 21), 8: (0, 33), 9: (0, 5), 10: (0, 8), 11: (0, 31), 12: (0, 22), 13: (0, 24), 14: (0, 9), 15: (0, 26), 16: (0, 16), 17: (0, 6)}, 21: {0: (1, {'@': 21}), 1: (1, {'@': 21}), 2: (1, {'@': 21}), 3: (1, {'@': 21}), 5: (1, {'@': 21})}, 22: {0: (1, {'@': 22}), 1: (1, {'@': 22}), 2: (1, {'@': 22}), 3: (1, {'@': 22})}, 23: {8: (0, 3), 6: (0, 15), 7: (0, 21)}, 24: {0: (1, {'@': 23}), 1: (1, {'@': 23}), 2: (1, {'@': 23}), 3: (1, {'@': 23})}, 25: {0: (1, {'@': 24}), 1: (1, {'@': 24}), 2: (1, {'@': 24}), 3: (1, {'@': 24})}, 26: {0: (1, {'@': 25}), 1: (1, {'@': 25}), 2: (1, {'@': 25}), 3: (1, {'@': 25})}, 27: {3: (1, {'@': 26})}, 28: {0: (1, {'@': 27}), 1: (1, {'@': 27}), 2: (1, {'@': 27}), 3: (1, {'@': 27})}, 29: {0: (1, {'@': 28}), 1: (1, {'@': 28}), 2: (1, {'@': 28}), 3: (1, {'@': 28})}, 30: {0: (1, {'@': 29}), 1: (1, {'@': 29}), 2: (1, {'@': 29}), 3: (1, {'@': 29})}, 31: {0: (1, {'@': 30}), 1: (1, {'@': 30}), 2: (1, {'@': 30}), 3: (1, {'@': 30})}, 32: {7: (0, 21), 8: (0, 33), 9: (0, 5), 10: (0, 8), 11: (0, 31), 12: (0, 22), 13: (0, 24), 14: (0, 27), 15: (0, 26), 16: (0, 16), 17: (0, 6), 19: (0, 14)}, 33: {0: (1, {'@': 31}), 1: (1, {'@': 31}), 2: (1, {'@': 31}), 3: (1, {'@': 31})}}, 'end_states': {'start': 14}, 'start_states': {'start': 32}}, '__type__': 'LALR_ContextualLexer', 'start': ['start']}, '__type__': 'Lark', 'options': {'transformer': None, 'lexer': 'contextual', 'lexer_callbacks': {}, 'debug': False, 'postlex': None, 'parser': 'lalr', 'cache_grammar': 
False, 'tree_class': None, 'priority': None, 'start': ['start'], 'keep_all_tokens': False, 'ambiguity': 'auto', 'edit_terminals': None, 'propagate_positions': False, 'maybe_placeholders': True}} ) MEMO = ( {0: {'priority': 1, 'pattern': {'__type__': 'PatternRE', '_width': [2, 4294967295], 'flags': [], 'value': u'\\".*?(? movement | "c" COLOR [COLOR] -> change_color | "fill" code_block -> fill | "repeat" NUMBER code_block -> repeat code_block: "{" instruction+ "}" MOVEMENT: "f"|"b"|"l"|"r" COLOR: LETTER+ %import common.LETTER %import common.INT -> NUMBER %import common.WS %ignore WS """ parser = Lark(turtle_grammar) def run_instruction(t): if t.data == 'change_color': turtle.color(*t.children) # We just pass the color names as-is elif t.data == 'movement': name, number = t.children { 'f': turtle.fd, 'b': turtle.bk, 'l': turtle.lt, 'r': turtle.rt, }[name](int(number)) elif t.data == 'repeat': count, block = t.children for i in range(int(count)): run_instruction(block) elif t.data == 'fill': turtle.begin_fill() run_instruction(t.children[0]) turtle.end_fill() elif t.data == 'code_block': for cmd in t.children: run_instruction(cmd) else: raise SyntaxError('Unknown instruction: %s' % t.data) def run_turtle(program): parse_tree = parser.parse(program) for inst in parse_tree.children: run_instruction(inst) def main(): while True: code = input('> ') try: run_turtle(code) except Exception as e: print(e) def test(): text = """ c red yellow fill { repeat 36 { f200 l170 }} """ run_turtle(text) if __name__ == '__main__': # test() main() lark-0.8.1/lark/000077500000000000000000000000001361215331400134015ustar00rootroot00000000000000lark-0.8.1/lark/__init__.py000066400000000000000000000005251361215331400155140ustar00rootroot00000000000000from .tree import Tree from .visitors import Transformer, Visitor, v_args, Discard from .visitors import InlineTransformer, inline_args # XXX Deprecated from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, UnexpectedInput, UnexpectedCharacters from .lexer import Token from .lark import Lark __version__ = "0.8.1" lark-0.8.1/lark/common.py000066400000000000000000000012321361215331400152410ustar00rootroot00000000000000from .utils import Serialize from .lexer import TerminalDef ###{standalone class LexerConf(Serialize): __serialize_fields__ = 'tokens', 'ignore' __serialize_namespace__ = TerminalDef, def __init__(self, tokens, ignore=(), postlex=None, callbacks=None): self.tokens = tokens self.ignore = ignore self.postlex = postlex self.callbacks = callbacks or {} def _deserialize(self): self.callbacks = {} # TODO ###} class ParserConf: def __init__(self, rules, callbacks, start): assert isinstance(start, list) self.rules = rules self.callbacks = callbacks self.start = start lark-0.8.1/lark/exceptions.py000066400000000000000000000070601361215331400161370ustar00rootroot00000000000000from .utils import STRING_TYPE ###{standalone class LarkError(Exception): pass class GrammarError(LarkError): pass class ParseError(LarkError): pass class LexError(LarkError): pass class UnexpectedEOF(ParseError): def __init__(self, expected): self.expected = expected message = ("Unexpected end-of-input. 
Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected)) super(UnexpectedEOF, self).__init__(message) class UnexpectedInput(LarkError): pos_in_stream = None def get_context(self, text, span=40): pos = self.pos_in_stream start = max(pos - span, 0) end = pos + span before = text[start:pos].rsplit('\n', 1)[-1] after = text[pos:end].split('\n', 1)[0] return before + after + '\n' + ' ' * len(before) + '^\n' def match_examples(self, parse_fn, examples): """ Given a parser instance and a dictionary mapping some label with some malformed syntax examples, it'll return the label for the example that bests matches the current error. """ assert self.state is not None, "Not supported for this exception" candidate = None for label, example in examples.items(): assert not isinstance(example, STRING_TYPE) for malformed in example: try: parse_fn(malformed) except UnexpectedInput as ut: if ut.state == self.state: try: if ut.token == self.token: # Try exact match first return label except AttributeError: pass if not candidate: candidate = label return candidate class UnexpectedCharacters(LexError, UnexpectedInput): def __init__(self, seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None): message = "No terminal defined for '%s' at line %d col %d" % (seq[lex_pos], line, column) self.line = line self.column = column self.allowed = allowed self.considered_tokens = considered_tokens self.pos_in_stream = lex_pos self.state = state message += '\n\n' + self.get_context(seq) if allowed: message += '\nExpecting: %s\n' % allowed if token_history: message += '\nPrevious tokens: %s\n' % ', '.join(repr(t) for t in token_history) super(UnexpectedCharacters, self).__init__(message) class UnexpectedToken(ParseError, UnexpectedInput): def __init__(self, token, expected, considered_rules=None, state=None): self.token = token self.expected = expected # XXX str shouldn't necessary self.line = getattr(token, 'line', '?') self.column = getattr(token, 'column', '?') self.considered_rules = considered_rules self.state = state self.pos_in_stream = getattr(token, 'pos_in_stream', None) message = ("Unexpected token %r at line %s, column %s.\n" "Expected one of: \n\t* %s\n" % (token, self.line, self.column, '\n\t* '.join(self.expected))) super(UnexpectedToken, self).__init__(message) class VisitError(LarkError): def __init__(self, rule, obj, orig_exc): self.obj = obj self.orig_exc = orig_exc message = 'Error trying to process rule "%s":\n\n%s' % (rule, orig_exc) super(VisitError, self).__init__(message) ###} lark-0.8.1/lark/grammar.py000066400000000000000000000053511361215331400154050ustar00rootroot00000000000000from .utils import Serialize ###{standalone class Symbol(Serialize): __slots__ = ('name',) is_term = NotImplemented def __init__(self, name): self.name = name def __eq__(self, other): assert isinstance(other, Symbol), other return self.is_term == other.is_term and self.name == other.name def __ne__(self, other): return not (self == other) def __hash__(self): return hash(self.name) def __repr__(self): return '%s(%r)' % (type(self).__name__, self.name) fullrepr = property(__repr__) class Terminal(Symbol): __serialize_fields__ = 'name', 'filter_out' is_term = True def __init__(self, name, filter_out=False): self.name = name self.filter_out = filter_out @property def fullrepr(self): return '%s(%r, %r)' % (type(self).__name__, self.name, self.filter_out) class NonTerminal(Symbol): __serialize_fields__ = 'name', is_term = False class RuleOptions(Serialize): 
__serialize_fields__ = 'keep_all_tokens', 'expand1', 'priority', 'empty_indices' def __init__(self, keep_all_tokens=False, expand1=False, priority=None, empty_indices=()): self.keep_all_tokens = keep_all_tokens self.expand1 = expand1 self.priority = priority self.empty_indices = empty_indices def __repr__(self): return 'RuleOptions(%r, %r, %r)' % ( self.keep_all_tokens, self.expand1, self.priority, ) class Rule(Serialize): """ origin : a symbol expansion : a list of symbols order : index of this expansion amongst all rules of the same name """ __slots__ = ('origin', 'expansion', 'alias', 'options', 'order', '_hash') __serialize_fields__ = 'origin', 'expansion', 'order', 'alias', 'options' __serialize_namespace__ = Terminal, NonTerminal, RuleOptions def __init__(self, origin, expansion, order=0, alias=None, options=None): self.origin = origin self.expansion = expansion self.alias = alias self.order = order self.options = options or RuleOptions() self._hash = hash((self.origin, tuple(self.expansion))) def _deserialize(self): self._hash = hash((self.origin, tuple(self.expansion))) def __str__(self): return '<%s : %s>' % (self.origin.name, ' '.join(x.name for x in self.expansion)) def __repr__(self): return 'Rule(%r, %r, %r, %r)' % (self.origin, self.expansion, self.alias, self.options) def __hash__(self): return self._hash def __eq__(self, other): if not isinstance(other, Rule): return False return self.origin == other.origin and self.expansion == other.expansion ###} lark-0.8.1/lark/grammars/000077500000000000000000000000001361215331400152125ustar00rootroot00000000000000lark-0.8.1/lark/grammars/common.lark000066400000000000000000000013351361215331400173570ustar00rootroot00000000000000// // Numbers // DIGIT: "0".."9" HEXDIGIT: "a".."f"|"A".."F"|DIGIT INT: DIGIT+ SIGNED_INT: ["+"|"-"] INT DECIMAL: INT "." INT? | "." INT // float = /-?\d+(\.\d+)?([eE][+-]?\d+)?/ _EXP: ("e"|"E") SIGNED_INT FLOAT: INT _EXP | DECIMAL _EXP? SIGNED_FLOAT: ["+"|"-"] FLOAT NUMBER: FLOAT | INT SIGNED_NUMBER: ["+"|"-"] NUMBER // // Strings // _STRING_INNER: /.*?/ _STRING_ESC_INNER: _STRING_INNER /(? 0 def handle_NL(self, token): if self.paren_level > 0: return yield token indent_str = token.rsplit('\n', 1)[1] # Tabs and spaces indent = indent_str.count(' ') + indent_str.count('\t') * self.tab_len if indent > self.indent_level[-1]: self.indent_level.append(indent) yield Token.new_borrow_pos(self.INDENT_type, indent_str, token) else: while indent < self.indent_level[-1]: self.indent_level.pop() yield Token.new_borrow_pos(self.DEDENT_type, indent_str, token) assert indent == self.indent_level[-1], '%s != %s' % (indent, self.indent_level[-1]) def _process(self, stream): for token in stream: if token.type == self.NL_type: for t in self.handle_NL(token): yield t else: yield token if token.type in self.OPEN_PAREN_types: self.paren_level += 1 elif token.type in self.CLOSE_PAREN_types: self.paren_level -= 1 assert self.paren_level >= 0 while len(self.indent_level) > 1: self.indent_level.pop() yield Token(self.DEDENT_type, '') assert self.indent_level == [0], self.indent_level def process(self, stream): self.paren_level = 0 self.indent_level = [0] return self._process(stream) # XXX Hack for ContextualLexer. Maybe there's a more elegant solution? 
@property def always_accept(self): return (self.NL_type,) ###} lark-0.8.1/lark/lark.py000066400000000000000000000274151361215331400147150ustar00rootroot00000000000000from __future__ import absolute_import import os from io import open from .utils import STRING_TYPE, Serialize, SerializeMemoizer from .load_grammar import load_grammar from .tree import Tree from .common import LexerConf, ParserConf from .lexer import Lexer, TraditionalLexer from .parse_tree_builder import ParseTreeBuilder from .parser_frontends import get_frontend from .grammar import Rule ###{standalone class LarkOptions(Serialize): """Specifies the options for Lark """ OPTIONS_DOC = """ parser - Decides which parser engine to use, "earley" or "lalr". (Default: "earley") Note: "lalr" requires a lexer lexer - Decides whether or not to use a lexer stage "standard": Use a standard lexer "contextual": Stronger lexer (only works with parser="lalr") "dynamic": Flexible and powerful (only with parser="earley") "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible. (only with parser="earley") "auto" (default): Choose for me based on grammar and parser ambiguity - Decides how to handle ambiguity in the parse. Only relevant if parser="earley" "resolve": The parser will automatically choose the simplest derivation (it chooses consistently: greedy for tokens, non-greedy for rules) "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest). transformer - Applies the transformer to every parse tree debug - Affects verbosity (default: False) keep_all_tokens - Don't automagically remove "punctuation" tokens (default: False) cache_grammar - Cache the Lark grammar (Default: False) postlex - Lexer post-processing (Default: None) Only works with the standard and contextual lexers. start - The start symbol, either a string, or a list of strings for multiple possible starts (Default: "start") priority - How priorities should be evaluated - auto, none, normal, invert (Default: auto) propagate_positions - Propagates [line, column, end_line, end_column] attributes into all tree branches. lexer_callbacks - Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution. maybe_placeholders - Experimental feature. Instead of omitting optional rules (i.e. rule?), replace them with None """ if __doc__: __doc__ += OPTIONS_DOC _defaults = { 'debug': False, 'keep_all_tokens': False, 'tree_class': None, 'cache_grammar': False, 'postlex': None, 'parser': 'earley', 'lexer': 'auto', 'transformer': None, 'start': 'start', 'priority': 'auto', 'ambiguity': 'auto', 'propagate_positions': False, 'lexer_callbacks': {}, 'maybe_placeholders': False, 'edit_terminals': None, } def __init__(self, options_dict): o = dict(options_dict) options = {} for name, default in self._defaults.items(): if name in o: value = o.pop(name) if isinstance(default, bool): value = bool(value) else: value = default options[name] = value if isinstance(options['start'], STRING_TYPE): options['start'] = [options['start']] self.__dict__['options'] = options assert self.parser in ('earley', 'lalr', 'cyk', None) if self.parser == 'earley' and self.transformer: raise ValueError('Cannot specify an embedded transformer when using the Earley algorithm.' 'Please use your transformer on the resulting parse tree, or use a different algorithm (i.e. 
LALR)') if o: raise ValueError("Unknown options: %s" % o.keys()) def __getattr__(self, name): try: return self.options[name] except KeyError as e: raise AttributeError(e) def __setattr__(self, name, value): assert name in self.options self.options[name] = value def serialize(self, memo): return self.options @classmethod def deserialize(cls, data, memo): return cls(data) class Lark(Serialize): def __init__(self, grammar, **options): """ grammar : a string or file-object containing the grammar spec (using Lark's ebnf syntax) options : a dictionary controlling various aspects of Lark. """ self.options = LarkOptions(options) # Some, but not all file-like objects have a 'name' attribute try: self.source = grammar.name except AttributeError: self.source = '' # Drain file-like objects to get their contents try: read = grammar.read except AttributeError: pass else: grammar = read() assert isinstance(grammar, STRING_TYPE) if self.options.cache_grammar: raise NotImplementedError("Not available yet") if self.options.lexer == 'auto': if self.options.parser == 'lalr': self.options.lexer = 'contextual' elif self.options.parser == 'earley': self.options.lexer = 'dynamic' elif self.options.parser == 'cyk': self.options.lexer = 'standard' else: assert False, self.options.parser lexer = self.options.lexer assert lexer in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer) if self.options.ambiguity == 'auto': if self.options.parser == 'earley': self.options.ambiguity = 'resolve' else: disambig_parsers = ['earley', 'cyk'] assert self.options.parser in disambig_parsers, ( 'Only %s supports disambiguation right now') % ', '.join(disambig_parsers) if self.options.priority == 'auto': if self.options.parser in ('earley', 'cyk', ): self.options.priority = 'normal' elif self.options.parser in ('lalr', ): self.options.priority = None elif self.options.priority in ('invert', 'normal'): assert self.options.parser in ('earley', 'cyk'), "priorities are not supported for LALR at this time" assert self.options.priority in ('auto', None, 'normal', 'invert'), 'invalid priority option specified: {}. options are auto, none, normal, invert.'.format(self.options.priority) assert self.options.ambiguity not in ('resolve__antiscore_sum', ), 'resolve__antiscore_sum has been replaced with the option priority="invert"' assert self.options.ambiguity in ('resolve', 'explicit', 'auto', ) # Parse the grammar file and compose the grammars (TODO) self.grammar = load_grammar(grammar, self.source) # Compile the EBNF grammar into BNF self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start) if self.options.edit_terminals: for t in self.terminals: self.options.edit_terminals(t) self._terminals_dict = {t.name:t for t in self.terminals} # If the user asked to invert the priorities, negate them all here. # This replaces the old 'resolve__antiscore_sum' option. if self.options.priority == 'invert': for rule in self.rules: if rule.options.priority is not None: rule.options.priority = -rule.options.priority # Else, if the user asked to disable priorities, strip them from the # rules. This allows the Earley parsers to skip an extra forest walk # for improved performance, if you don't need them (or didn't specify any). elif self.options.priority == None: for rule in self.rules: if rule.options.priority is not None: rule.options.priority = None # TODO Deprecate lexer_callbacks? 
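# --- Usage sketch (not part of lark-0.8.1) ---
# Illustrates the lexer_callbacks option handled just below: a callback receives
# every Token of the given type during lexing, including ignored ones, which
# makes it handy for collecting comments. Names here are illustrative assumptions.
from lark import Lark

_comments = []

def _track_comment(tok):
    _comments.append((tok.line, str(tok)))
    return tok          # callbacks are expected to return a Token

_cb_parser = Lark(r"""
    start: NAME+
    NAME: /[a-z]+/
    COMMENT: /#[^\n]*/
    %import common.WS
    %ignore WS
    %ignore COMMENT
""", parser="lalr", lexer_callbacks={"COMMENT": _track_comment})

_cb_parser.parse("foo # first\nbar # second\n")
assert [line for line, _ in _comments] == [1, 2]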
lexer_callbacks = dict(self.options.lexer_callbacks) if self.options.transformer: t = self.options.transformer for term in self.terminals: if hasattr(t, term.name): lexer_callbacks[term.name] = getattr(t, term.name) self.lexer_conf = LexerConf(self.terminals, self.ignore_tokens, self.options.postlex, lexer_callbacks) if self.options.parser: self.parser = self._build_parser() elif lexer: self.lexer = self._build_lexer() if __init__.__doc__: __init__.__doc__ += "\nOPTIONS:" + LarkOptions.OPTIONS_DOC __serialize_fields__ = 'parser', 'rules', 'options' def _build_lexer(self): return TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks) def _prepare_callbacks(self): self.parser_class = get_frontend(self.options.parser, self.options.lexer) self._parse_tree_builder = ParseTreeBuilder(self.rules, self.options.tree_class or Tree, self.options.propagate_positions, self.options.keep_all_tokens, self.options.parser!='lalr' and self.options.ambiguity=='explicit', self.options.maybe_placeholders) self._callbacks = self._parse_tree_builder.create_callback(self.options.transformer) def _build_parser(self): self._prepare_callbacks() parser_conf = ParserConf(self.rules, self._callbacks, self.options.start) return self.parser_class(self.lexer_conf, parser_conf, options=self.options) @classmethod def deserialize(cls, data, namespace, memo, transformer=None, postlex=None): if memo: memo = SerializeMemoizer.deserialize(memo, namespace, {}) inst = cls.__new__(cls) options = dict(data['options']) options['transformer'] = transformer options['postlex'] = postlex inst.options = LarkOptions.deserialize(options, memo) inst.rules = [Rule.deserialize(r, memo) for r in data['rules']] inst.source = '' inst._prepare_callbacks() inst.parser = inst.parser_class.deserialize(data['parser'], memo, inst._callbacks, inst.options.postlex) return inst @classmethod def open(cls, grammar_filename, rel_to=None, **options): """Create an instance of Lark with the grammar given by its filename If rel_to is provided, the function will find the grammar filename in relation to it. Example: >>> Lark.open("grammar_file.lark", rel_to=__file__, parser="lalr") Lark(...) """ if rel_to: basepath = os.path.dirname(rel_to) grammar_filename = os.path.join(basepath, grammar_filename) with open(grammar_filename, encoding='utf8') as f: return cls(f, **options) def __repr__(self): return 'Lark(open(%r), parser=%r, lexer=%r, ...)' % (self.source, self.options.parser, self.options.lexer) def lex(self, text): "Only lex (and postlex) the text, without parsing it. Only relevant when lexer='standard'" if not hasattr(self, 'lexer'): self.lexer = self._build_lexer() stream = self.lexer.lex(text) if self.options.postlex: return self.options.postlex.process(stream) return stream def get_terminal(self, name): "Get information about a terminal" return self._terminals_dict[name] def parse(self, text, start=None): """Parse the given text, according to the options provided. The 'start' parameter is required if Lark was given multiple possible start symbols (using the start option). Returns a tree, unless specified otherwise. 
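Illustrative sketch (the grammar, input and results shown here are hypothetical and
not part of this module)::

    parser = Lark('a: "x"\n' 'b: "y"\n', start=['a', 'b'])
    parser.parse("x", start='a')   # -> Tree('a', [])
    parser.parse("y", start='b')   # -> Tree('b', [])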
""" return self.parser.parse(text, start=start) ###} lark-0.8.1/lark/lexer.py000066400000000000000000000320001361215331400150650ustar00rootroot00000000000000## Lexer Implementation import re from .utils import Str, classify, get_regexp_width, Py36, Serialize from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken ###{standalone class Pattern(Serialize): def __init__(self, value, flags=()): self.value = value self.flags = frozenset(flags) def __repr__(self): return repr(self.to_regexp()) # Pattern Hashing assumes all subclasses have a different priority! def __hash__(self): return hash((type(self), self.value, self.flags)) def __eq__(self, other): return type(self) == type(other) and self.value == other.value and self.flags == other.flags def to_regexp(self): raise NotImplementedError() if Py36: # Python 3.6 changed syntax for flags in regular expression def _get_flags(self, value): for f in self.flags: value = ('(?%s:%s)' % (f, value)) return value else: def _get_flags(self, value): for f in self.flags: value = ('(?%s)' % f) + value return value class PatternStr(Pattern): __serialize_fields__ = 'value', 'flags' type = "str" def to_regexp(self): return self._get_flags(re.escape(self.value)) @property def min_width(self): return len(self.value) max_width = min_width class PatternRE(Pattern): __serialize_fields__ = 'value', 'flags', '_width' type = "re" def to_regexp(self): return self._get_flags(self.value) _width = None def _get_width(self): if self._width is None: self._width = get_regexp_width(self.to_regexp()) return self._width @property def min_width(self): return self._get_width()[0] @property def max_width(self): return self._get_width()[1] class TerminalDef(Serialize): __serialize_fields__ = 'name', 'pattern', 'priority' __serialize_namespace__ = PatternStr, PatternRE def __init__(self, name, pattern, priority=1): assert isinstance(pattern, Pattern), pattern self.name = name self.pattern = pattern self.priority = priority def __repr__(self): return '%s(%r, %r)' % (type(self).__name__, self.name, self.pattern) class Token(Str): __slots__ = ('type', 'pos_in_stream', 'value', 'line', 'column', 'end_line', 'end_column', 'end_pos') def __new__(cls, type_, value, pos_in_stream=None, line=None, column=None, end_line=None, end_column=None, end_pos=None): try: self = super(Token, cls).__new__(cls, value) except UnicodeDecodeError: value = value.decode('latin1') self = super(Token, cls).__new__(cls, value) self.type = type_ self.pos_in_stream = pos_in_stream self.value = value self.line = line self.column = column self.end_line = end_line self.end_column = end_column self.end_pos = end_pos return self def update(self, type_=None, value=None): return Token.new_borrow_pos( type_ if type_ is not None else self.type, value if value is not None else self.value, self ) @classmethod def new_borrow_pos(cls, type_, value, borrow_t): return cls(type_, value, borrow_t.pos_in_stream, borrow_t.line, borrow_t.column, borrow_t.end_line, borrow_t.end_column, borrow_t.end_pos) def __reduce__(self): return (self.__class__, (self.type, self.value, self.pos_in_stream, self.line, self.column, )) def __repr__(self): return 'Token(%s, %r)' % (self.type, self.value) def __deepcopy__(self, memo): return Token(self.type, self.value, self.pos_in_stream, self.line, self.column) def __eq__(self, other): if isinstance(other, Token) and self.type != other.type: return False return Str.__eq__(self, other) __hash__ = Str.__hash__ class LineCounter: def __init__(self): self.newline_char = '\n' self.char_pos = 
0 self.line = 1 self.column = 1 self.line_start_pos = 0 def feed(self, token, test_newline=True): """Consume a token and calculate the new line & column. As an optional optimization, set test_newline=False is token doesn't contain a newline. """ if test_newline: newlines = token.count(self.newline_char) if newlines: self.line += newlines self.line_start_pos = self.char_pos + token.rindex(self.newline_char) + 1 self.char_pos += len(token) self.column = self.char_pos - self.line_start_pos + 1 class _Lex: "Built to serve both Lexer and ContextualLexer" def __init__(self, lexer, state=None): self.lexer = lexer self.state = state def lex(self, stream, newline_types, ignore_types): newline_types = frozenset(newline_types) ignore_types = frozenset(ignore_types) line_ctr = LineCounter() last_token = None while line_ctr.char_pos < len(stream): lexer = self.lexer res = lexer.match(stream, line_ctr.char_pos) if not res: allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types if not allowed: allowed = {""} raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token]) value, type_ = res if type_ not in ignore_types: t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column) line_ctr.feed(value, type_ in newline_types) t.end_line = line_ctr.line t.end_column = line_ctr.column t.end_pos = line_ctr.char_pos if t.type in lexer.callback: t = lexer.callback[t.type](t) if not isinstance(t, Token): raise ValueError("Callbacks must return a token (returned %r)" % t) yield t last_token = t else: if type_ in lexer.callback: t2 = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column) lexer.callback[type_](t2) line_ctr.feed(value, type_ in newline_types) class UnlessCallback: def __init__(self, mres): self.mres = mres def __call__(self, t): for mre, type_from_index in self.mres: m = mre.match(t.value) if m: t.type = type_from_index[m.lastindex] break return t class CallChain: def __init__(self, callback1, callback2, cond): self.callback1 = callback1 self.callback2 = callback2 self.cond = cond def __call__(self, t): t2 = self.callback1(t) return self.callback2(t) if self.cond(t2) else t2 def _create_unless(terminals): tokens_by_type = classify(terminals, lambda t: type(t.pattern)) assert len(tokens_by_type) <= 2, tokens_by_type.keys() embedded_strs = set() callback = {} for retok in tokens_by_type.get(PatternRE, []): unless = [] # {} for strtok in tokens_by_type.get(PatternStr, []): if strtok.priority > retok.priority: continue s = strtok.pattern.value m = re.match(retok.pattern.to_regexp(), s) if m and m.group(0) == s: unless.append(strtok) if strtok.pattern.flags <= retok.pattern.flags: embedded_strs.add(strtok) if unless: callback[retok.name] = UnlessCallback(build_mres(unless, match_whole=True)) terminals = [t for t in terminals if t not in embedded_strs] return terminals, callback def _build_mres(terminals, max_size, match_whole): # Python sets an unreasonable group limit (currently 100) in its re module # Worse, the only way to know we reached it is by catching an AssertionError! # This function recursively tries less and less groups until it's successful. postfix = '$' if match_whole else '' mres = [] while terminals: try: mre = re.compile(u'|'.join(u'(?P<%s>%s)'%(t.name, t.pattern.to_regexp()+postfix) for t in terminals[:max_size])) except AssertionError: # Yes, this is what Python provides us.. 
:/ return _build_mres(terminals, max_size//2, match_whole) # terms_from_name = {t.name: t for t in terminals[:max_size]} mres.append((mre, {i:n for n,i in mre.groupindex.items()} )) terminals = terminals[max_size:] return mres def build_mres(terminals, match_whole=False): return _build_mres(terminals, len(terminals), match_whole) def _regexp_has_newline(r): r"""Expressions that may indicate newlines in a regexp: - newlines (\n) - escaped newline (\\n) - anything but ([^...]) - any-char (.) when the flag (?s) exists - spaces (\s) """ return '\n' in r or '\\n' in r or '\\s' in r or '[^' in r or ('(?s' in r and '.' in r) class Lexer(object): """Lexer interface Method Signatures: lex(self, stream) -> Iterator[Token] """ lex = NotImplemented class TraditionalLexer(Lexer): def __init__(self, terminals, ignore=(), user_callbacks={}): assert all(isinstance(t, TerminalDef) for t in terminals), terminals terminals = list(terminals) # Sanitization for t in terminals: try: re.compile(t.pattern.to_regexp()) except re.error: raise LexError("Cannot compile token %s: %s" % (t.name, t.pattern)) if t.pattern.min_width == 0: raise LexError("Lexer does not allow zero-width terminals. (%s: %s)" % (t.name, t.pattern)) assert set(ignore) <= {t.name for t in terminals} # Init self.newline_types = [t.name for t in terminals if _regexp_has_newline(t.pattern.to_regexp())] self.ignore_types = list(ignore) terminals.sort(key=lambda x:(-x.priority, -x.pattern.max_width, -len(x.pattern.value), x.name)) self.terminals = terminals self.user_callbacks = user_callbacks self.build() def build(self): terminals, self.callback = _create_unless(self.terminals) assert all(self.callback.values()) for type_, f in self.user_callbacks.items(): if type_ in self.callback: # Already a callback there, probably UnlessCallback self.callback[type_] = CallChain(self.callback[type_], f, lambda t: t.type == type_) else: self.callback[type_] = f self.mres = build_mres(terminals) def match(self, stream, pos): for mre, type_from_index in self.mres: m = mre.match(stream, pos) if m: return m.group(0), type_from_index[m.lastindex] def lex(self, stream): return _Lex(self).lex(stream, self.newline_types, self.ignore_types) class ContextualLexer(Lexer): def __init__(self, terminals, states, ignore=(), always_accept=(), user_callbacks={}): tokens_by_name = {} for t in terminals: assert t.name not in tokens_by_name, t tokens_by_name[t.name] = t lexer_by_tokens = {} self.lexers = {} for state, accepts in states.items(): key = frozenset(accepts) try: lexer = lexer_by_tokens[key] except KeyError: accepts = set(accepts) | set(ignore) | set(always_accept) state_tokens = [tokens_by_name[n] for n in accepts if n and n in tokens_by_name] lexer = TraditionalLexer(state_tokens, ignore=ignore, user_callbacks=user_callbacks) lexer_by_tokens[key] = lexer self.lexers[state] = lexer self.root_lexer = TraditionalLexer(terminals, ignore=ignore, user_callbacks=user_callbacks) def lex(self, stream, get_parser_state): parser_state = get_parser_state() l = _Lex(self.lexers[parser_state], parser_state) try: for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types): yield x parser_state = get_parser_state() l.lexer = self.lexers[parser_state] l.state = parser_state # For debug only, no need to worry about multithreading except UnexpectedCharacters as e: # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined, # but not in the current context. # This tests the input against the global context, to provide a nicer error. 
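# Illustrative scenario (hypothetical grammar): with parser='lalr' and the contextual
# lexer, a keyword terminal may be defined in the grammar yet not be acceptable in the
# current parser state. Matching the text against the root lexer below lets us report
# a nicer UnexpectedToken ("right token, wrong place") instead of a bare
# UnexpectedCharacters error.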
root_match = self.root_lexer.match(stream, e.pos_in_stream) if not root_match: raise value, type_ = root_match t = Token(type_, value, e.pos_in_stream, e.line, e.column) raise UnexpectedToken(t, e.allowed, state=e.state) ###} lark-0.8.1/lark/load_grammar.py000066400000000000000000000744721361215331400164160ustar00rootroot00000000000000"Parses and creates Grammar objects" import os.path import sys from copy import copy, deepcopy from io import open from .utils import bfs, eval_escaping from .lexer import Token, TerminalDef, PatternStr, PatternRE from .parse_tree_builder import ParseTreeBuilder from .parser_frontends import LALR_TraditionalLexer from .common import LexerConf, ParserConf from .grammar import RuleOptions, Rule, Terminal, NonTerminal, Symbol from .utils import classify, suppress, dedup_list, Str from .exceptions import GrammarError, UnexpectedCharacters, UnexpectedToken from .tree import Tree, SlottedTree as ST from .visitors import Transformer, Visitor, v_args, Transformer_InPlace inline_args = v_args(inline=True) __path__ = os.path.dirname(__file__) IMPORT_PATHS = [os.path.join(__path__, 'grammars')] EXT = '.lark' _RE_FLAGS = 'imslux' _EMPTY = Symbol('__empty__') _TERMINAL_NAMES = { '.' : 'DOT', ',' : 'COMMA', ':' : 'COLON', ';' : 'SEMICOLON', '+' : 'PLUS', '-' : 'MINUS', '*' : 'STAR', '/' : 'SLASH', '\\' : 'BACKSLASH', '|' : 'VBAR', '?' : 'QMARK', '!' : 'BANG', '@' : 'AT', '#' : 'HASH', '$' : 'DOLLAR', '%' : 'PERCENT', '^' : 'CIRCUMFLEX', '&' : 'AMPERSAND', '_' : 'UNDERSCORE', '<' : 'LESSTHAN', '>' : 'MORETHAN', '=' : 'EQUAL', '"' : 'DBLQUOTE', '\'' : 'QUOTE', '`' : 'BACKQUOTE', '~' : 'TILDE', '(' : 'LPAR', ')' : 'RPAR', '{' : 'LBRACE', '}' : 'RBRACE', '[' : 'LSQB', ']' : 'RSQB', '\n' : 'NEWLINE', '\r\n' : 'CRLF', '\t' : 'TAB', ' ' : 'SPACE', } # Grammar Parser TERMINALS = { '_LPAR': r'\(', '_RPAR': r'\)', '_LBRA': r'\[', '_RBRA': r'\]', 'OP': '[+*]|[?](?![a-z])', '_COLON': ':', '_COMMA': ',', '_OR': r'\|', '_DOT': r'\.(?!\.)', '_DOTDOT': r'\.\.', 'TILDE': '~', 'RULE': '!?[_?]?[a-z][_a-z0-9]*', 'TERMINAL': '_?[A-Z][_A-Z0-9]*', 'STRING': r'"(\\"|\\\\|[^"\n])*?"i?', 'REGEXP': r'/(?!/)(\\/|\\\\|[^/\n])*?/[%s]*' % _RE_FLAGS, '_NL': r'(\r?\n)+\s*', 'WS': r'[ \t]+', 'COMMENT': r'\s*//[^\n]*', '_TO': '->', '_IGNORE': r'%ignore', '_DECLARE': r'%declare', '_IMPORT': r'%import', 'NUMBER': r'[+-]?\d+', } RULES = { 'start': ['_list'], '_list': ['_item', '_list _item'], '_item': ['rule', 'term', 'statement', '_NL'], 'rule': ['RULE _COLON expansions _NL', 'RULE _DOT NUMBER _COLON expansions _NL'], 'expansions': ['alias', 'expansions _OR alias', 'expansions _NL _OR alias'], '?alias': ['expansion _TO RULE', 'expansion'], 'expansion': ['_expansion'], '_expansion': ['', '_expansion expr'], '?expr': ['atom', 'atom OP', 'atom TILDE NUMBER', 'atom TILDE NUMBER _DOTDOT NUMBER', ], '?atom': ['_LPAR expansions _RPAR', 'maybe', 'value'], 'value': ['terminal', 'nonterminal', 'literal', 'range'], 'terminal': ['TERMINAL'], 'nonterminal': ['RULE'], '?name': ['RULE', 'TERMINAL'], 'maybe': ['_LBRA expansions _RBRA'], 'range': ['STRING _DOTDOT STRING'], 'term': ['TERMINAL _COLON expansions _NL', 'TERMINAL _DOT NUMBER _COLON expansions _NL'], 'statement': ['ignore', 'import', 'declare'], 'ignore': ['_IGNORE expansions _NL'], 'declare': ['_DECLARE _declare_args _NL'], 'import': ['_IMPORT _import_path _NL', '_IMPORT _import_path _LPAR name_list _RPAR _NL', '_IMPORT _import_path _TO name _NL'], '_import_path': ['import_lib', 'import_rel'], 'import_lib': ['_import_args'], 'import_rel': ['_DOT _import_args'], 
'_import_args': ['name', '_import_args _DOT name'], 'name_list': ['_name_list'], '_name_list': ['name', '_name_list _COMMA name'], '_declare_args': ['name', '_declare_args name'], 'literal': ['REGEXP', 'STRING'], } @inline_args class EBNF_to_BNF(Transformer_InPlace): def __init__(self): self.new_rules = [] self.rules_by_expr = {} self.prefix = 'anon' self.i = 0 self.rule_options = None def _add_recurse_rule(self, type_, expr): if expr in self.rules_by_expr: return self.rules_by_expr[expr] new_name = '__%s_%s_%d' % (self.prefix, type_, self.i) self.i += 1 t = NonTerminal(new_name) tree = ST('expansions', [ST('expansion', [expr]), ST('expansion', [t, expr])]) self.new_rules.append((new_name, tree, self.rule_options)) self.rules_by_expr[expr] = t return t def expr(self, rule, op, *args): if op.value == '?': empty = ST('expansion', []) return ST('expansions', [rule, empty]) elif op.value == '+': # a : b c+ d # --> # a : b _c d # _c : _c c | c; return self._add_recurse_rule('plus', rule) elif op.value == '*': # a : b c* d # --> # a : b _c? d # _c : _c c | c; new_name = self._add_recurse_rule('star', rule) return ST('expansions', [new_name, ST('expansion', [])]) elif op.value == '~': if len(args) == 1: mn = mx = int(args[0]) else: mn, mx = map(int, args) if mx < mn or mn < 0: raise GrammarError("Bad Range for %s (%d..%d isn't allowed)" % (rule, mn, mx)) return ST('expansions', [ST('expansion', [rule] * n) for n in range(mn, mx+1)]) assert False, op def maybe(self, rule): keep_all_tokens = self.rule_options and self.rule_options.keep_all_tokens def will_not_get_removed(sym): if isinstance(sym, NonTerminal): return not sym.name.startswith('_') if isinstance(sym, Terminal): return keep_all_tokens or not sym.filter_out assert False if any(rule.scan_values(will_not_get_removed)): empty = _EMPTY else: empty = ST('expansion', []) return ST('expansions', [rule, empty]) class SimplifyRule_Visitor(Visitor): @staticmethod def _flatten(tree): while True: to_expand = [i for i, child in enumerate(tree.children) if isinstance(child, Tree) and child.data == tree.data] if not to_expand: break tree.expand_kids_by_index(*to_expand) def expansion(self, tree): # rules_list unpacking # a : b (c|d) e # --> # a : b c e | b d e # # In AST terms: # expansion(b, expansions(c, d), e) # --> # expansions( expansion(b, c, e), expansion(b, d, e) ) self._flatten(tree) for i, child in enumerate(tree.children): if isinstance(child, Tree) and child.data == 'expansions': tree.data = 'expansions' tree.children = [self.visit(ST('expansion', [option if i==j else other for j, other in enumerate(tree.children)])) for option in dedup_list(child.children)] self._flatten(tree) break def alias(self, tree): rule, alias_name = tree.children if rule.data == 'expansions': aliases = [] for child in tree.children[0].children: aliases.append(ST('alias', [child, alias_name])) tree.data = 'expansions' tree.children = aliases def expansions(self, tree): self._flatten(tree) tree.children = dedup_list(tree.children) class RuleTreeToText(Transformer): def expansions(self, x): return x def expansion(self, symbols): return symbols, None def alias(self, x): (expansion, _alias), alias = x assert _alias is None, (alias, expansion, '-', _alias) # Double alias not allowed return expansion, alias.value @inline_args class CanonizeTree(Transformer_InPlace): def tokenmods(self, *args): if len(args) == 1: return list(args) tokenmods, value = args return tokenmods + [value] class PrepareAnonTerminals(Transformer_InPlace): "Create a unique list of anonymous 
terminals. Attempt to give meaningful names to them when we add them" def __init__(self, terminals): self.terminals = terminals self.term_set = {td.name for td in self.terminals} self.term_reverse = {td.pattern: td for td in terminals} self.i = 0 @inline_args def pattern(self, p): value = p.value if p in self.term_reverse and p.flags != self.term_reverse[p].pattern.flags: raise GrammarError(u'Conflicting flags for the same terminal: %s' % p) term_name = None if isinstance(p, PatternStr): try: # If already defined, use the user-defined terminal name term_name = self.term_reverse[p].name except KeyError: # Try to assign an indicative anon-terminal name try: term_name = _TERMINAL_NAMES[value] except KeyError: if value.isalnum() and value[0].isalpha() and value.upper() not in self.term_set: with suppress(UnicodeEncodeError): value.upper().encode('ascii') # Make sure we don't have unicode in our terminal names term_name = value.upper() if term_name in self.term_set: term_name = None elif isinstance(p, PatternRE): if p in self.term_reverse: # Kind of a wierd placement.name term_name = self.term_reverse[p].name else: assert False, p if term_name is None: term_name = '__ANON_%d' % self.i self.i += 1 if term_name not in self.term_set: assert p not in self.term_reverse self.term_set.add(term_name) termdef = TerminalDef(term_name, p) self.term_reverse[p] = termdef self.terminals.append(termdef) return Terminal(term_name, filter_out=isinstance(p, PatternStr)) def _rfind(s, choices): return max(s.rfind(c) for c in choices) def _literal_to_pattern(literal): v = literal.value flag_start = _rfind(v, '/"')+1 assert flag_start > 0 flags = v[flag_start:] assert all(f in _RE_FLAGS for f in flags), flags v = v[:flag_start] assert v[0] == v[-1] and v[0] in '"/' x = v[1:-1] s = eval_escaping(x) if literal.type == 'STRING': s = s.replace('\\\\', '\\') return { 'STRING': PatternStr, 'REGEXP': PatternRE }[literal.type](s, flags) @inline_args class PrepareLiterals(Transformer_InPlace): def literal(self, literal): return ST('pattern', [_literal_to_pattern(literal)]) def range(self, start, end): assert start.type == end.type == 'STRING' start = start.value[1:-1] end = end.value[1:-1] assert len(eval_escaping(start)) == len(eval_escaping(end)) == 1, (start, end, len(eval_escaping(start)), len(eval_escaping(end))) regexp = '[%s-%s]' % (start, end) return ST('pattern', [PatternRE(regexp)]) class TerminalTreeToPattern(Transformer): def pattern(self, ps): p ,= ps return p def expansion(self, items): assert items if len(items) == 1: return items[0] if len({i.flags for i in items}) > 1: raise GrammarError("Lark doesn't support joining terminals with conflicting flags!") return PatternRE(''.join(i.to_regexp() for i in items), items[0].flags if items else ()) def expansions(self, exps): if len(exps) == 1: return exps[0] if len({i.flags for i in exps}) > 1: raise GrammarError("Lark doesn't support joining terminals with conflicting flags!") return PatternRE('(?:%s)' % ('|'.join(i.to_regexp() for i in exps)), exps[0].flags) def expr(self, args): inner, op = args[:2] if op == '~': if len(args) == 3: op = "{%d}" % int(args[2]) else: mn, mx = map(int, args[2:]) if mx < mn: raise GrammarError("Bad Range for %s (%d..%d isn't allowed)" % (inner, mn, mx)) op = "{%d,%d}" % (mn, mx) else: assert len(args) == 2 return PatternRE('(?:%s)%s' % (inner.to_regexp(), op), inner.flags) def maybe(self, expr): return self.expr(expr + ['?']) def alias(self, t): raise GrammarError("Aliasing not allowed in terminals (You used -> in the wrong place)") 
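# Illustrative note (hypothetical terminal, not from this file): a terminal defined as
#   NAME: LETTER (LETTER | DIGIT)*
# is collapsed by PrepareLiterals + TerminalTreeToPattern into a single PatternRE,
# roughly '<LETTER>(?:(?:<LETTER>|<DIGIT>))*' -- alternatives become regexp
# alternations inside non-capturing groups, and the ?/*/+/~ operators become regexp
# repetition suffixes.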
def value(self, v): return v[0] class PrepareSymbols(Transformer_InPlace): def value(self, v): v ,= v if isinstance(v, Tree): return v elif v.type == 'RULE': return NonTerminal(Str(v.value)) elif v.type == 'TERMINAL': return Terminal(Str(v.value), filter_out=v.startswith('_')) assert False def _choice_of_rules(rules): return ST('expansions', [ST('expansion', [Token('RULE', name)]) for name in rules]) class Grammar: def __init__(self, rule_defs, term_defs, ignore): self.term_defs = term_defs self.rule_defs = rule_defs self.ignore = ignore def compile(self, start): # We change the trees in-place (to support huge grammars) # So deepcopy allows calling compile more than once. term_defs = deepcopy(list(self.term_defs)) rule_defs = deepcopy(self.rule_defs) # =================== # Compile Terminals # =================== # Convert terminal-trees to strings/regexps for name, (term_tree, priority) in term_defs: if term_tree is None: # Terminal added through %declare continue expansions = list(term_tree.find_data('expansion')) if len(expansions) == 1 and not expansions[0].children: raise GrammarError("Terminals cannot be empty (%s)" % name) transformer = PrepareLiterals() * TerminalTreeToPattern() terminals = [TerminalDef(name, transformer.transform( term_tree ), priority) for name, (term_tree, priority) in term_defs if term_tree] # ================= # Compile Rules # ================= # 1. Pre-process terminals transformer = PrepareLiterals() * PrepareSymbols() * PrepareAnonTerminals(terminals) # Adds to terminals # 2. Convert EBNF to BNF (and apply step 1) ebnf_to_bnf = EBNF_to_BNF() rules = [] for name, rule_tree, options in rule_defs: ebnf_to_bnf.rule_options = RuleOptions(keep_all_tokens=True) if options.keep_all_tokens else None ebnf_to_bnf.prefix = name tree = transformer.transform(rule_tree) res = ebnf_to_bnf.transform(tree) rules.append((name, res, options)) rules += ebnf_to_bnf.new_rules assert len(rules) == len({name for name, _t, _o in rules}), "Whoops, name collision" # 3. 
Compile tree to Rule objects rule_tree_to_text = RuleTreeToText() simplify_rule = SimplifyRule_Visitor() compiled_rules = [] for rule_content in rules: name, tree, options = rule_content simplify_rule.visit(tree) expansions = rule_tree_to_text.transform(tree) for i, (expansion, alias) in enumerate(expansions): if alias and name.startswith('_'): raise GrammarError("Rule %s is marked for expansion (it starts with an underscore) and isn't allowed to have aliases (alias=%s)" % (name, alias)) empty_indices = [x==_EMPTY for x in expansion] if any(empty_indices): exp_options = copy(options) or RuleOptions() exp_options.empty_indices = empty_indices expansion = [x for x in expansion if x!=_EMPTY] else: exp_options = options assert all(isinstance(x, Symbol) for x in expansion), expansion rule = Rule(NonTerminal(name), expansion, i, alias, exp_options) compiled_rules.append(rule) # Remove duplicates of empty rules, throw error for non-empty duplicates if len(set(compiled_rules)) != len(compiled_rules): duplicates = classify(compiled_rules, lambda x: x) for dups in duplicates.values(): if len(dups) > 1: if dups[0].expansion: raise GrammarError("Rules defined twice: %s\n\n(Might happen due to colliding expansion of optionals: [] or ?)" % ''.join('\n * %s' % i for i in dups)) # Empty rule; assert all other attributes are equal assert len({(r.alias, r.order, r.options) for r in dups}) == len(dups) # Remove duplicates compiled_rules = list(set(compiled_rules)) # Filter out unused rules while True: c = len(compiled_rules) used_rules = {s for r in compiled_rules for s in r.expansion if isinstance(s, NonTerminal) and s != r.origin} used_rules |= {NonTerminal(s) for s in start} compiled_rules = [r for r in compiled_rules if r.origin in used_rules] if len(compiled_rules) == c: break # Filter out unused terminals used_terms = {t.name for r in compiled_rules for t in r.expansion if isinstance(t, Terminal)} terminals = [t for t in terminals if t.name in used_terms or t.name in self.ignore] return terminals, compiled_rules, self.ignore _imported_grammars = {} def import_grammar(grammar_path, base_paths=[]): if grammar_path not in _imported_grammars: import_paths = base_paths + IMPORT_PATHS for import_path in import_paths: with suppress(IOError): joined_path = os.path.join(import_path, grammar_path) with open(joined_path, encoding='utf8') as f: text = f.read() grammar = load_grammar(text, joined_path) _imported_grammars[grammar_path] = grammar break else: open(grammar_path, encoding='utf8') assert False return _imported_grammars[grammar_path] def import_from_grammar_into_namespace(grammar, namespace, aliases): """Returns all rules and terminals of grammar, prepended with a 'namespace' prefix, except for those which are aliased. 
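For example (illustrative names): a rule 'expr' imported under namespace 'calc'
becomes 'calc__expr', a rule '_sep' becomes '_calc__sep' (keeping the leading
underscore so it is still inlined), and anything listed in 'aliases' keeps the
alias it was given in the %import statement.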
""" imported_terms = dict(grammar.term_defs) imported_rules = {n:(n,deepcopy(t),o) for n,t,o in grammar.rule_defs} term_defs = [] rule_defs = [] def rule_dependencies(symbol): if symbol.type != 'RULE': return [] try: _, tree, _ = imported_rules[symbol] except KeyError: raise GrammarError("Missing symbol '%s' in grammar %s" % (symbol, namespace)) return _find_used_symbols(tree) def get_namespace_name(name): try: return aliases[name].value except KeyError: if name[0] == '_': return '_%s__%s' % (namespace, name[1:]) return '%s__%s' % (namespace, name) to_import = list(bfs(aliases, rule_dependencies)) for symbol in to_import: if symbol.type == 'TERMINAL': term_defs.append([get_namespace_name(symbol), imported_terms[symbol]]) else: assert symbol.type == 'RULE' rule = imported_rules[symbol] for t in rule[1].iter_subtrees(): for i, c in enumerate(t.children): if isinstance(c, Token) and c.type in ('RULE', 'TERMINAL'): t.children[i] = Token(c.type, get_namespace_name(c)) rule_defs.append((get_namespace_name(symbol), rule[1], rule[2])) return term_defs, rule_defs def resolve_term_references(term_defs): # TODO Solve with transitive closure (maybe) term_dict = {k:t for k, (t,_p) in term_defs} assert len(term_dict) == len(term_defs), "Same name defined twice?" while True: changed = False for name, (token_tree, _p) in term_defs: if token_tree is None: # Terminal added through %declare continue for exp in token_tree.find_data('value'): item ,= exp.children if isinstance(item, Token): if item.type == 'RULE': raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name)) if item.type == 'TERMINAL': term_value = term_dict[item] assert term_value is not None exp.children[0] = term_value changed = True if not changed: break for name, term in term_dict.items(): if term: # Not just declared for child in term.children: ids = [id(x) for x in child.iter_subtrees()] if id(term) in ids: raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name) def options_from_rule(name, *x): if len(x) > 1: priority, expansions = x priority = int(priority) else: expansions ,= x priority = None keep_all_tokens = name.startswith('!') name = name.lstrip('!') expand1 = name.startswith('?') name = name.lstrip('?') return name, expansions, RuleOptions(keep_all_tokens, expand1, priority=priority) def symbols_from_strcase(expansion): return [Terminal(x, filter_out=x.startswith('_')) if x.isupper() else NonTerminal(x) for x in expansion] @inline_args class PrepareGrammar(Transformer_InPlace): def terminal(self, name): return name def nonterminal(self, name): return name def _find_used_symbols(tree): assert tree.data == 'expansions' return {t for x in tree.find_data('expansion') for t in x.scan_values(lambda t: t.type in ('RULE', 'TERMINAL'))} class GrammarLoader: def __init__(self): terminals = [TerminalDef(name, PatternRE(value)) for name, value in TERMINALS.items()] rules = [options_from_rule(name, x) for name, x in RULES.items()] rules = [Rule(NonTerminal(r), symbols_from_strcase(x.split()), i, None, o) for r, xs, o in rules for i, x in enumerate(xs)] callback = ParseTreeBuilder(rules, ST).create_callback() lexer_conf = LexerConf(terminals, ['WS', 'COMMENT']) parser_conf = ParserConf(rules, callback, ['start']) self.parser = LALR_TraditionalLexer(lexer_conf, parser_conf) self.canonize_tree = CanonizeTree() def load_grammar(self, grammar_text, grammar_name=''): "Parse grammar_text, verify, and create Grammar object. Display nice messages on error." 
try: tree = self.canonize_tree.transform( self.parser.parse(grammar_text+'\n') ) except UnexpectedCharacters as e: context = e.get_context(grammar_text) raise GrammarError("Unexpected input at line %d column %d in %s: \n\n%s" % (e.line, e.column, grammar_name, context)) except UnexpectedToken as e: context = e.get_context(grammar_text) error = e.match_examples(self.parser.parse, { 'Unclosed parenthesis': ['a: (\n'], 'Umatched closing parenthesis': ['a: )\n', 'a: [)\n', 'a: (]\n'], 'Expecting rule or terminal definition (missing colon)': ['a\n', 'a->\n', 'A->\n', 'a A\n'], 'Alias expects lowercase name': ['a: -> "a"\n'], 'Unexpected colon': ['a::\n', 'a: b:\n', 'a: B:\n', 'a: "a":\n'], 'Misplaced operator': ['a: b??', 'a: b(?)', 'a:+\n', 'a:?\n', 'a:*\n', 'a:|*\n'], 'Expecting option ("|") or a new rule or terminal definition': ['a:a\n()\n'], '%import expects a name': ['%import "a"\n'], '%ignore expects a value': ['%ignore %import\n'], }) if error: raise GrammarError("%s at line %s column %s\n\n%s" % (error, e.line, e.column, context)) elif 'STRING' in e.expected: raise GrammarError("Expecting a value at line %s column %s\n\n%s" % (e.line, e.column, context)) raise tree = PrepareGrammar().transform(tree) # Extract grammar items defs = classify(tree.children, lambda c: c.data, lambda c: c.children) term_defs = defs.pop('term', []) rule_defs = defs.pop('rule', []) statements = defs.pop('statement', []) assert not defs term_defs = [td if len(td)==3 else (td[0], 1, td[1]) for td in term_defs] term_defs = [(name.value, (t, int(p))) for name, p, t in term_defs] rule_defs = [options_from_rule(*x) for x in rule_defs] # Execute statements ignore, imports = [], {} for (stmt,) in statements: if stmt.data == 'ignore': t ,= stmt.children ignore.append(t) elif stmt.data == 'import': if len(stmt.children) > 1: path_node, arg1 = stmt.children else: path_node, = stmt.children arg1 = None if isinstance(arg1, Tree): # Multi import dotted_path = tuple(path_node.children) names = arg1.children aliases = dict(zip(names, names)) # Can't have aliased multi import, so all aliases will be the same as names else: # Single import dotted_path = tuple(path_node.children[:-1]) name = path_node.children[-1] # Get name from dotted path aliases = {name: arg1 or name} # Aliases if exist if path_node.data == 'import_lib': # Import from library base_paths = [] else: # Relative import if grammar_name == '': # Import relative to script file path if grammar is coded in script try: base_file = os.path.abspath(sys.modules['__main__'].__file__) except AttributeError: base_file = None else: base_file = grammar_name # Import relative to grammar file path if external grammar file if base_file: base_paths = [os.path.split(base_file)[0]] else: base_paths = [os.path.abspath(os.path.curdir)] try: import_base_paths, import_aliases = imports[dotted_path] assert base_paths == import_base_paths, 'Inconsistent base_paths for %s.' 
% '.'.join(dotted_path) import_aliases.update(aliases) except KeyError: imports[dotted_path] = base_paths, aliases elif stmt.data == 'declare': for t in stmt.children: term_defs.append([t.value, (None, None)]) else: assert False, stmt # import grammars for dotted_path, (base_paths, aliases) in imports.items(): grammar_path = os.path.join(*dotted_path) + EXT g = import_grammar(grammar_path, base_paths=base_paths) new_td, new_rd = import_from_grammar_into_namespace(g, '__'.join(dotted_path), aliases) term_defs += new_td rule_defs += new_rd # Verify correctness 1 for name, _ in term_defs: if name.startswith('__'): raise GrammarError('Names starting with double-underscore are reserved (Error at %s)' % name) # Handle ignore tokens # XXX A slightly hacky solution. Recognition of %ignore TERMINAL as separate comes from the lexer's # inability to handle duplicate terminals (two names, one value) ignore_names = [] for t in ignore: if t.data=='expansions' and len(t.children) == 1: t2 ,= t.children if t2.data=='expansion' and len(t2.children) == 1: item ,= t2.children if item.data == 'value': item ,= item.children if isinstance(item, Token) and item.type == 'TERMINAL': ignore_names.append(item.value) continue name = '__IGNORE_%d'% len(ignore_names) ignore_names.append(name) term_defs.append((name, (t, 1))) # Verify correctness 2 terminal_names = set() for name, _ in term_defs: if name in terminal_names: raise GrammarError("Terminal '%s' defined more than once" % name) terminal_names.add(name) if set(ignore_names) > terminal_names: raise GrammarError("Terminals %s were marked to ignore but were not defined!" % (set(ignore_names) - terminal_names)) resolve_term_references(term_defs) rules = rule_defs rule_names = set() for name, _x, _o in rules: if name.startswith('__'): raise GrammarError('Names starting with double-underscore are reserved (Error at %s)' % name) if name in rule_names: raise GrammarError("Rule '%s' defined more than once" % name) rule_names.add(name) for name, expansions, _o in rules: for sym in _find_used_symbols(expansions): if sym.type == 'TERMINAL': if sym not in terminal_names: raise GrammarError("Token '%s' used but not defined (in rule %s)" % (sym, name)) else: if sym not in rule_names: raise GrammarError("Rule '%s' used but not defined (in rule %s)" % (sym, name)) return Grammar(rules, term_defs, ignore_names) load_grammar = GrammarLoader().load_grammar lark-0.8.1/lark/parse_tree_builder.py000066400000000000000000000243221361215331400176150ustar00rootroot00000000000000from .exceptions import GrammarError from .lexer import Token from .tree import Tree from .visitors import InlineTransformer # XXX Deprecated from .visitors import Transformer_InPlace from .visitors import _vargs_meta, _vargs_meta_inline ###{standalone from functools import partial, wraps from itertools import repeat, product class ExpandSingleChild: def __init__(self, node_builder): self.node_builder = node_builder def __call__(self, children): if len(children) == 1: return children[0] else: return self.node_builder(children) class PropagatePositions: def __init__(self, node_builder): self.node_builder = node_builder def __call__(self, children): res = self.node_builder(children) if isinstance(res, Tree): for c in children: if isinstance(c, Tree) and not c.meta.empty: res.meta.line = c.meta.line res.meta.column = c.meta.column res.meta.start_pos = c.meta.start_pos res.meta.empty = False break elif isinstance(c, Token): res.meta.line = c.line res.meta.column = c.column res.meta.start_pos = c.pos_in_stream 
res.meta.empty = False break for c in reversed(children): if isinstance(c, Tree) and not c.meta.empty: res.meta.end_line = c.meta.end_line res.meta.end_column = c.meta.end_column res.meta.end_pos = c.meta.end_pos res.meta.empty = False break elif isinstance(c, Token): res.meta.end_line = c.end_line res.meta.end_column = c.end_column res.meta.end_pos = c.end_pos res.meta.empty = False break return res class ChildFilter: def __init__(self, to_include, append_none, node_builder): self.node_builder = node_builder self.to_include = to_include self.append_none = append_none def __call__(self, children): filtered = [] for i, to_expand, add_none in self.to_include: if add_none: filtered += [None] * add_none if to_expand: filtered += children[i].children else: filtered.append(children[i]) if self.append_none: filtered += [None] * self.append_none return self.node_builder(filtered) class ChildFilterLALR(ChildFilter): "Optimized childfilter for LALR (assumes no duplication in parse tree, so it's safe to change it)" def __call__(self, children): filtered = [] for i, to_expand, add_none in self.to_include: if add_none: filtered += [None] * add_none if to_expand: if filtered: filtered += children[i].children else: # Optimize for left-recursion filtered = children[i].children else: filtered.append(children[i]) if self.append_none: filtered += [None] * self.append_none return self.node_builder(filtered) class ChildFilterLALR_NoPlaceholders(ChildFilter): "Optimized childfilter for LALR (assumes no duplication in parse tree, so it's safe to change it)" def __init__(self, to_include, node_builder): self.node_builder = node_builder self.to_include = to_include def __call__(self, children): filtered = [] for i, to_expand in self.to_include: if to_expand: if filtered: filtered += children[i].children else: # Optimize for left-recursion filtered = children[i].children else: filtered.append(children[i]) return self.node_builder(filtered) def _should_expand(sym): return not sym.is_term and sym.name.startswith('_') def maybe_create_child_filter(expansion, keep_all_tokens, ambiguous, _empty_indices): # Prepare empty_indices as: How many Nones to insert at each index? if _empty_indices: assert _empty_indices.count(False) == len(expansion) s = ''.join(str(int(b)) for b in _empty_indices) empty_indices = [len(ones) for ones in s.split('0')] assert len(empty_indices) == len(expansion)+1, (empty_indices, len(expansion)) else: empty_indices = [0] * (len(expansion)+1) to_include = [] nones_to_add = 0 for i, sym in enumerate(expansion): nones_to_add += empty_indices[i] if keep_all_tokens or not (sym.is_term and sym.filter_out): to_include.append((i, _should_expand(sym), nones_to_add)) nones_to_add = 0 nones_to_add += empty_indices[len(expansion)] if _empty_indices or len(to_include) < len(expansion) or any(to_expand for i, to_expand,_ in to_include): if _empty_indices or ambiguous: return partial(ChildFilter if ambiguous else ChildFilterLALR, to_include, nones_to_add) else: # LALR without placeholders return partial(ChildFilterLALR_NoPlaceholders, [(i, x) for i,x,_ in to_include]) class AmbiguousExpander: """Deal with the case where we're expanding children ('_rule') into a parent but the children are ambiguous. i.e. (parent->_ambig->_expand_this_rule). 
In this case, make the parent itself ambiguous with as many copies as their are ambiguous children, and then copy the ambiguous children into the right parents in the right places, essentially shifting the ambiguiuty up the tree.""" def __init__(self, to_expand, tree_class, node_builder): self.node_builder = node_builder self.tree_class = tree_class self.to_expand = to_expand def __call__(self, children): def _is_ambig_tree(child): return hasattr(child, 'data') and child.data == '_ambig' #### When we're repeatedly expanding ambiguities we can end up with nested ambiguities. # All children of an _ambig node should be a derivation of that ambig node, hence # it is safe to assume that if we see an _ambig node nested within an ambig node # it is safe to simply expand it into the parent _ambig node as an alternative derivation. ambiguous = [] for i, child in enumerate(children): if _is_ambig_tree(child): if i in self.to_expand: ambiguous.append(i) to_expand = [j for j, grandchild in enumerate(child.children) if _is_ambig_tree(grandchild)] child.expand_kids_by_index(*to_expand) if not ambiguous: return self.node_builder(children) expand = [ iter(child.children) if i in ambiguous else repeat(child) for i, child in enumerate(children) ] return self.tree_class('_ambig', [self.node_builder(list(f[0])) for f in product(zip(*expand))]) def maybe_create_ambiguous_expander(tree_class, expansion, keep_all_tokens): to_expand = [i for i, sym in enumerate(expansion) if keep_all_tokens or ((not (sym.is_term and sym.filter_out)) and _should_expand(sym))] if to_expand: return partial(AmbiguousExpander, to_expand, tree_class) def ptb_inline_args(func): @wraps(func) def f(children): return func(*children) return f def inplace_transformer(func): @wraps(func) def f(children): # function name in a Transformer is a rule name. tree = Tree(func.__name__, children) return func(tree) return f def apply_visit_wrapper(func, name, wrapper): if wrapper is _vargs_meta or wrapper is _vargs_meta_inline: raise NotImplementedError("Meta args not supported for internal transformer") @wraps(func) def f(children): return wrapper(func, name, children, None) return f class ParseTreeBuilder: def __init__(self, rules, tree_class, propagate_positions=False, keep_all_tokens=False, ambiguous=False, maybe_placeholders=False): self.tree_class = tree_class self.propagate_positions = propagate_positions self.always_keep_all_tokens = keep_all_tokens self.ambiguous = ambiguous self.maybe_placeholders = maybe_placeholders self.rule_builders = list(self._init_builders(rules)) def _init_builders(self, rules): for rule in rules: options = rule.options keep_all_tokens = self.always_keep_all_tokens or options.keep_all_tokens expand_single_child = options.expand1 wrapper_chain = list(filter(None, [ (expand_single_child and not rule.alias) and ExpandSingleChild, maybe_create_child_filter(rule.expansion, keep_all_tokens, self.ambiguous, options.empty_indices if self.maybe_placeholders else None), self.propagate_positions and PropagatePositions, self.ambiguous and maybe_create_ambiguous_expander(self.tree_class, rule.expansion, keep_all_tokens), ])) yield rule, wrapper_chain def create_callback(self, transformer=None): callbacks = {} for rule, wrapper_chain in self.rule_builders: user_callback_name = rule.alias or rule.origin.name try: f = getattr(transformer, user_callback_name) # XXX InlineTransformer is deprecated! 
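# Illustrative example (hypothetical transformer, not part of this module):
#   class CalcTransformer(Transformer):
#       def add(self, children):
#           return children[0] + children[1]
# For a rule named (or aliased) 'add', the getattr() above finds that method, and the
# wrapping below ensures it is called with the rule's children whenever that rule is
# reduced during parsing.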
wrapper = getattr(f, 'visit_wrapper', None) if wrapper is not None: f = apply_visit_wrapper(f, user_callback_name, wrapper) else: if isinstance(transformer, InlineTransformer): f = ptb_inline_args(f) elif isinstance(transformer, Transformer_InPlace): f = inplace_transformer(f) except AttributeError: f = partial(self.tree_class, user_callback_name) for w in wrapper_chain: f = w(f) if rule in callbacks: raise GrammarError("Rule '%s' already exists" % (rule,)) callbacks[rule] = f return callbacks ###} lark-0.8.1/lark/parser_frontends.py000066400000000000000000000175751361215331400173500ustar00rootroot00000000000000import re from functools import partial from .utils import get_regexp_width, Serialize from .parsers.grammar_analysis import GrammarAnalyzer from .lexer import TraditionalLexer, ContextualLexer, Lexer, Token from .parsers import earley, xearley, cyk from .parsers.lalr_parser import LALR_Parser from .grammar import Rule from .tree import Tree from .common import LexerConf ###{standalone def get_frontend(parser, lexer): if parser=='lalr': if lexer is None: raise ValueError('The LALR parser requires use of a lexer') elif lexer == 'standard': return LALR_TraditionalLexer elif lexer == 'contextual': return LALR_ContextualLexer elif issubclass(lexer, Lexer): return partial(LALR_CustomLexer, lexer) else: raise ValueError('Unknown lexer: %s' % lexer) elif parser=='earley': if lexer=='standard': return Earley elif lexer=='dynamic': return XEarley elif lexer=='dynamic_complete': return XEarley_CompleteLex elif lexer=='contextual': raise ValueError('The Earley parser does not support the contextual parser') else: raise ValueError('Unknown lexer: %s' % lexer) elif parser == 'cyk': if lexer == 'standard': return CYK else: raise ValueError('CYK parser requires using standard parser.') else: raise ValueError('Unknown parser: %s' % parser) class _ParserFrontend(Serialize): def _parse(self, input, start, *args): if start is None: start = self.start if len(start) > 1: raise ValueError("Lark initialized with more than 1 possible start rule. 
Must specify which start rule to parse", start) start ,= start return self.parser.parse(input, start, *args) class WithLexer(_ParserFrontend): lexer = None parser = None lexer_conf = None start = None __serialize_fields__ = 'parser', 'lexer_conf', 'start' __serialize_namespace__ = LexerConf, def __init__(self, lexer_conf, parser_conf, options=None): self.lexer_conf = lexer_conf self.start = parser_conf.start self.postlex = lexer_conf.postlex @classmethod def deserialize(cls, data, memo, callbacks, postlex): inst = super(WithLexer, cls).deserialize(data, memo) inst.postlex = postlex inst.parser = LALR_Parser.deserialize(inst.parser, memo, callbacks) inst.init_lexer() return inst def _serialize(self, data, memo): data['parser'] = data['parser'].serialize(memo) def lex(self, *args): stream = self.lexer.lex(*args) return self.postlex.process(stream) if self.postlex else stream def parse(self, text, start=None): token_stream = self.lex(text) return self._parse(token_stream, start) def init_traditional_lexer(self): self.lexer = TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks) class LALR_WithLexer(WithLexer): def __init__(self, lexer_conf, parser_conf, options=None): debug = options.debug if options else False self.parser = LALR_Parser(parser_conf, debug=debug) WithLexer.__init__(self, lexer_conf, parser_conf, options) self.init_lexer() def init_lexer(self): raise NotImplementedError() class LALR_TraditionalLexer(LALR_WithLexer): def init_lexer(self): self.init_traditional_lexer() class LALR_ContextualLexer(LALR_WithLexer): def init_lexer(self): states = {idx:list(t.keys()) for idx, t in self.parser._parse_table.states.items()} always_accept = self.postlex.always_accept if self.postlex else () self.lexer = ContextualLexer(self.lexer_conf.tokens, states, ignore=self.lexer_conf.ignore, always_accept=always_accept, user_callbacks=self.lexer_conf.callbacks) def parse(self, text, start=None): parser_state = [None] def set_parser_state(s): parser_state[0] = s token_stream = self.lex(text, lambda: parser_state[0]) return self._parse(token_stream, start, set_parser_state) ###} class LALR_CustomLexer(LALR_WithLexer): def __init__(self, lexer_cls, lexer_conf, parser_conf, options=None): self.lexer = lexer_cls(lexer_conf) debug = options.debug if options else False self.parser = LALR_Parser(parser_conf, debug=debug) WithLexer.__init__(self, lexer_conf, parser_conf, options) def tokenize_text(text): line = 1 col_start_pos = 0 for i, ch in enumerate(text): if '\n' in ch: line += ch.count('\n') col_start_pos = i + ch.rindex('\n') yield Token('CHAR', ch, line=line, column=i - col_start_pos) class Earley(WithLexer): def __init__(self, lexer_conf, parser_conf, options=None): WithLexer.__init__(self, lexer_conf, parser_conf, options) self.init_traditional_lexer() resolve_ambiguity = options.ambiguity == 'resolve' debug = options.debug if options else False self.parser = earley.Parser(parser_conf, self.match, resolve_ambiguity=resolve_ambiguity, debug=debug) def match(self, term, token): return term.name == token.type class XEarley(_ParserFrontend): def __init__(self, lexer_conf, parser_conf, options=None, **kw): self.token_by_name = {t.name:t for t in lexer_conf.tokens} self.start = parser_conf.start self._prepare_match(lexer_conf) resolve_ambiguity = options.ambiguity == 'resolve' debug = options.debug if options else False self.parser = xearley.Parser(parser_conf, self.match, ignore=lexer_conf.ignore, resolve_ambiguity=resolve_ambiguity, 
debug=debug, **kw ) def match(self, term, text, index=0): return self.regexps[term.name].match(text, index) def _prepare_match(self, lexer_conf): self.regexps = {} for t in lexer_conf.tokens: if t.priority != 1: raise ValueError("Dynamic Earley doesn't support weights on terminals", t, t.priority) regexp = t.pattern.to_regexp() try: width = get_regexp_width(regexp)[0] except ValueError: raise ValueError("Bad regexp in token %s: %s" % (t.name, regexp)) else: if width == 0: raise ValueError("Dynamic Earley doesn't allow zero-width regexps", t) self.regexps[t.name] = re.compile(regexp) def parse(self, text, start): return self._parse(text, start) class XEarley_CompleteLex(XEarley): def __init__(self, *args, **kw): XEarley.__init__(self, *args, complete_lex=True, **kw) class CYK(WithLexer): def __init__(self, lexer_conf, parser_conf, options=None): WithLexer.__init__(self, lexer_conf, parser_conf, options) self.init_traditional_lexer() self._analysis = GrammarAnalyzer(parser_conf) self.parser = cyk.Parser(parser_conf.rules) self.callbacks = parser_conf.callbacks def parse(self, text, start): tokens = list(self.lex(text)) parse = self._parse(tokens, start) parse = self._transform(parse) return parse def _transform(self, tree): subtrees = list(tree.iter_subtrees()) for subtree in subtrees: subtree.children = [self._apply_callback(c) if isinstance(c, Tree) else c for c in subtree.children] return self._apply_callback(tree) def _apply_callback(self, tree): return self.callbacks[tree.rule](tree.children) lark-0.8.1/lark/parsers/000077500000000000000000000000001361215331400150605ustar00rootroot00000000000000lark-0.8.1/lark/parsers/__init__.py000066400000000000000000000000001361215331400171570ustar00rootroot00000000000000lark-0.8.1/lark/parsers/cyk.py000066400000000000000000000300031361215331400162140ustar00rootroot00000000000000"""This module implements a CYK parser.""" # Author: https://github.com/ehudt (2018) # # Adapted by Erez from collections import defaultdict import itertools from ..exceptions import ParseError from ..lexer import Token from ..tree import Tree from ..grammar import Terminal as T, NonTerminal as NT, Symbol try: xrange except NameError: xrange = range def match(t, s): assert isinstance(t, T) return t.name == s.type class Rule(object): """Context-free grammar rule.""" def __init__(self, lhs, rhs, weight, alias): super(Rule, self).__init__() assert isinstance(lhs, NT), lhs assert all(isinstance(x, NT) or isinstance(x, T) for x in rhs), rhs self.lhs = lhs self.rhs = rhs self.weight = weight self.alias = alias def __str__(self): return '%s -> %s' % (str(self.lhs), ' '.join(str(x) for x in self.rhs)) def __repr__(self): return str(self) def __hash__(self): return hash((self.lhs, tuple(self.rhs))) def __eq__(self, other): return self.lhs == other.lhs and self.rhs == other.rhs def __ne__(self, other): return not (self == other) class Grammar(object): """Context-free grammar.""" def __init__(self, rules): self.rules = frozenset(rules) def __eq__(self, other): return self.rules == other.rules def __str__(self): return '\n' + '\n'.join(sorted(repr(x) for x in self.rules)) + '\n' def __repr__(self): return str(self) # Parse tree data structures class RuleNode(object): """A node in the parse tree, which also contains the full rhs rule.""" def __init__(self, rule, children, weight=0): self.rule = rule self.children = children self.weight = weight def __repr__(self): return 'RuleNode(%s, [%s])' % (repr(self.rule.lhs), ', '.join(str(x) for x in self.children)) class Parser(object): """Parser 
wrapper.""" def __init__(self, rules): super(Parser, self).__init__() self.orig_rules = {rule: rule for rule in rules} rules = [self._to_rule(rule) for rule in rules] self.grammar = to_cnf(Grammar(rules)) def _to_rule(self, lark_rule): """Converts a lark rule, (lhs, rhs, callback, options), to a Rule.""" assert isinstance(lark_rule.origin, NT) assert all(isinstance(x, Symbol) for x in lark_rule.expansion) return Rule( lark_rule.origin, lark_rule.expansion, weight=lark_rule.options.priority if lark_rule.options.priority else 0, alias=lark_rule) def parse(self, tokenized, start): # pylint: disable=invalid-name """Parses input, which is a list of tokens.""" assert start start = NT(start) table, trees = _parse(tokenized, self.grammar) # Check if the parse succeeded. if all(r.lhs != start for r in table[(0, len(tokenized) - 1)]): raise ParseError('Parsing failed.') parse = trees[(0, len(tokenized) - 1)][start] return self._to_tree(revert_cnf(parse)) def _to_tree(self, rule_node): """Converts a RuleNode parse tree to a lark Tree.""" orig_rule = self.orig_rules[rule_node.rule.alias] children = [] for child in rule_node.children: if isinstance(child, RuleNode): children.append(self._to_tree(child)) else: assert isinstance(child.name, Token) children.append(child.name) t = Tree(orig_rule.origin, children) t.rule=orig_rule return t def print_parse(node, indent=0): if isinstance(node, RuleNode): print(' ' * (indent * 2) + str(node.rule.lhs)) for child in node.children: print_parse(child, indent + 1) else: print(' ' * (indent * 2) + str(node.s)) def _parse(s, g): """Parses sentence 's' using CNF grammar 'g'.""" # The CYK table. Indexed with a 2-tuple: (start pos, end pos) table = defaultdict(set) # Top-level structure is similar to the CYK table. Each cell is a dict from # rule name to the best (lightest) tree for that rule. trees = defaultdict(dict) # Populate base case with existing terminal production rules for i, w in enumerate(s): for terminal, rules in g.terminal_rules.items(): if match(terminal, w): for rule in rules: table[(i, i)].add(rule) if (rule.lhs not in trees[(i, i)] or rule.weight < trees[(i, i)][rule.lhs].weight): trees[(i, i)][rule.lhs] = RuleNode(rule, [T(w)], weight=rule.weight) # Iterate over lengths of sub-sentences for l in xrange(2, len(s) + 1): # Iterate over sub-sentences with the given length for i in xrange(len(s) - l + 1): # Choose partition of the sub-sentence in [1, l) for p in xrange(i + 1, i + l): span1 = (i, p - 1) span2 = (p, i + l - 1) for r1, r2 in itertools.product(table[span1], table[span2]): for rule in g.nonterminal_rules.get((r1.lhs, r2.lhs), []): table[(i, i + l - 1)].add(rule) r1_tree = trees[span1][r1.lhs] r2_tree = trees[span2][r2.lhs] rule_total_weight = rule.weight + r1_tree.weight + r2_tree.weight if (rule.lhs not in trees[(i, i + l - 1)] or rule_total_weight < trees[(i, i + l - 1)][rule.lhs].weight): trees[(i, i + l - 1)][rule.lhs] = RuleNode(rule, [r1_tree, r2_tree], weight=rule_total_weight) return table, trees # This section implements context-free grammar converter to Chomsky normal form. # It also implements a conversion of parse trees from its CNF to the original # grammar. # Overview: # Applies the following operations in this order: # * TERM: Eliminates non-solitary terminals from all rules # * BIN: Eliminates rules with more than 2 symbols on their right-hand-side. 
# * UNIT: Eliminates non-terminal unit rules # # The following grammar characteristics aren't featured: # * Start symbol appears on RHS # * Empty rules (epsilon rules) class CnfWrapper(object): """CNF wrapper for grammar. Validates that the input grammar is CNF and provides helper data structures. """ def __init__(self, grammar): super(CnfWrapper, self).__init__() self.grammar = grammar self.rules = grammar.rules self.terminal_rules = defaultdict(list) self.nonterminal_rules = defaultdict(list) for r in self.rules: # Validate that the grammar is CNF and populate auxiliary data structures. assert isinstance(r.lhs, NT), r if len(r.rhs) not in [1, 2]: raise ParseError("CYK doesn't support empty rules") if len(r.rhs) == 1 and isinstance(r.rhs[0], T): self.terminal_rules[r.rhs[0]].append(r) elif len(r.rhs) == 2 and all(isinstance(x, NT) for x in r.rhs): self.nonterminal_rules[tuple(r.rhs)].append(r) else: assert False, r def __eq__(self, other): return self.grammar == other.grammar def __repr__(self): return repr(self.grammar) class UnitSkipRule(Rule): """A rule that records NTs that were skipped during transformation.""" def __init__(self, lhs, rhs, skipped_rules, weight, alias): super(UnitSkipRule, self).__init__(lhs, rhs, weight, alias) self.skipped_rules = skipped_rules def __eq__(self, other): return isinstance(other, type(self)) and self.skipped_rules == other.skipped_rules __hash__ = Rule.__hash__ def build_unit_skiprule(unit_rule, target_rule): skipped_rules = [] if isinstance(unit_rule, UnitSkipRule): skipped_rules += unit_rule.skipped_rules skipped_rules.append(target_rule) if isinstance(target_rule, UnitSkipRule): skipped_rules += target_rule.skipped_rules return UnitSkipRule(unit_rule.lhs, target_rule.rhs, skipped_rules, weight=unit_rule.weight + target_rule.weight, alias=unit_rule.alias) def get_any_nt_unit_rule(g): """Returns a non-terminal unit rule from 'g', or None if there is none.""" for rule in g.rules: if len(rule.rhs) == 1 and isinstance(rule.rhs[0], NT): return rule return None def _remove_unit_rule(g, rule): """Removes 'rule' from 'g' without changing the language produced by 'g'.""" new_rules = [x for x in g.rules if x != rule] refs = [x for x in g.rules if x.lhs == rule.rhs[0]] new_rules += [build_unit_skiprule(rule, ref) for ref in refs] return Grammar(new_rules) def _split(rule): """Splits a rule whose len(rhs) > 2 into shorter rules.""" rule_str = str(rule.lhs) + '__' + '_'.join(str(x) for x in rule.rhs) rule_name = '__SP_%s' % (rule_str) + '_%d' yield Rule(rule.lhs, [rule.rhs[0], NT(rule_name % 1)], weight=rule.weight, alias=rule.alias) for i in xrange(1, len(rule.rhs) - 2): yield Rule(NT(rule_name % i), [rule.rhs[i], NT(rule_name % (i + 1))], weight=0, alias='Split') yield Rule(NT(rule_name % (len(rule.rhs) - 2)), rule.rhs[-2:], weight=0, alias='Split') def _term(g): """Applies the TERM rule on 'g' (see top comment).""" all_t = {x for rule in g.rules for x in rule.rhs if isinstance(x, T)} t_rules = {t: Rule(NT('__T_%s' % str(t)), [t], weight=0, alias='Term') for t in all_t} new_rules = [] for rule in g.rules: if len(rule.rhs) > 1 and any(isinstance(x, T) for x in rule.rhs): new_rhs = [t_rules[x].lhs if isinstance(x, T) else x for x in rule.rhs] new_rules.append(Rule(rule.lhs, new_rhs, weight=rule.weight, alias=rule.alias)) new_rules.extend(v for k, v in t_rules.items() if k in rule.rhs) else: new_rules.append(rule) return Grammar(new_rules) def _bin(g): """Applies the BIN rule to 'g' (see top comment).""" new_rules = [] for rule in g.rules: if len(rule.rhs) > 2:
new_rules += _split(rule) else: new_rules.append(rule) return Grammar(new_rules) def _unit(g): """Applies the UNIT rule to 'g' (see top comment).""" nt_unit_rule = get_any_nt_unit_rule(g) while nt_unit_rule: g = _remove_unit_rule(g, nt_unit_rule) nt_unit_rule = get_any_nt_unit_rule(g) return g def to_cnf(g): """Creates a CNF grammar from a general context-free grammar 'g'.""" g = _unit(_bin(_term(g))) return CnfWrapper(g) def unroll_unit_skiprule(lhs, orig_rhs, skipped_rules, children, weight, alias): if not skipped_rules: return RuleNode(Rule(lhs, orig_rhs, weight=weight, alias=alias), children, weight=weight) else: weight = weight - skipped_rules[0].weight return RuleNode( Rule(lhs, [skipped_rules[0].lhs], weight=weight, alias=alias), [ unroll_unit_skiprule(skipped_rules[0].lhs, orig_rhs, skipped_rules[1:], children, skipped_rules[0].weight, skipped_rules[0].alias) ], weight=weight) def revert_cnf(node): """Reverts a parse tree (RuleNode) to its original non-CNF form (Node).""" if isinstance(node, T): return node # Reverts TERM rule. if node.rule.lhs.name.startswith('__T_'): return node.children[0] else: children = [] for child in map(revert_cnf, node.children): # Reverts BIN rule. if isinstance(child, RuleNode) and child.rule.lhs.name.startswith('__SP_'): children += child.children else: children.append(child) # Reverts UNIT rule. if isinstance(node.rule, UnitSkipRule): return unroll_unit_skiprule(node.rule.lhs, node.rule.rhs, node.rule.skipped_rules, children, node.rule.weight, node.rule.alias) else: return RuleNode(node.rule, children) lark-0.8.1/lark/parsers/earley.py000066400000000000000000000347151361215331400167250ustar00rootroot00000000000000"""This module implements an scanerless Earley parser. The core Earley algorithm used here is based on Elizabeth Scott's implementation, here: https://www.sciencedirect.com/science/article/pii/S1571066108001497 That is probably the best reference for understanding the algorithm here. The Earley parser outputs an SPPF-tree as per that document. The SPPF tree format is better documented here: http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/ """ import logging from collections import deque from ..visitors import Transformer_InPlace, v_args from ..exceptions import UnexpectedEOF, UnexpectedToken from .grammar_analysis import GrammarAnalyzer from ..grammar import NonTerminal from .earley_common import Item, TransitiveItem from .earley_forest import ForestToTreeVisitor, ForestSumVisitor, SymbolNode, ForestToAmbiguousTreeVisitor class Parser: def __init__(self, parser_conf, term_matcher, resolve_ambiguity=True, debug=False): analysis = GrammarAnalyzer(parser_conf) self.parser_conf = parser_conf self.resolve_ambiguity = resolve_ambiguity self.debug = debug self.FIRST = analysis.FIRST self.NULLABLE = analysis.NULLABLE self.callbacks = parser_conf.callbacks self.predictions = {} ## These could be moved to the grammar analyzer. Pre-computing these is *much* faster than # the slow 'isupper' in is_terminal. self.TERMINALS = { sym for r in parser_conf.rules for sym in r.expansion if sym.is_term } self.NON_TERMINALS = { sym for r in parser_conf.rules for sym in r.expansion if not sym.is_term } self.forest_sum_visitor = None for rule in parser_conf.rules: self.predictions[rule.origin] = [x.rule for x in analysis.expand_rule(rule.origin)] ## Detect if any rules have priorities set. 
If the user specified priority = "none" then # the priorities will be stripped from all rules before they reach us, allowing us to # skip the extra tree walk. We'll also skip this if the user just didn't specify priorities # on any rules. if self.forest_sum_visitor is None and rule.options.priority is not None: self.forest_sum_visitor = ForestSumVisitor self.term_matcher = term_matcher def predict_and_complete(self, i, to_scan, columns, transitives): """The core Earley Predictor and Completer. At each stage of the input, we handle any completed items (things that matched on the last cycle) and use those to predict what should come next in the input stream. The completions and any predicted non-terminals are recursively processed until we reach a set of items expecting terminals, which can be added to the scan list for the next scanner cycle.""" # Held Completions (H in E. Scott's paper). node_cache = {} held_completions = {} column = columns[i] # R (items) = Ei (column.items) items = deque(column) while items: item = items.pop() # remove an element, A say, from R ### The Earley completer if item.is_complete: ### (item.s == string) if item.node is None: label = (item.s, item.start, i) item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) item.node.add_family(item.s, item.rule, item.start, None, None) # create_leo_transitives(item.rule.origin, item.start) ###R Joop Leo right recursion Completer if item.rule.origin in transitives[item.start]: transitive = transitives[item.start][item.s] if transitive.previous in transitives[transitive.column]: root_transitive = transitives[transitive.column][transitive.previous] else: root_transitive = transitive new_item = Item(transitive.rule, transitive.ptr, transitive.start) label = (root_transitive.s, root_transitive.start, i) new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) new_item.node.add_path(root_transitive, item.node) if new_item.expect in self.TERMINALS: # Add (B :: aC.B, h, y) to Q to_scan.add(new_item) elif new_item not in column: # Add (B :: aC.B, h, y) to Ei and R column.add(new_item) items.append(new_item) ###R Regular Earley completer else: # Empty has 0 length. If we complete an empty symbol in a particular # parse step, we need to be able to use that same empty symbol to complete # any predictions that result, that themselves require empty. Avoids # infinite recursion on empty symbols. # held_completions is 'H' in E. Scott's paper. is_empty_item = item.start == i if is_empty_item: held_completions[item.rule.origin] = item.node originators = [originator for originator in columns[item.start] if originator.expect is not None and originator.expect == item.s] for originator in originators: new_item = originator.advance() label = (new_item.s, originator.start, i) new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) new_item.node.add_family(new_item.s, new_item.rule, i, originator.node, item.node) if new_item.expect in self.TERMINALS: # Add (B :: aC.B, h, y) to Q to_scan.add(new_item) elif new_item not in column: # Add (B :: aC.B, h, y) to Ei and R column.add(new_item) items.append(new_item) ### The Earley predictor elif item.expect in self.NON_TERMINALS: ### (item.s == lr0) new_items = [] for rule in self.predictions[item.expect]: new_item = Item(rule, 0, i) new_items.append(new_item) # Process any held completions (H). 
if item.expect in held_completions: new_item = item.advance() label = (new_item.s, item.start, i) new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) new_item.node.add_family(new_item.s, new_item.rule, new_item.start, item.node, held_completions[item.expect]) new_items.append(new_item) for new_item in new_items: if new_item.expect in self.TERMINALS: to_scan.add(new_item) elif new_item not in column: column.add(new_item) items.append(new_item) def _parse(self, stream, columns, to_scan, start_symbol=None): def is_quasi_complete(item): if item.is_complete: return True quasi = item.advance() while not quasi.is_complete: if quasi.expect not in self.NULLABLE: return False if quasi.rule.origin == start_symbol and quasi.expect == start_symbol: return False quasi = quasi.advance() return True def create_leo_transitives(origin, start): visited = set() to_create = [] trule = None previous = None ### Recursively walk backwards through the Earley sets until we find the # first transitive candidate. If this is done continuously, we shouldn't # have to walk more than 1 hop. while True: if origin in transitives[start]: previous = trule = transitives[start][origin] break is_empty_rule = not self.FIRST[origin] if is_empty_rule: break candidates = [ candidate for candidate in columns[start] if candidate.expect is not None and origin == candidate.expect ] if len(candidates) != 1: break originator = next(iter(candidates)) if originator is None or originator in visited: break visited.add(originator) if not is_quasi_complete(originator): break trule = originator.advance() if originator.start != start: visited.clear() to_create.append((origin, start, originator)) origin = originator.rule.origin start = originator.start # If a suitable Transitive candidate is not found, bail. if trule is None: return #### Now walk forwards and create Transitive Items in each set we walked through; and link # each transitive item to the next set forwards. while to_create: origin, start, originator = to_create.pop() titem = None if previous is not None: titem = previous.next_titem = TransitiveItem(origin, trule, originator, previous.column) else: titem = TransitiveItem(origin, trule, originator, start) previous = transitives[start][origin] = titem def scan(i, token, to_scan): """The core Earley Scanner. This is a custom implementation of the scanner that uses the Lark lexer to match tokens. The scan list is built by the Earley predictor, based on the previously completed tokens. This ensures that at each phase of the parse we have a custom lexer context, allowing for more complex ambiguities.""" next_to_scan = set() next_set = set() columns.append(next_set) transitives.append({}) node_cache = {} for item in set(to_scan): if match(item.expect, token): new_item = item.advance() label = (new_item.s, new_item.start, i) new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) new_item.node.add_family(new_item.s, item.rule, new_item.start, item.node, token) if new_item.expect in self.TERMINALS: # add (B ::= Aai+1.B, h, y) to Q' next_to_scan.add(new_item) else: # add (B ::= Aa+1.B, h, y) to Ei+1 next_set.add(new_item) if not next_set and not next_to_scan: expect = {i.expect.name for i in to_scan} raise UnexpectedToken(token, expect, considered_rules = set(to_scan)) return next_to_scan # Define parser functions match = self.term_matcher # Cache for nodes & tokens created in a particular parse step. 
transitives = [{}] ## The main Earley loop. # Run the Prediction/Completion cycle for any Items in the current Earley set. # Completions will be added to the SPPF tree, and predictions will be recursively # processed down to terminals/empty nodes to be added to the scanner for the next # step. i = 0 for token in stream: self.predict_and_complete(i, to_scan, columns, transitives) to_scan = scan(i, token, to_scan) i += 1 self.predict_and_complete(i, to_scan, columns, transitives) ## Column is now the final column in the parse. assert i == len(columns)-1 return to_scan def parse(self, stream, start): assert start, start start_symbol = NonTerminal(start) columns = [set()] to_scan = set() # The scan buffer. 'Q' in E.Scott's paper. ## Predict for the start_symbol. # Add predicted items to the first Earley set (for the predictor) if they # result in a non-terminal, or the scanner if they result in a terminal. for rule in self.predictions[start_symbol]: item = Item(rule, 0, 0) if item.expect in self.TERMINALS: to_scan.add(item) else: columns[0].add(item) to_scan = self._parse(stream, columns, to_scan, start_symbol) # If the parse was successful, the start # symbol should have been completed in the last step of the Earley cycle, and will be in # this column. Find the item for the start_symbol, which is the root of the SPPF tree. solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0] if self.debug: from .earley_forest import ForestToPyDotVisitor try: debug_walker = ForestToPyDotVisitor() except ImportError: logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image") else: debug_walker.visit(solutions[0], "sppf.png") if not solutions: expected_tokens = [t.expect for t in to_scan] raise UnexpectedEOF(expected_tokens) elif len(solutions) > 1: assert False, 'Earley should not generate multiple start symbol items!' # Perform our SPPF -> AST conversion using the right ForestVisitor. forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor()) return forest_tree_visitor.visit(solutions[0]) class ApplyCallbacks(Transformer_InPlace): def __init__(self, postprocess): self.postprocess = postprocess @v_args(meta=True) def drv(self, children, meta): return self.postprocess[meta.rule](children) lark-0.8.1/lark/parsers/earley_common.py000066400000000000000000000063071361215331400202710ustar00rootroot00000000000000"This module implements an Earley Parser" # The parser uses a parse-forest to keep track of derivations and ambiguations. # When the parse ends successfully, a disambiguation stage resolves all ambiguity # (right now ambiguity resolution is not developed beyond the needs of lark) # Afterwards the parse tree is reduced (transformed) according to user callbacks. # I use the no-recursion version of Transformer, because the tree might be # deeper than Python's recursion limit (a bit absurd, but that's life) # # The algorithm keeps track of each state set, using a corresponding Column instance. # Column keeps track of new items using NewsList instances. # # Author: Erez Shinan (2017) # Email : erezshin@gmail.com from ..grammar import NonTerminal, Terminal class Item(object): "An Earley Item, the atom of the algorithm." 
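    # For example (a hypothetical grammar, added here purely for illustration and not
    # part of the original source): given a rule
    #   sum -> sum PLUS product
    # the item Item(rule, ptr=1, start=0) represents the dotted rule
    #   sum -> sum . PLUS product
    # meaning `sum` was matched starting at input position 0, `expect` is the terminal
    # PLUS, and `is_complete` stays False until ptr == len(rule.expansion).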
__slots__ = ('s', 'rule', 'ptr', 'start', 'is_complete', 'expect', 'previous', 'node', '_hash') def __init__(self, rule, ptr, start): self.is_complete = len(rule.expansion) == ptr self.rule = rule # rule self.ptr = ptr # ptr self.start = start # j self.node = None # w if self.is_complete: self.s = rule.origin self.expect = None self.previous = rule.expansion[ptr - 1] if ptr > 0 and len(rule.expansion) else None else: self.s = (rule, ptr) self.expect = rule.expansion[ptr] self.previous = rule.expansion[ptr - 1] if ptr > 0 and len(rule.expansion) else None self._hash = hash((self.s, self.start)) def advance(self): return Item(self.rule, self.ptr + 1, self.start) def __eq__(self, other): return self is other or (self.s == other.s and self.start == other.start) def __hash__(self): return self._hash def __repr__(self): before = ( expansion.name for expansion in self.rule.expansion[:self.ptr] ) after = ( expansion.name for expansion in self.rule.expansion[self.ptr:] ) symbol = "{} ::= {}* {}".format(self.rule.origin.name, ' '.join(before), ' '.join(after)) return '%s (%d)' % (symbol, self.start) class TransitiveItem(Item): __slots__ = ('recognized', 'reduction', 'column', 'next_titem') def __init__(self, recognized, trule, originator, start): super(TransitiveItem, self).__init__(trule.rule, trule.ptr, trule.start) self.recognized = recognized self.reduction = originator self.column = start self.next_titem = None self._hash = hash((self.s, self.start, self.recognized)) def __eq__(self, other): if not isinstance(other, TransitiveItem): return False return self is other or (type(self.s) == type(other.s) and self.s == other.s and self.start == other.start and self.recognized == other.recognized) def __hash__(self): return self._hash def __repr__(self): before = ( expansion.name for expansion in self.rule.expansion[:self.ptr] ) after = ( expansion.name for expansion in self.rule.expansion[self.ptr:] ) return '{} : {} -> {}* {} ({}, {})'.format(self.recognized.name, self.rule.origin.name, ' '.join(before), ' '.join(after), self.column, self.start) lark-0.8.1/lark/parsers/earley_forest.py000066400000000000000000000422661361215331400203070ustar00rootroot00000000000000""""This module implements an SPPF implementation This is used as the primary output mechanism for the Earley parser in order to store complex ambiguities. Full reference and more details is here: http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/ """ from random import randint from math import isinf from collections import deque from operator import attrgetter from importlib import import_module from ..tree import Tree from ..exceptions import ParseError class ForestNode(object): pass class SymbolNode(ForestNode): """ A Symbol Node represents a symbol (or Intermediate LR0). Symbol nodes are keyed by the symbol (s). For intermediate nodes s will be an LR0, stored as a tuple of (rule, ptr). For completed symbol nodes, s will be a string representing the non-terminal origin (i.e. the left hand side of the rule). The children of a Symbol or Intermediate Node will always be Packed Nodes; with each Packed Node child representing a single derivation of a production. Hence a Symbol Node with a single child is unambiguous. 
""" __slots__ = ('s', 'start', 'end', '_children', 'paths', 'paths_loaded', 'priority', 'is_intermediate', '_hash') def __init__(self, s, start, end): self.s = s self.start = start self.end = end self._children = set() self.paths = set() self.paths_loaded = False ### We use inf here as it can be safely negated without resorting to conditionals, # unlike None or float('NaN'), and sorts appropriately. self.priority = float('-inf') self.is_intermediate = isinstance(s, tuple) self._hash = hash((self.s, self.start, self.end)) def add_family(self, lr0, rule, start, left, right): self._children.add(PackedNode(self, lr0, rule, start, left, right)) def add_path(self, transitive, node): self.paths.add((transitive, node)) def load_paths(self): for transitive, node in self.paths: if transitive.next_titem is not None: vn = SymbolNode(transitive.next_titem.s, transitive.next_titem.start, self.end) vn.add_path(transitive.next_titem, node) self.add_family(transitive.reduction.rule.origin, transitive.reduction.rule, transitive.reduction.start, transitive.reduction.node, vn) else: self.add_family(transitive.reduction.rule.origin, transitive.reduction.rule, transitive.reduction.start, transitive.reduction.node, node) self.paths_loaded = True @property def is_ambiguous(self): return len(self.children) > 1 @property def children(self): if not self.paths_loaded: self.load_paths() return sorted(self._children, key=attrgetter('sort_key')) def __iter__(self): return iter(self._children) def __eq__(self, other): if not isinstance(other, SymbolNode): return False return self is other or (type(self.s) == type(other.s) and self.s == other.s and self.start == other.start and self.end is other.end) def __hash__(self): return self._hash def __repr__(self): if self.is_intermediate: rule = self.s[0] ptr = self.s[1] before = ( expansion.name for expansion in rule.expansion[:ptr] ) after = ( expansion.name for expansion in rule.expansion[ptr:] ) symbol = "{} ::= {}* {}".format(rule.origin.name, ' '.join(before), ' '.join(after)) else: symbol = self.s.name return "({}, {}, {}, {})".format(symbol, self.start, self.end, self.priority) class PackedNode(ForestNode): """ A Packed Node represents a single derivation in a symbol node. """ __slots__ = ('parent', 's', 'rule', 'start', 'left', 'right', 'priority', '_hash') def __init__(self, parent, s, rule, start, left, right): self.parent = parent self.s = s self.start = start self.rule = rule self.left = left self.right = right self.priority = float('-inf') self._hash = hash((self.left, self.right)) @property def is_empty(self): return self.left is None and self.right is None @property def sort_key(self): """ Used to sort PackedNode children of SymbolNodes. A SymbolNode has multiple PackedNodes if it matched ambiguously. Hence, we use the sort order to identify the order in which ambiguous children should be considered. 
""" return self.is_empty, -self.priority, self.rule.order def __iter__(self): return iter([self.left, self.right]) def __eq__(self, other): if not isinstance(other, PackedNode): return False return self is other or (self.left == other.left and self.right == other.right) def __hash__(self): return self._hash def __repr__(self): if isinstance(self.s, tuple): rule = self.s[0] ptr = self.s[1] before = ( expansion.name for expansion in rule.expansion[:ptr] ) after = ( expansion.name for expansion in rule.expansion[ptr:] ) symbol = "{} ::= {}* {}".format(rule.origin.name, ' '.join(before), ' '.join(after)) else: symbol = self.s.name return "({}, {}, {}, {})".format(symbol, self.start, self.priority, self.rule.order) class ForestVisitor(object): """ An abstract base class for building forest visitors. Use this as a base when you need to walk the forest. """ __slots__ = ['result'] def visit_token_node(self, node): pass def visit_symbol_node_in(self, node): pass def visit_symbol_node_out(self, node): pass def visit_packed_node_in(self, node): pass def visit_packed_node_out(self, node): pass def visit(self, root): self.result = None # Visiting is a list of IDs of all symbol/intermediate nodes currently in # the stack. It serves two purposes: to detect when we 'recurse' in and out # of a symbol/intermediate so that we can process both up and down. Also, # since the SPPF can have cycles it allows us to detect if we're trying # to recurse into a node that's already on the stack (infinite recursion). visiting = set() # We do not use recursion here to walk the Forest due to the limited # stack size in python. Therefore input_stack is essentially our stack. input_stack = deque([root]) # It is much faster to cache these as locals since they are called # many times in large parses. vpno = getattr(self, 'visit_packed_node_out') vpni = getattr(self, 'visit_packed_node_in') vsno = getattr(self, 'visit_symbol_node_out') vsni = getattr(self, 'visit_symbol_node_in') vtn = getattr(self, 'visit_token_node') while input_stack: current = next(reversed(input_stack)) try: next_node = next(current) except StopIteration: input_stack.pop() continue except TypeError: ### If the current object is not an iterator, pass through to Token/SymbolNode pass else: if next_node is None: continue if id(next_node) in visiting: raise ParseError("Infinite recursion in grammar, in rule '%s'!" % next_node.s.name) input_stack.append(next_node) continue if not isinstance(current, ForestNode): vtn(current) input_stack.pop() continue current_id = id(current) if current_id in visiting: if isinstance(current, PackedNode): vpno(current) else: vsno(current) input_stack.pop() visiting.remove(current_id) continue else: visiting.add(current_id) if isinstance(current, PackedNode): next_node = vpni(current) else: next_node = vsni(current) if next_node is None: continue if id(next_node) in visiting: raise ParseError("Infinite recursion in grammar!") input_stack.append(next_node) continue return self.result class ForestSumVisitor(ForestVisitor): """ A visitor for prioritizing ambiguous parts of the Forest. This visitor is used when support for explicit priorities on rules is requested (whether normal, or invert). It walks the forest (or subsets thereof) and cascades properties upwards from the leaves. It would be ideal to do this during parsing, however this would require processing each Earley item multiple times. 
That's a big performance drawback; so running a forest walk is the lesser of two evils: there can be significantly more Earley items created during parsing than there are SPPF nodes in the final tree. """ def visit_packed_node_in(self, node): return iter([node.left, node.right]) def visit_symbol_node_in(self, node): return iter(node.children) def visit_packed_node_out(self, node): priority = node.rule.options.priority if not node.parent.is_intermediate and node.rule.options.priority else 0 priority += getattr(node.right, 'priority', 0) priority += getattr(node.left, 'priority', 0) node.priority = priority def visit_symbol_node_out(self, node): node.priority = max(child.priority for child in node.children) class ForestToTreeVisitor(ForestVisitor): """ A Forest visitor which converts an SPPF forest to an unambiguous AST. The implementation in this visitor walks only the first ambiguous child of each symbol node. When it finds an ambiguous symbol node it first calls the forest_sum_visitor implementation to sort the children into preference order using the algorithms defined there; so the first child should always be the highest preference. The forest_sum_visitor implementation should be another ForestVisitor which sorts the children according to some priority mechanism. """ __slots__ = ['forest_sum_visitor', 'callbacks', 'output_stack'] def __init__(self, callbacks, forest_sum_visitor = None): assert callbacks self.forest_sum_visitor = forest_sum_visitor self.callbacks = callbacks def visit(self, root): self.output_stack = deque() return super(ForestToTreeVisitor, self).visit(root) def visit_token_node(self, node): self.output_stack[-1].append(node) def visit_symbol_node_in(self, node): if self.forest_sum_visitor and node.is_ambiguous and isinf(node.priority): self.forest_sum_visitor.visit(node) return next(iter(node.children)) def visit_packed_node_in(self, node): if not node.parent.is_intermediate: self.output_stack.append([]) return iter([node.left, node.right]) def visit_packed_node_out(self, node): if not node.parent.is_intermediate: result = self.callbacks[node.rule](self.output_stack.pop()) if self.output_stack: self.output_stack[-1].append(result) else: self.result = result class ForestToAmbiguousTreeVisitor(ForestToTreeVisitor): """ A Forest visitor which converts an SPPF forest to an ambiguous AST. Because of the fundamental disparity between what can be stored in an SPPF and what can be stored in a Tree; this implementation is not complete. It correctly deals with ambiguities that occur on symbol nodes only, and cannot deal with ambiguities that occur on intermediate nodes. Usually, most parsers can be rewritten to avoid intermediate node ambiguities. Also, this implementation could be fixed, however the code to handle intermediate node ambiguities is messy and would not be performant. It is much better not to use this and instead to correctly disambiguate the forest and only store unambiguous parses in Trees. It is here just to provide some parity with the old ambiguity='explicit'. This is mainly used by the test framework, to make it simpler to write tests ensuring the SPPF contains the right results. 
""" def __init__(self, callbacks, forest_sum_visitor = ForestSumVisitor): super(ForestToAmbiguousTreeVisitor, self).__init__(callbacks, forest_sum_visitor) def visit_token_node(self, node): self.output_stack[-1].children.append(node) def visit_symbol_node_in(self, node): if self.forest_sum_visitor and node.is_ambiguous and isinf(node.priority): self.forest_sum_visitor.visit(node) if not node.is_intermediate and node.is_ambiguous: self.output_stack.append(Tree('_ambig', [])) return iter(node.children) def visit_symbol_node_out(self, node): if not node.is_intermediate and node.is_ambiguous: result = self.output_stack.pop() if self.output_stack: self.output_stack[-1].children.append(result) else: self.result = result def visit_packed_node_in(self, node): if not node.parent.is_intermediate: self.output_stack.append(Tree('drv', [])) return iter([node.left, node.right]) def visit_packed_node_out(self, node): if not node.parent.is_intermediate: result = self.callbacks[node.rule](self.output_stack.pop().children) if self.output_stack: self.output_stack[-1].children.append(result) else: self.result = result class ForestToPyDotVisitor(ForestVisitor): """ A Forest visitor which writes the SPPF to a PNG. The SPPF can get really large, really quickly because of the amount of meta-data it stores, so this is probably only useful for trivial trees and learning how the SPPF is structured. """ def __init__(self, rankdir="TB"): self.pydot = import_module('pydot') self.graph = self.pydot.Dot(graph_type='digraph', rankdir=rankdir) def visit(self, root, filename): super(ForestToPyDotVisitor, self).visit(root) self.graph.write_png(filename) def visit_token_node(self, node): graph_node_id = str(id(node)) graph_node_label = "\"{}\"".format(node.value.replace('"', '\\"')) graph_node_color = 0x808080 graph_node_style = "\"filled,rounded\"" graph_node_shape = "diamond" graph_node = self.pydot.Node(graph_node_id, style=graph_node_style, fillcolor="#{:06x}".format(graph_node_color), shape=graph_node_shape, label=graph_node_label) self.graph.add_node(graph_node) def visit_packed_node_in(self, node): graph_node_id = str(id(node)) graph_node_label = repr(node) graph_node_color = 0x808080 graph_node_style = "filled" graph_node_shape = "diamond" graph_node = self.pydot.Node(graph_node_id, style=graph_node_style, fillcolor="#{:06x}".format(graph_node_color), shape=graph_node_shape, label=graph_node_label) self.graph.add_node(graph_node) return iter([node.left, node.right]) def visit_packed_node_out(self, node): graph_node_id = str(id(node)) graph_node = self.graph.get_node(graph_node_id)[0] for child in [node.left, node.right]: if child is not None: child_graph_node_id = str(id(child)) child_graph_node = self.graph.get_node(child_graph_node_id)[0] self.graph.add_edge(self.pydot.Edge(graph_node, child_graph_node)) else: #### Try and be above the Python object ID range; probably impl. specific, but maybe this is okay. 
child_graph_node_id = str(randint(100000000000000000000000000000,123456789012345678901234567890)) child_graph_node_style = "invis" child_graph_node = self.pydot.Node(child_graph_node_id, style=child_graph_node_style, label="None") child_edge_style = "invis" self.graph.add_node(child_graph_node) self.graph.add_edge(self.pydot.Edge(graph_node, child_graph_node, style=child_edge_style)) def visit_symbol_node_in(self, node): graph_node_id = str(id(node)) graph_node_label = repr(node) graph_node_color = 0x808080 graph_node_style = "\"filled\"" if node.is_intermediate: graph_node_shape = "ellipse" else: graph_node_shape = "rectangle" graph_node = self.pydot.Node(graph_node_id, style=graph_node_style, fillcolor="#{:06x}".format(graph_node_color), shape=graph_node_shape, label=graph_node_label) self.graph.add_node(graph_node) return iter(node.children) def visit_symbol_node_out(self, node): graph_node_id = str(id(node)) graph_node = self.graph.get_node(graph_node_id)[0] for child in node.children: child_graph_node_id = str(id(child)) child_graph_node = self.graph.get_node(child_graph_node_id)[0] self.graph.add_edge(self.pydot.Edge(graph_node, child_graph_node)) lark-0.8.1/lark/parsers/grammar_analysis.py000066400000000000000000000145041361215331400207670ustar00rootroot00000000000000from collections import Counter, defaultdict from ..utils import bfs, fzset, classify from ..exceptions import GrammarError from ..grammar import Rule, Terminal, NonTerminal class RulePtr(object): __slots__ = ('rule', 'index') def __init__(self, rule, index): assert isinstance(rule, Rule) assert index <= len(rule.expansion) self.rule = rule self.index = index def __repr__(self): before = [x.name for x in self.rule.expansion[:self.index]] after = [x.name for x in self.rule.expansion[self.index:]] return '<%s : %s * %s>' % (self.rule.origin.name, ' '.join(before), ' '.join(after)) @property def next(self): return self.rule.expansion[self.index] def advance(self, sym): assert self.next == sym return RulePtr(self.rule, self.index+1) @property def is_satisfied(self): return self.index == len(self.rule.expansion) def __eq__(self, other): return self.rule == other.rule and self.index == other.index def __hash__(self): return hash((self.rule, self.index)) # state generation ensures no duplicate LR0ItemSets class LR0ItemSet(object): __slots__ = ('kernel', 'closure', 'transitions', 'lookaheads') def __init__(self, kernel, closure): self.kernel = fzset(kernel) self.closure = fzset(closure) self.transitions = {} self.lookaheads = defaultdict(set) def __repr__(self): return '{%s | %s}' % (', '.join([repr(r) for r in self.kernel]), ', '.join([repr(r) for r in self.closure])) def update_set(set1, set2): if not set2 or set1 > set2: return False copy = set(set1) set1 |= set2 return set1 != copy def calculate_sets(rules): """Calculate FOLLOW sets. Adapted from: http://lara.epfl.ch/w/cc09:algorithm_for_first_and_follow_sets""" symbols = {sym for rule in rules for sym in rule.expansion} | {rule.origin for rule in rules} # foreach grammar rule X ::= Y(1) ... 
Y(k) # if k=0 or {Y(1),...,Y(k)} subset of NULLABLE then # NULLABLE = NULLABLE union {X} # for i = 1 to k # if i=1 or {Y(1),...,Y(i-1)} subset of NULLABLE then # FIRST(X) = FIRST(X) union FIRST(Y(i)) # for j = i+1 to k # if i=k or {Y(i+1),...Y(k)} subset of NULLABLE then # FOLLOW(Y(i)) = FOLLOW(Y(i)) union FOLLOW(X) # if i+1=j or {Y(i+1),...,Y(j-1)} subset of NULLABLE then # FOLLOW(Y(i)) = FOLLOW(Y(i)) union FIRST(Y(j)) # until none of NULLABLE,FIRST,FOLLOW changed in last iteration NULLABLE = set() FIRST = {} FOLLOW = {} for sym in symbols: FIRST[sym]={sym} if sym.is_term else set() FOLLOW[sym]=set() # Calculate NULLABLE and FIRST changed = True while changed: changed = False for rule in rules: if set(rule.expansion) <= NULLABLE: if update_set(NULLABLE, {rule.origin}): changed = True for i, sym in enumerate(rule.expansion): if set(rule.expansion[:i]) <= NULLABLE: if update_set(FIRST[rule.origin], FIRST[sym]): changed = True else: break # Calculate FOLLOW changed = True while changed: changed = False for rule in rules: for i, sym in enumerate(rule.expansion): if i==len(rule.expansion)-1 or set(rule.expansion[i+1:]) <= NULLABLE: if update_set(FOLLOW[sym], FOLLOW[rule.origin]): changed = True for j in range(i+1, len(rule.expansion)): if set(rule.expansion[i+1:j]) <= NULLABLE: if update_set(FOLLOW[sym], FIRST[rule.expansion[j]]): changed = True return FIRST, FOLLOW, NULLABLE class GrammarAnalyzer(object): def __init__(self, parser_conf, debug=False): self.debug = debug root_rules = {start: Rule(NonTerminal('$root_' + start), [NonTerminal(start), Terminal('$END')]) for start in parser_conf.start} rules = parser_conf.rules + list(root_rules.values()) self.rules_by_origin = classify(rules, lambda r: r.origin) if len(rules) != len(set(rules)): duplicates = [item for item, count in Counter(rules).items() if count > 1] raise GrammarError("Rules defined twice: %s" % ', '.join(str(i) for i in duplicates)) for r in rules: for sym in r.expansion: if not (sym.is_term or sym in self.rules_by_origin): raise GrammarError("Using an undefined rule: %s" % sym) # TODO test validation self.start_states = {start: self.expand_rule(root_rule.origin) for start, root_rule in root_rules.items()} self.end_states = {start: fzset({RulePtr(root_rule, len(root_rule.expansion))}) for start, root_rule in root_rules.items()} lr0_root_rules = {start: Rule(NonTerminal('$root_' + start), [NonTerminal(start)]) for start in parser_conf.start} lr0_rules = parser_conf.rules + list(lr0_root_rules.values()) assert(len(lr0_rules) == len(set(lr0_rules))) self.lr0_rules_by_origin = classify(lr0_rules, lambda r: r.origin) # cache RulePtr(r, 0) in r (no duplicate RulePtr objects) self.lr0_start_states = {start: LR0ItemSet([RulePtr(root_rule, 0)], self.expand_rule(root_rule.origin, self.lr0_rules_by_origin)) for start, root_rule in lr0_root_rules.items()} self.FIRST, self.FOLLOW, self.NULLABLE = calculate_sets(rules) def expand_rule(self, source_rule, rules_by_origin=None): "Returns all init_ptrs accessible by rule (recursive)" if rules_by_origin is None: rules_by_origin = self.rules_by_origin init_ptrs = set() def _expand_rule(rule): assert not rule.is_term, rule for r in rules_by_origin[rule]: init_ptr = RulePtr(r, 0) init_ptrs.add(init_ptr) if r.expansion: # if not empty rule new_r = init_ptr.next if not new_r.is_term: yield new_r for _ in bfs([source_rule], _expand_rule): pass return fzset(init_ptrs) lark-0.8.1/lark/parsers/lalr_analysis.py000066400000000000000000000233101361215331400202660ustar00rootroot00000000000000"""This module 
builds a LALR(1) transition-table for lalr_parser.py For now, shift/reduce conflicts are automatically resolved as shifts. """ # Author: Erez Shinan (2017) # Email : erezshin@gmail.com import logging from collections import defaultdict, deque from ..utils import classify, classify_bool, bfs, fzset, Serialize, Enumerator from ..exceptions import GrammarError from .grammar_analysis import GrammarAnalyzer, Terminal, LR0ItemSet from ..grammar import Rule ###{standalone class Action: def __init__(self, name): self.name = name def __str__(self): return self.name def __repr__(self): return str(self) Shift = Action('Shift') Reduce = Action('Reduce') class ParseTable: def __init__(self, states, start_states, end_states): self.states = states self.start_states = start_states self.end_states = end_states def serialize(self, memo): tokens = Enumerator() rules = Enumerator() states = { state: {tokens.get(token): ((1, arg.serialize(memo)) if action is Reduce else (0, arg)) for token, (action, arg) in actions.items()} for state, actions in self.states.items() } return { 'tokens': tokens.reversed(), 'states': states, 'start_states': self.start_states, 'end_states': self.end_states, } @classmethod def deserialize(cls, data, memo): tokens = data['tokens'] states = { state: {tokens[token]: ((Reduce, Rule.deserialize(arg, memo)) if action==1 else (Shift, arg)) for token, (action, arg) in actions.items()} for state, actions in data['states'].items() } return cls(states, data['start_states'], data['end_states']) class IntParseTable(ParseTable): @classmethod def from_ParseTable(cls, parse_table): enum = list(parse_table.states) state_to_idx = {s:i for i,s in enumerate(enum)} int_states = {} for s, la in parse_table.states.items(): la = {k:(v[0], state_to_idx[v[1]]) if v[0] is Shift else v for k,v in la.items()} int_states[ state_to_idx[s] ] = la start_states = {start:state_to_idx[s] for start, s in parse_table.start_states.items()} end_states = {start:state_to_idx[s] for start, s in parse_table.end_states.items()} return cls(int_states, start_states, end_states) ###} # digraph and traverse, see The Theory and Practice of Compiler Writing # computes F(x) = G(x) union (union { G(y) | x R y }) # X: nodes # R: relation (function mapping node -> list of nodes that satisfy the relation) # G: set valued function def digraph(X, R, G): F = {} S = [] N = {} for x in X: N[x] = 0 for x in X: # this is always true for the first iteration, but N[x] may be updated in traverse below if N[x] == 0: traverse(x, S, N, X, R, G, F) return F # x: single node # S: stack # N: weights # X: nodes # R: relation (see above) # G: set valued function # F: set valued function we are computing (map of input -> output) def traverse(x, S, N, X, R, G, F): S.append(x) d = len(S) N[x] = d F[x] = G[x] for y in R[x]: if N[y] == 0: traverse(y, S, N, X, R, G, F) n_x = N[x] assert(n_x > 0) n_y = N[y] assert(n_y != 0) if (n_y > 0) and (n_y < n_x): N[x] = n_y F[x].update(F[y]) if N[x] == d: f_x = F[x] while True: z = S.pop() N[z] = -1 F[z] = f_x if z == x: break class LALR_Analyzer(GrammarAnalyzer): def __init__(self, parser_conf, debug=False): GrammarAnalyzer.__init__(self, parser_conf, debug) self.nonterminal_transitions = [] self.directly_reads = defaultdict(set) self.reads = defaultdict(set) self.includes = defaultdict(set) self.lookback = defaultdict(set) def compute_lr0_states(self): self.lr0_states = set() # map of kernels to LR0ItemSets cache = {} def step(state): _, unsat = classify_bool(state.closure, lambda rp: rp.is_satisfied) d = 
classify(unsat, lambda rp: rp.next) for sym, rps in d.items(): kernel = fzset({rp.advance(sym) for rp in rps}) new_state = cache.get(kernel, None) if new_state is None: closure = set(kernel) for rp in kernel: if not rp.is_satisfied and not rp.next.is_term: closure |= self.expand_rule(rp.next, self.lr0_rules_by_origin) new_state = LR0ItemSet(kernel, closure) cache[kernel] = new_state state.transitions[sym] = new_state yield new_state self.lr0_states.add(state) for _ in bfs(self.lr0_start_states.values(), step): pass def compute_reads_relations(self): # handle start state for root in self.lr0_start_states.values(): assert(len(root.kernel) == 1) for rp in root.kernel: assert(rp.index == 0) self.directly_reads[(root, rp.next)] = set([ Terminal('$END') ]) for state in self.lr0_states: seen = set() for rp in state.closure: if rp.is_satisfied: continue s = rp.next # if s is a not a nonterminal if s not in self.lr0_rules_by_origin: continue if s in seen: continue seen.add(s) nt = (state, s) self.nonterminal_transitions.append(nt) dr = self.directly_reads[nt] r = self.reads[nt] next_state = state.transitions[s] for rp2 in next_state.closure: if rp2.is_satisfied: continue s2 = rp2.next # if s2 is a terminal if s2 not in self.lr0_rules_by_origin: dr.add(s2) if s2 in self.NULLABLE: r.add((next_state, s2)) def compute_includes_lookback(self): for nt in self.nonterminal_transitions: state, nonterminal = nt includes = [] lookback = self.lookback[nt] for rp in state.closure: if rp.rule.origin != nonterminal: continue # traverse the states for rp(.rule) state2 = state for i in range(rp.index, len(rp.rule.expansion)): s = rp.rule.expansion[i] nt2 = (state2, s) state2 = state2.transitions[s] if nt2 not in self.reads: continue for j in range(i + 1, len(rp.rule.expansion)): if not rp.rule.expansion[j] in self.NULLABLE: break else: includes.append(nt2) # state2 is at the final state for rp.rule if rp.index == 0: for rp2 in state2.closure: if (rp2.rule == rp.rule) and rp2.is_satisfied: lookback.add((state2, rp2.rule)) for nt2 in includes: self.includes[nt2].add(nt) def compute_lookaheads(self): read_sets = digraph(self.nonterminal_transitions, self.reads, self.directly_reads) follow_sets = digraph(self.nonterminal_transitions, self.includes, read_sets) for nt, lookbacks in self.lookback.items(): for state, rule in lookbacks: for s in follow_sets[nt]: state.lookaheads[s].add(rule) def compute_lalr1_states(self): m = {} for state in self.lr0_states: actions = {} for la, next_state in state.transitions.items(): actions[la] = (Shift, next_state.closure) for la, rules in state.lookaheads.items(): if len(rules) > 1: raise GrammarError('Reduce/Reduce collision in %s between the following rules: %s' % (la, ''.join([ '\n\t\t- ' + str(r) for r in rules ]))) if la in actions: if self.debug: logging.warning('Shift/Reduce conflict for terminal %s: (resolving as shift)', la.name) logging.warning(' * %s', list(rules)[0]) else: actions[la] = (Reduce, list(rules)[0]) m[state] = { k.name: v for k, v in actions.items() } self.states = { k.closure: v for k, v in m.items() } # compute end states end_states = {} for state in self.states: for rp in state: for start in self.lr0_start_states: if rp.rule.origin.name == ('$root_' + start) and rp.is_satisfied: assert(not start in end_states) end_states[start] = state self._parse_table = ParseTable(self.states, { start: state.closure for start, state in self.lr0_start_states.items() }, end_states) if self.debug: self.parse_table = self._parse_table else: self.parse_table = 
IntParseTable.from_ParseTable(self._parse_table) def compute_lalr(self): self.compute_lr0_states() self.compute_reads_relations() self.compute_includes_lookback() self.compute_lookaheads() self.compute_lalr1_states()lark-0.8.1/lark/parsers/lalr_parser.py000066400000000000000000000063401361215331400177430ustar00rootroot00000000000000"""This module implements a LALR(1) Parser """ # Author: Erez Shinan (2017) # Email : erezshin@gmail.com from ..exceptions import UnexpectedToken from ..lexer import Token from ..utils import Enumerator, Serialize from .lalr_analysis import LALR_Analyzer, Shift, Reduce, IntParseTable ###{standalone class LALR_Parser(object): def __init__(self, parser_conf, debug=False): assert all(r.options.priority is None for r in parser_conf.rules), "LALR doesn't yet support prioritization" analysis = LALR_Analyzer(parser_conf, debug=debug) analysis.compute_lalr() callbacks = parser_conf.callbacks self._parse_table = analysis.parse_table self.parser_conf = parser_conf self.parser = _Parser(analysis.parse_table, callbacks) @classmethod def deserialize(cls, data, memo, callbacks): inst = cls.__new__(cls) inst._parse_table = IntParseTable.deserialize(data, memo) inst.parser = _Parser(inst._parse_table, callbacks) return inst def serialize(self, memo): return self._parse_table.serialize(memo) def parse(self, *args): return self.parser.parse(*args) class _Parser: def __init__(self, parse_table, callbacks): self.states = parse_table.states self.start_states = parse_table.start_states self.end_states = parse_table.end_states self.callbacks = callbacks def parse(self, seq, start, set_state=None): token = None stream = iter(seq) states = self.states start_state = self.start_states[start] end_state = self.end_states[start] state_stack = [start_state] value_stack = [] if set_state: set_state(start_state) def get_action(token): state = state_stack[-1] try: return states[state][token.type] except KeyError: expected = [s for s in states[state].keys() if s.isupper()] raise UnexpectedToken(token, expected, state=state) def reduce(rule): size = len(rule.expansion) if size: s = value_stack[-size:] del state_stack[-size:] del value_stack[-size:] else: s = [] value = self.callbacks[rule](s) _action, new_state = states[state_stack[-1]][rule.origin.name] assert _action is Shift state_stack.append(new_state) value_stack.append(value) # Main LALR-parser loop for token in stream: while True: action, arg = get_action(token) assert arg != end_state if action is Shift: state_stack.append(arg) value_stack.append(token) if set_state: set_state(arg) break # next token else: reduce(arg) token = Token.new_borrow_pos('$END', '', token) if token else Token('$END', '', 0, 1, 1) while True: _action, arg = get_action(token) assert(_action is Reduce) reduce(arg) if state_stack[-1] == end_state: return value_stack[-1] ###} lark-0.8.1/lark/parsers/xearley.py000066400000000000000000000152671361215331400171160ustar00rootroot00000000000000"""This module implements an experimental Earley parser with a dynamic lexer The core Earley algorithm used here is based on Elizabeth Scott's implementation, here: https://www.sciencedirect.com/science/article/pii/S1571066108001497 That is probably the best reference for understanding the algorithm here. The Earley parser outputs an SPPF-tree as per that document. 
The SPPF tree format is better documented here: http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/ Instead of running a lexer beforehand, or using a costy char-by-char method, this parser uses regular expressions by necessity, achieving high-performance while maintaining all of Earley's power in parsing any CFG. """ from collections import defaultdict from ..exceptions import UnexpectedCharacters from ..lexer import Token from ..grammar import Terminal from .earley import Parser as BaseParser from .earley_forest import SymbolNode class Parser(BaseParser): def __init__(self, parser_conf, term_matcher, resolve_ambiguity=True, ignore = (), complete_lex = False, debug=False): BaseParser.__init__(self, parser_conf, term_matcher, resolve_ambiguity, debug) self.ignore = [Terminal(t) for t in ignore] self.complete_lex = complete_lex def _parse(self, stream, columns, to_scan, start_symbol=None): def scan(i, to_scan): """The core Earley Scanner. This is a custom implementation of the scanner that uses the Lark lexer to match tokens. The scan list is built by the Earley predictor, based on the previously completed tokens. This ensures that at each phase of the parse we have a custom lexer context, allowing for more complex ambiguities.""" node_cache = {} # 1) Loop the expectations and ask the lexer to match. # Since regexp is forward looking on the input stream, and we only # want to process tokens when we hit the point in the stream at which # they complete, we push all tokens into a buffer (delayed_matches), to # be held possibly for a later parse step when we reach the point in the # input stream at which they complete. for item in set(to_scan): m = match(item.expect, stream, i) if m: t = Token(item.expect.name, m.group(0), i, text_line, text_column) delayed_matches[m.end()].append( (item, i, t) ) if self.complete_lex: s = m.group(0) for j in range(1, len(s)): m = match(item.expect, s[:-j]) if m: t = Token(item.expect.name, m.group(0), i, text_line, text_column) delayed_matches[i+m.end()].append( (item, i, t) ) # Remove any items that successfully matched in this pass from the to_scan buffer. # This ensures we don't carry over tokens that already matched, if we're ignoring below. to_scan.remove(item) # 3) Process any ignores. This is typically used for e.g. whitespace. # We carry over any unmatched items from the to_scan buffer to be matched again after # the ignore. This should allow us to use ignored symbols in non-terminals to implement # e.g. mandatory spacing. for x in self.ignore: m = match(x, stream, i) if m: # Carry over any items still in the scan buffer, to past the end of the ignored items. delayed_matches[m.end()].extend([(item, i, None) for item in to_scan ]) # If we're ignoring up to the end of the file, # carry over the start symbol if it already completed. delayed_matches[m.end()].extend([(item, i, None) for item in columns[i] if item.is_complete and item.s == start_symbol]) next_to_scan = set() next_set = set() columns.append(next_set) transitives.append({}) ## 4) Process Tokens from delayed_matches. # This is the core of the Earley scanner. Create an SPPF node for each Token, # and create the symbol node in the SPPF tree. Advance the item that completed, # and add the resulting new item to either the Earley set (for processing by the # completer/predictor) or the to_scan buffer for the next parse step. 
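            # Illustrative note (an assumption-free reading of the code above, not part of
            # the original source): delayed_matches is keyed by the stream position at which
            # a regexp match *ends*. E.g. if a NUMBER terminal matches "42" starting at
            # position 0, its (item, start, token) triple is appended to delayed_matches[2],
            # so it is only processed once the parse reaches that point in the input.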
for item, start, token in delayed_matches[i+1]: if token is not None: token.end_line = text_line token.end_column = text_column + 1 new_item = item.advance() label = (new_item.s, new_item.start, i) new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label)) new_item.node.add_family(new_item.s, item.rule, new_item.start, item.node, token) else: new_item = item if new_item.expect in self.TERMINALS: # add (B ::= Aai+1.B, h, y) to Q' next_to_scan.add(new_item) else: # add (B ::= Aa+1.B, h, y) to Ei+1 next_set.add(new_item) del delayed_matches[i+1] # No longer needed, so unburden memory if not next_set and not delayed_matches and not next_to_scan: raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan}, set(to_scan)) return next_to_scan delayed_matches = defaultdict(list) match = self.term_matcher # Cache for nodes & tokens created in a particular parse step. transitives = [{}] text_line = 1 text_column = 1 ## The main Earley loop. # Run the Prediction/Completion cycle for any Items in the current Earley set. # Completions will be added to the SPPF tree, and predictions will be recursively # processed down to terminals/empty nodes to be added to the scanner for the next # step. i = 0 for token in stream: self.predict_and_complete(i, to_scan, columns, transitives) to_scan = scan(i, to_scan) if token == '\n': text_line += 1 text_column = 1 else: text_column += 1 i += 1 self.predict_and_complete(i, to_scan, columns, transitives) ## Column is now the final column in the parse. assert i == len(columns)-1 return to_scanlark-0.8.1/lark/reconstruct.py000066400000000000000000000130271361215331400163310ustar00rootroot00000000000000from collections import defaultdict from .tree import Tree from .visitors import Transformer_InPlace from .common import ParserConf from .lexer import Token, PatternStr from .parsers import earley from .grammar import Rule, Terminal, NonTerminal def is_discarded_terminal(t): return t.is_term and t.filter_out def is_iter_empty(i): try: _ = next(i) return False except StopIteration: return True class WriteTokensTransformer(Transformer_InPlace): "Inserts discarded tokens into their correct place, according to the rules of grammar" def __init__(self, tokens, term_subs): self.tokens = tokens self.term_subs = term_subs def __default__(self, data, children, meta): if not getattr(meta, 'match_tree', False): return Tree(data, children) iter_args = iter(children) to_write = [] for sym in meta.orig_expansion: if is_discarded_terminal(sym): try: v = self.term_subs[sym.name](sym) except KeyError: t = self.tokens[sym.name] if not isinstance(t.pattern, PatternStr): raise NotImplementedError("Reconstructing regexps not supported yet: %s" % t) v = t.pattern.value to_write.append(v) else: x = next(iter_args) if isinstance(x, list): to_write += x else: if isinstance(x, Token): assert Terminal(x.type) == sym, x else: assert NonTerminal(x.data) == sym, (sym, x) to_write.append(x) assert is_iter_empty(iter_args) return to_write class MatchTree(Tree): pass class MakeMatchTree: def __init__(self, name, expansion): self.name = name self.expansion = expansion def __call__(self, args): t = MatchTree(self.name, args) t.meta.match_tree = True t.meta.orig_expansion = self.expansion return t def best_from_group(seq, group_key, cmp_key): d = {} for item in seq: key = group_key(item) if key in d: v1 = cmp_key(item) v2 = cmp_key(d[key]) if v2 > v1: d[key] = item else: d[key] = item return list(d.values()) class 
Reconstructor: def __init__(self, parser, term_subs={}): # XXX TODO calling compile twice returns different results! assert parser.options.maybe_placeholders == False tokens, rules, _grammar_extra = parser.grammar.compile(parser.options.start) self.write_tokens = WriteTokensTransformer({t.name:t for t in tokens}, term_subs) self.rules = list(self._build_recons_rules(rules)) self.rules.reverse() # Choose the best rule from each group of {rule => [rule.alias]}, since we only really need one derivation. self.rules = best_from_group(self.rules, lambda r: r, lambda r: -len(r.expansion)) self.rules.sort(key=lambda r: len(r.expansion)) callbacks = {rule: rule.alias for rule in self.rules} # TODO pass callbacks through dict, instead of alias? self.parser = earley.Parser(ParserConf(self.rules, callbacks, parser.options.start), self._match, resolve_ambiguity=True) def _build_recons_rules(self, rules): expand1s = {r.origin for r in rules if r.options.expand1} aliases = defaultdict(list) for r in rules: if r.alias: aliases[r.origin].append( r.alias ) rule_names = {r.origin for r in rules} nonterminals = {sym for sym in rule_names if sym.name.startswith('_') or sym in expand1s or sym in aliases } for r in rules: recons_exp = [sym if sym in nonterminals else Terminal(sym.name) for sym in r.expansion if not is_discarded_terminal(sym)] # Skip self-recursive constructs if recons_exp == [r.origin]: continue sym = NonTerminal(r.alias) if r.alias else r.origin yield Rule(sym, recons_exp, alias=MakeMatchTree(sym.name, r.expansion)) for origin, rule_aliases in aliases.items(): for alias in rule_aliases: yield Rule(origin, [Terminal(alias)], alias=MakeMatchTree(origin.name, [NonTerminal(alias)])) yield Rule(origin, [Terminal(origin.name)], alias=MakeMatchTree(origin.name, [origin])) def _match(self, term, token): if isinstance(token, Tree): return Terminal(token.data) == term elif isinstance(token, Token): return term == Terminal(token.type) assert False def _reconstruct(self, tree): # TODO: ambiguity? unreduced_tree = self.parser.parse(tree.children, tree.data) # find a full derivation assert unreduced_tree.data == tree.data res = self.write_tokens.transform(unreduced_tree) for item in res: if isinstance(item, Tree): for x in self._reconstruct(item): yield x else: yield item def reconstruct(self, tree): x = self._reconstruct(tree) y = [] prev_item = '' for item in x: if prev_item and item and prev_item[-1].isalnum() and item[0].isalnum(): y.append(' ') y.append(item) prev_item = item return ''.join(y) lark-0.8.1/lark/tools/000077500000000000000000000000001361215331400145415ustar00rootroot00000000000000lark-0.8.1/lark/tools/__init__.py000066400000000000000000000000001361215331400166400ustar00rootroot00000000000000lark-0.8.1/lark/tools/nearley.py000066400000000000000000000127431361215331400165610ustar00rootroot00000000000000"Converts between Lark and Nearley grammars. Work in progress!" import os.path import sys import codecs from lark import Lark, InlineTransformer nearley_grammar = r""" start: (ruledef|directive)+ directive: "@" NAME (STRING|NAME) | "@" JS -> js_code ruledef: NAME "->" expansions | NAME REGEXP "->" expansions -> macro expansions: expansion ("|" expansion)* expansion: expr+ js ?expr: item (":" /[+*?]/)? ?item: rule|string|regexp|null | "(" expansions ")" rule: NAME string: STRING regexp: REGEXP null: "null" JS: /{%.*?%}/s js: JS? 
NAME: /[a-zA-Z_$]\w*/ COMMENT: /#[^\n]*/ REGEXP: /\[.*?\]/ %import common.ESCAPED_STRING -> STRING %import common.WS %ignore WS %ignore COMMENT """ nearley_grammar_parser = Lark(nearley_grammar, parser='earley', lexer='standard') def _get_rulename(name): name = {'_': '_ws_maybe', '__':'_ws'}.get(name, name) return 'n_' + name.replace('$', '__DOLLAR__').lower() class NearleyToLark(InlineTransformer): def __init__(self): self._count = 0 self.extra_rules = {} self.extra_rules_rev = {} self.alias_js_code = {} def _new_function(self, code): name = 'alias_%d' % self._count self._count += 1 self.alias_js_code[name] = code return name def _extra_rule(self, rule): if rule in self.extra_rules_rev: return self.extra_rules_rev[rule] name = 'xrule_%d' % len(self.extra_rules) assert name not in self.extra_rules self.extra_rules[name] = rule self.extra_rules_rev[rule] = name return name def rule(self, name): return _get_rulename(name) def ruledef(self, name, exps): return '!%s: %s' % (_get_rulename(name), exps) def expr(self, item, op): rule = '(%s)%s' % (item, op) return self._extra_rule(rule) def regexp(self, r): return '/%s/' % r def null(self): return '' def string(self, s): return self._extra_rule(s) def expansion(self, *x): x, js = x[:-1], x[-1] if js.children: js_code ,= js.children js_code = js_code[2:-2] alias = '-> ' + self._new_function(js_code) else: alias = '' return ' '.join(x) + alias def expansions(self, *x): return '%s' % ('\n |'.join(x)) def start(self, *rules): return '\n'.join(filter(None, rules)) def _nearley_to_lark(g, builtin_path, n2l, js_code, folder_path, includes): rule_defs = [] tree = nearley_grammar_parser.parse(g) for statement in tree.children: if statement.data == 'directive': directive, arg = statement.children if directive in ('builtin', 'include'): folder = builtin_path if directive == 'builtin' else folder_path path = os.path.join(folder, arg[1:-1]) if path not in includes: includes.add(path) with codecs.open(path, encoding='utf8') as f: text = f.read() rule_defs += _nearley_to_lark(text, builtin_path, n2l, js_code, os.path.abspath(os.path.dirname(path)), includes) else: assert False, directive elif statement.data == 'js_code': code ,= statement.children code = code[2:-2] js_code.append(code) elif statement.data == 'macro': pass # TODO Add support for macros! 
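                # Added note (assumption, not from the original source): a Nearley macro
                # definition such as `parens[X] -> "(" $X ")"` parses into a `macro`
                # statement here and is silently dropped, since the converter does not
                # expand macros yet.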
elif statement.data == 'ruledef': rule_defs.append( n2l.transform(statement) ) else: raise Exception("Unknown statement: %s" % statement) return rule_defs def create_code_for_nearley_grammar(g, start, builtin_path, folder_path): import js2py emit_code = [] def emit(x=None): if x: emit_code.append(x) emit_code.append('\n') js_code = ['function id(x) {return x[0];}'] n2l = NearleyToLark() rule_defs = _nearley_to_lark(g, builtin_path, n2l, js_code, folder_path, set()) lark_g = '\n'.join(rule_defs) lark_g += '\n'+'\n'.join('!%s: %s' % item for item in n2l.extra_rules.items()) emit('from lark import Lark, Transformer') emit() emit('grammar = ' + repr(lark_g)) emit() for alias, code in n2l.alias_js_code.items(): js_code.append('%s = (%s);' % (alias, code)) emit(js2py.translate_js('\n'.join(js_code))) emit('class TransformNearley(Transformer):') for alias in n2l.alias_js_code: emit(" %s = var.get('%s').to_python()" % (alias, alias)) emit(" __default__ = lambda self, n, c, m: c if c else None") emit() emit('parser = Lark(grammar, start="n_%s", maybe_placeholders=False)' % start) emit('def parse(text):') emit(' return TransformNearley().transform(parser.parse(text))') return ''.join(emit_code) def main(fn, start, nearley_lib): with codecs.open(fn, encoding='utf8') as f: grammar = f.read() return create_code_for_nearley_grammar(grammar, start, os.path.join(nearley_lib, 'builtin'), os.path.abspath(os.path.dirname(fn))) if __name__ == '__main__': if len(sys.argv) < 4: print("Reads Nearley grammar (with js functions) outputs an equivalent lark parser.") print("Usage: %s " % sys.argv[0]) sys.exit(1) fn, start, nearley_lib = sys.argv[1:] print(main(fn, start, nearley_lib)) lark-0.8.1/lark/tools/serialize.py000066400000000000000000000030571361215331400171070ustar00rootroot00000000000000import codecs import sys import json from lark import Lark from lark.grammar import RuleOptions, Rule from lark.lexer import TerminalDef import argparse argparser = argparse.ArgumentParser(prog='python -m lark.tools.serialize') #description='''Lark Serialization Tool -- Stores Lark's internal state & LALR analysis as a convenient JSON file''') argparser.add_argument('grammar_file', type=argparse.FileType('r'), help='A valid .lark file') argparser.add_argument('-o', '--out', type=argparse.FileType('w'), default=sys.stdout, help='json file path to create (default=stdout)') argparser.add_argument('-s', '--start', default='start', help='start symbol (default="start")', nargs='+') argparser.add_argument('-l', '--lexer', default='standard', choices=['standard', 'contextual'], help='lexer type (default="standard")') def serialize(infile, outfile, lexer, start): lark_inst = Lark(infile, parser="lalr", lexer=lexer, start=start) # TODO contextual data, memo = lark_inst.memo_serialize([TerminalDef, Rule]) outfile.write('{\n') outfile.write(' "data": %s,\n' % json.dumps(data)) outfile.write(' "memo": %s\n' % json.dumps(memo)) outfile.write('}\n') def main(): if len(sys.argv) == 1 or '-h' in sys.argv or '--help' in sys.argv: print("Lark Serialization Tool - Stores Lark's internal state & LALR analysis as a JSON file") print("") argparser.print_help() else: args = argparser.parse_args() serialize(args.grammar_file, args.out, args.lexer, args.start) if __name__ == '__main__': main()lark-0.8.1/lark/tools/standalone.py000066400000000000000000000071721361215331400172520ustar00rootroot00000000000000###{standalone # # # Lark Stand-alone Generator Tool # ---------------------------------- # Generates a stand-alone LALR(1) parser with a standard 
lexer # # Git: https://github.com/erezsh/lark # Author: Erez Shinan (erezshin@gmail.com) # # # >>> LICENSE # # This tool and its generated code use a separate license from Lark. # # It is licensed under GPLv2 or above. # # If you wish to purchase a commercial license for this tool and its # generated code, contact me via email. # # If GPL is incompatible with your free or open-source project, # contact me and we'll work it out (for free). # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # See . # # import os from io import open ###} import pprint import codecs import sys import os from pprint import pprint from os import path from collections import defaultdict import lark from lark import Lark from lark.parsers.lalr_analysis import Reduce from lark.grammar import RuleOptions, Rule from lark.lexer import TerminalDef _dir = path.dirname(__file__) _larkdir = path.join(_dir, path.pardir) EXTRACT_STANDALONE_FILES = [ 'tools/standalone.py', 'exceptions.py', 'utils.py', 'tree.py', 'visitors.py', 'indenter.py', 'grammar.py', 'lexer.py', 'common.py', 'parse_tree_builder.py', 'parsers/lalr_parser.py', 'parsers/lalr_analysis.py', 'parser_frontends.py', 'lark.py', ] def extract_sections(lines): section = None text = [] sections = defaultdict(list) for l in lines: if l.startswith('###'): if l[3] == '{': section = l[4:].strip() elif l[3] == '}': sections[section] += text section = None text = [] else: raise ValueError(l) elif section: text.append(l) return {name:''.join(text) for name, text in sections.items()} def main(fobj, start): lark_inst = Lark(fobj, parser="lalr", lexer="contextual", start=start) print('# The file was automatically generated by Lark v%s' % lark.__version__) for pyfile in EXTRACT_STANDALONE_FILES: with open(os.path.join(_larkdir, pyfile)) as f: print (extract_sections(f)['standalone']) data, m = lark_inst.memo_serialize([TerminalDef, Rule]) print( 'DATA = (' ) # pprint(data, width=160) print(data) print(')') print( 'MEMO = (') print(m) print(')') print('Shift = 0') print('Reduce = 1') print("def Lark_StandAlone(transformer=None, postlex=None):") print(" namespace = {'Rule': Rule, 'TerminalDef': TerminalDef}") print(" return Lark.deserialize(DATA, namespace, MEMO, transformer=transformer, postlex=postlex)") if __name__ == '__main__': if len(sys.argv) < 2: print("Lark Stand-alone Generator Tool") print("Usage: python -m lark.tools.standalone []") sys.exit(1) if len(sys.argv) == 3: fn, start = sys.argv[1:] elif len(sys.argv) == 2: fn, start = sys.argv[1], 'start' else: assert False, sys.argv with codecs.open(fn, encoding='utf8') as f: main(f, start) lark-0.8.1/lark/tree.py000066400000000000000000000120761361215331400147200ustar00rootroot00000000000000try: from future_builtins import filter except ImportError: pass from copy import deepcopy ###{standalone class Meta: def __init__(self): self.empty = True class Tree(object): def __init__(self, data, children, meta=None): self.data = data self.children = children self._meta = meta @property def meta(self): if self._meta is None: self._meta = Meta() return self._meta def __repr__(self): return 
'Tree(%s, %s)' % (self.data, self.children) def _pretty_label(self): return self.data def _pretty(self, level, indent_str): if len(self.children) == 1 and not isinstance(self.children[0], Tree): return [ indent_str*level, self._pretty_label(), '\t', '%s' % (self.children[0],), '\n'] l = [ indent_str*level, self._pretty_label(), '\n' ] for n in self.children: if isinstance(n, Tree): l += n._pretty(level+1, indent_str) else: l += [ indent_str*(level+1), '%s' % (n,), '\n' ] return l def pretty(self, indent_str=' '): return ''.join(self._pretty(0, indent_str)) def __eq__(self, other): try: return self.data == other.data and self.children == other.children except AttributeError: return False def __ne__(self, other): return not (self == other) def __hash__(self): return hash((self.data, tuple(self.children))) def iter_subtrees(self): # TODO: Re-write as a more efficient version visited = set() q = [self] l = [] while q: subtree = q.pop() l.append( subtree ) if id(subtree) in visited: continue # already been here from another branch visited.add(id(subtree)) q += [c for c in subtree.children if isinstance(c, Tree)] seen = set() for x in reversed(l): if id(x) not in seen: yield x seen.add(id(x)) def find_pred(self, pred): "Find all nodes where pred(tree) == True" return filter(pred, self.iter_subtrees()) def find_data(self, data): "Find all nodes where tree.data == data" return self.find_pred(lambda t: t.data == data) ###} def expand_kids_by_index(self, *indices): "Expand (inline) children at the given indices" for i in sorted(indices, reverse=True): # reverse so that changing tail won't affect indices kid = self.children[i] self.children[i:i+1] = kid.children def scan_values(self, pred): for c in self.children: if isinstance(c, Tree): for t in c.scan_values(pred): yield t else: if pred(c): yield c def iter_subtrees_topdown(self): stack = [self] while stack: node = stack.pop() if not isinstance(node, Tree): continue yield node for n in reversed(node.children): stack.append(n) def __deepcopy__(self, memo): return type(self)(self.data, deepcopy(self.children, memo)) def copy(self): return type(self)(self.data, self.children) def set(self, data, children): self.data = data self.children = children # XXX Deprecated! Here for backwards compatibility <0.6.0 @property def line(self): return self.meta.line @property def column(self): return self.meta.column @property def end_line(self): return self.meta.end_line @property def end_column(self): return self.meta.end_column class SlottedTree(Tree): __slots__ = 'data', 'children', 'rule', '_meta' def pydot__tree_to_png(tree, filename, rankdir="LR", **kwargs): """Creates a colorful image that represents the tree (data+children, without meta) Possible values for `rankdir` are "TB", "LR", "BT", "RL", corresponding to directed graphs drawn from top to bottom, from left to right, from bottom to top, and from right to left, respectively. `kwargs` can be any graph attribute (e. g. `dpi=200`). For a list of possible attributes, see https://www.graphviz.org/doc/info/attrs.html. 
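    Illustrative usage (an added sketch, assuming `pydot` and Graphviz are installed;
    the grammar below is hypothetical):

        from lark import Lark
        from lark.tree import pydot__tree_to_png

        parser = Lark(r'''
            start: NUMBER ("+" NUMBER)*
            %import common.NUMBER
            %ignore " "
        ''')
        pydot__tree_to_png(parser.parse("1 + 2 + 3"), "parse_tree.png", rankdir="TB", dpi=200)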
""" import pydot graph = pydot.Dot(graph_type='digraph', rankdir=rankdir, **kwargs) i = [0] def new_leaf(leaf): node = pydot.Node(i[0], label=repr(leaf)) i[0] += 1 graph.add_node(node) return node def _to_pydot(subtree): color = hash(subtree.data) & 0xffffff color |= 0x808080 subnodes = [_to_pydot(child) if isinstance(child, Tree) else new_leaf(child) for child in subtree.children] node = pydot.Node(i[0], style="filled", fillcolor="#%x"%color, label=subtree.data) i[0] += 1 graph.add_node(node) for subnode in subnodes: graph.add_edge(pydot.Edge(node, subnode)) return node _to_pydot(tree) graph.write_png(filename) lark-0.8.1/lark/utils.py000066400000000000000000000153001361215331400151120ustar00rootroot00000000000000import sys from ast import literal_eval from collections import deque class fzset(frozenset): def __repr__(self): return '{%s}' % ', '.join(map(repr, self)) def classify_bool(seq, pred): true_elems = [] false_elems = [] for elem in seq: if pred(elem): true_elems.append(elem) else: false_elems.append(elem) return true_elems, false_elems def bfs(initial, expand): open_q = deque(list(initial)) visited = set(open_q) while open_q: node = open_q.popleft() yield node for next_node in expand(node): if next_node not in visited: visited.add(next_node) open_q.append(next_node) def _serialize(value, memo): # if memo and memo.in_types(value): # return {'__memo__': memo.memoized.get(value)} if isinstance(value, Serialize): return value.serialize(memo) elif isinstance(value, list): return [_serialize(elem, memo) for elem in value] elif isinstance(value, frozenset): return list(value) # TODO reversible? elif isinstance(value, dict): return {key:_serialize(elem, memo) for key, elem in value.items()} return value ###{standalone def classify(seq, key=None, value=None): d = {} for item in seq: k = key(item) if (key is not None) else item v = value(item) if (value is not None) else item if k in d: d[k].append(v) else: d[k] = [v] return d def _deserialize(data, namespace, memo): if isinstance(data, dict): if '__type__' in data: # Object class_ = namespace[data['__type__']] return class_.deserialize(data, memo) elif '@' in data: return memo[data['@']] return {key:_deserialize(value, namespace, memo) for key, value in data.items()} elif isinstance(data, list): return [_deserialize(value, namespace, memo) for value in data] return data class Serialize(object): def memo_serialize(self, types_to_memoize): memo = SerializeMemoizer(types_to_memoize) return self.serialize(memo), memo.serialize() def serialize(self, memo=None): if memo and memo.in_types(self): return {'@': memo.memoized.get(self)} fields = getattr(self, '__serialize_fields__') res = {f: _serialize(getattr(self, f), memo) for f in fields} res['__type__'] = type(self).__name__ postprocess = getattr(self, '_serialize', None) if postprocess: postprocess(res, memo) return res @classmethod def deserialize(cls, data, memo): namespace = getattr(cls, '__serialize_namespace__', {}) namespace = {c.__name__:c for c in namespace} fields = getattr(cls, '__serialize_fields__') if '@' in data: return memo[data['@']] inst = cls.__new__(cls) for f in fields: try: setattr(inst, f, _deserialize(data[f], namespace, memo)) except KeyError as e: raise KeyError("Cannot find key for class", cls, e) postprocess = getattr(inst, '_deserialize', None) if postprocess: postprocess() return inst class SerializeMemoizer(Serialize): __serialize_fields__ = 'memoized', def __init__(self, types_to_memoize): self.types_to_memoize = tuple(types_to_memoize) self.memoized = 
Enumerator() def in_types(self, value): return isinstance(value, self.types_to_memoize) def serialize(self): return _serialize(self.memoized.reversed(), None) @classmethod def deserialize(cls, data, namespace, memo): return _deserialize(data, namespace, memo) try: STRING_TYPE = basestring except NameError: # Python 3 STRING_TYPE = str import types from functools import wraps, partial from contextlib import contextmanager Str = type(u'') try: classtype = types.ClassType # Python2 except AttributeError: classtype = type # Python3 def smart_decorator(f, create_decorator): if isinstance(f, types.FunctionType): return wraps(f)(create_decorator(f, True)) elif isinstance(f, (classtype, type, types.BuiltinFunctionType)): return wraps(f)(create_decorator(f, False)) elif isinstance(f, types.MethodType): return wraps(f)(create_decorator(f.__func__, True)) elif isinstance(f, partial): # wraps does not work for partials in 2.7: https://bugs.python.org/issue3445 return wraps(f.func)(create_decorator(lambda *args, **kw: f(*args[1:], **kw), True)) else: return create_decorator(f.__func__.__call__, True) import sys, re Py36 = (sys.version_info[:2] >= (3, 6)) import sre_parse import sre_constants def get_regexp_width(regexp): try: return [int(x) for x in sre_parse.parse(regexp).getwidth()] except sre_constants.error: raise ValueError(regexp) ###} def dedup_list(l): """Given a list (l) will removing duplicates from the list, preserving the original order of the list. Assumes that the list entrie are hashable.""" dedup = set() return [ x for x in l if not (x in dedup or dedup.add(x))] try: from contextlib import suppress # Python 3 except ImportError: @contextmanager def suppress(*excs): '''Catch and dismiss the provided exception >>> x = 'hello' >>> with suppress(IndexError): ... x = x[10] >>> x 'hello' ''' try: yield except excs: pass try: compare = cmp except NameError: def compare(a, b): if a == b: return 0 elif a > b: return 1 return -1 class Enumerator(Serialize): def __init__(self): self.enums = {} def get(self, item): if item not in self.enums: self.enums[item] = len(self.enums) return self.enums[item] def __len__(self): return len(self.enums) def reversed(self): r = {v: k for k, v in self.enums.items()} assert len(r) == len(self.enums) return r def eval_escaping(s): w = '' i = iter(s) for n in i: w += n if n == '\\': try: n2 = next(i) except StopIteration: raise ValueError("Literal ended unexpectedly (bad escaping): `%r`" % s) if n2 == '\\': w += '\\\\' elif n2 not in 'uxnftr': w += '\\' w += n2 w = w.replace('\\"', '"').replace("'", "\\'") to_eval = "u'''%s'''" % w try: s = literal_eval(to_eval) except SyntaxError as e: raise ValueError(s, e) return s lark-0.8.1/lark/visitors.py000066400000000000000000000233651361215331400156460ustar00rootroot00000000000000from functools import wraps from .utils import smart_decorator from .tree import Tree from .exceptions import VisitError, GrammarError from .lexer import Token ###{standalone from inspect import getmembers, getmro class Discard(Exception): pass # Transformers class Transformer: """Visits the tree recursively, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data The returned value replaces the old one in the structure. Can be used to implement map or reduce. 
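    A minimal sketch (added illustration; the rule names `number` and `add` are
    hypothetical and assume a matching grammar):

        class CalcTransformer(Transformer):
            def number(self, children):
                (token,) = children
                return int(token)

            def add(self, children):
                return sum(children)

        result = CalcTransformer().transform(parse_tree)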
""" __visit_tokens__ = True # For backwards compatibility def __init__(self, visit_tokens=True): self.__visit_tokens__ = visit_tokens def _call_userfunc(self, tree, new_children=None): # Assumes tree is already transformed children = new_children if new_children is not None else tree.children try: f = getattr(self, tree.data) except AttributeError: return self.__default__(tree.data, children, tree.meta) else: try: wrapper = getattr(f, 'visit_wrapper', None) if wrapper is not None: return f.visit_wrapper(f, tree.data, children, tree.meta) else: return f(children) except (GrammarError, Discard): raise except Exception as e: raise VisitError(tree.data, tree, e) def _call_userfunc_token(self, token): try: f = getattr(self, token.type) except AttributeError: return self.__default_token__(token) else: try: return f(token) except (GrammarError, Discard): raise except Exception as e: raise VisitError(token.type, token, e) def _transform_children(self, children): for c in children: try: if isinstance(c, Tree): yield self._transform_tree(c) elif self.__visit_tokens__ and isinstance(c, Token): yield self._call_userfunc_token(c) else: yield c except Discard: pass def _transform_tree(self, tree): children = list(self._transform_children(tree.children)) return self._call_userfunc(tree, children) def transform(self, tree): return self._transform_tree(tree) def __mul__(self, other): return TransformerChain(self, other) def __default__(self, data, children, meta): "Default operation on tree (for override)" return Tree(data, children, meta) def __default_token__(self, token): "Default operation on token (for override)" return token @classmethod def _apply_decorator(cls, decorator, **kwargs): mro = getmro(cls) assert mro[0] is cls libmembers = {name for _cls in mro[1:] for name, _ in getmembers(_cls)} for name, value in getmembers(cls): # Make sure the function isn't inherited (unless it's overwritten) if name.startswith('_') or (name in libmembers and name not in cls.__dict__): continue if not callable(cls.__dict__[name]): continue # Skip if v_args already applied (at the function level) if hasattr(cls.__dict__[name], 'vargs_applied'): continue static = isinstance(cls.__dict__[name], (staticmethod, classmethod)) setattr(cls, name, decorator(value, static=static, **kwargs)) return cls class InlineTransformer(Transformer): # XXX Deprecated def _call_userfunc(self, tree, new_children=None): # Assumes tree is already transformed children = new_children if new_children is not None else tree.children try: f = getattr(self, tree.data) except AttributeError: return self.__default__(tree.data, children, tree.meta) else: return f(*children) class TransformerChain(object): def __init__(self, *transformers): self.transformers = transformers def transform(self, tree): for t in self.transformers: tree = t.transform(tree) return tree def __mul__(self, other): return TransformerChain(*self.transformers + (other,)) class Transformer_InPlace(Transformer): "Non-recursive. Changes the tree in-place instead of returning new instances" def _transform_tree(self, tree): # Cancel recursion return self._call_userfunc(tree) def transform(self, tree): for subtree in tree.iter_subtrees(): subtree.children = list(self._transform_children(subtree.children)) return self._transform_tree(tree) class Transformer_InPlaceRecursive(Transformer): "Recursive. 
Changes the tree in-place instead of returning new instances" def _transform_tree(self, tree): tree.children = list(self._transform_children(tree.children)) return self._call_userfunc(tree) # Visitors class VisitorBase: def _call_userfunc(self, tree): return getattr(self, tree.data, self.__default__)(tree) def __default__(self, tree): "Default operation on tree (for override)" return tree class Visitor(VisitorBase): """Bottom-up visitor, non-recursive Visits the tree, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data """ def visit(self, tree): for subtree in tree.iter_subtrees(): self._call_userfunc(subtree) return tree def visit_topdown(self,tree): for subtree in tree.iter_subtrees_topdown(): self._call_userfunc(subtree) return tree class Visitor_Recursive(VisitorBase): """Bottom-up visitor, recursive Visits the tree, starting with the leaves and finally the root (bottom-up) Calls its methods (provided by user via inheritance) according to tree.data """ def visit(self, tree): for child in tree.children: if isinstance(child, Tree): self.visit(child) self._call_userfunc(tree) return tree def visit_topdown(self,tree): self._call_userfunc(tree) for child in tree.children: if isinstance(child, Tree): self.visit_topdown(child) return tree def visit_children_decor(func): "See Interpreter" @wraps(func) def inner(cls, tree): values = cls.visit_children(tree) return func(cls, values) return inner class Interpreter: """Top-down visitor, recursive Visits the tree, starting with the root and finally the leaves (top-down) Calls its methods (provided by user via inheritance) according to tree.data Unlike Transformer and Visitor, the Interpreter doesn't automatically visit its sub-branches. The user has to explicitly call visit_children, or use the @visit_children_decor """ def visit(self, tree): return getattr(self, tree.data)(tree) def visit_children(self, tree): return [self.visit(child) if isinstance(child, Tree) else child for child in tree.children] def __getattr__(self, name): return self.__default__ def __default__(self, tree): return self.visit_children(tree) # Decorators def _apply_decorator(obj, decorator, **kwargs): try: _apply = obj._apply_decorator except AttributeError: return decorator(obj, **kwargs) else: return _apply(decorator, **kwargs) def _inline_args__func(func): @wraps(func) def create_decorator(_f, with_self): if with_self: def f(self, children): return _f(self, *children) else: def f(self, children): return _f(*children) return f return smart_decorator(func, create_decorator) def inline_args(obj): # XXX Deprecated return _apply_decorator(obj, _inline_args__func) def _visitor_args_func_dec(func, visit_wrapper=None, static=False): def create_decorator(_f, with_self): if with_self: def f(self, *args, **kwargs): return _f(self, *args, **kwargs) else: def f(self, *args, **kwargs): return _f(*args, **kwargs) return f if static: f = wraps(func)(create_decorator(func, False)) else: f = smart_decorator(func, create_decorator) f.vargs_applied = True f.visit_wrapper = visit_wrapper return f def _vargs_inline(f, data, children, meta): return f(*children) def _vargs_meta_inline(f, data, children, meta): return f(meta, *children) def _vargs_meta(f, data, children, meta): return f(children, meta) # TODO swap these for consistency? Backwards incompatible! 
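# Added commentary (not in the original source): each `_vargs_*` helper above and below
# is a `visit_wrapper` with the signature (user_callback, data, children, meta). It is
# attached to a decorated method by `_visitor_args_func_dec` and invoked from
# `Transformer._call_userfunc` as `f.visit_wrapper(f, tree.data, children, tree.meta)`.
# For example (illustrative):
#
#     @v_args(inline=True)
#     class Calc(Transformer):
#         def add(self, a, b):      # children are unpacked into positional arguments
#             return a + b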
def _vargs_tree(f, data, children, meta): return f(Tree(data, children, meta)) def v_args(inline=False, meta=False, tree=False, wrapper=None): "A convenience decorator factory, for modifying the behavior of user-supplied visitor methods" if tree and (meta or inline): raise ValueError("Visitor functions cannot combine 'tree' with 'meta' or 'inline'.") func = None if meta: if inline: func = _vargs_meta_inline else: func = _vargs_meta elif inline: func = _vargs_inline elif tree: func = _vargs_tree if wrapper is not None: if func is not None: raise ValueError("Cannot use 'wrapper' along with 'tree', 'meta' or 'inline'.") func = wrapper def _visitor_args_dec(obj): return _apply_decorator(obj, _visitor_args_func_dec, visit_wrapper=func) return _visitor_args_dec ###} lark-0.8.1/mkdocs.yml000066400000000000000000000006711361215331400144570ustar00rootroot00000000000000site_name: Lark theme: readthedocs pages: - Main Page: index.md - Philosophy: philosophy.md - Features: features.md - Parsers: parsers.md - How To Use (Guide): how_to_use.md - How To Develop (Guide): how_to_develop.md - Grammar Reference: grammar.md - Tree Construction Reference: tree_construction.md - Visitors and Transformers: visitors.md - Classes Reference: classes.md - Recipes: recipes.md lark-0.8.1/nearley-requirements.txt000066400000000000000000000000141361215331400173640ustar00rootroot00000000000000Js2Py==0.50 lark-0.8.1/readthedocs.yml000066400000000000000000000001571361215331400154630ustar00rootroot00000000000000version: 2 mkdocs: configuration: mkdocs.yml fail_on_warning: false formats: all python: version: 3.5 lark-0.8.1/setup.cfg000066400000000000000000000001611361215331400142670ustar00rootroot00000000000000[global] zip_safe= [bdist_wheel] universal = 1 [metadata] description-file = README.md license_file = LICENSE lark-0.8.1/setup.py000066400000000000000000000036161361215331400141700ustar00rootroot00000000000000import re from setuptools import setup __version__ ,= re.findall('__version__ = "(.*)"', open('lark/__init__.py').read()) setup( name = "lark-parser", version = __version__, packages = ['lark', 'lark.parsers', 'lark.tools', 'lark.grammars'], requires = [], install_requires = [], package_data = { '': ['*.md', '*.lark'] }, test_suite = 'tests.__main__', # metadata for upload to PyPI author = "Erez Shinan", author_email = "erezshin@gmail.com", description = "a modern parsing library", license = "MIT", keywords = "Earley LALR parser parsing ast", url = "https://github.com/erezsh/lark", download_url = "https://github.com/erezsh/lark/tarball/master", long_description=''' Lark is a modern general-purpose parsing library for Python. With Lark, you can parse any context-free grammar, efficiently, with very little code. Main Features: - Builds a parse-tree (AST) automagically, based on the structure of the grammar - Earley parser - Can parse all context-free grammars - Full support for ambiguous grammars - LALR(1) parser - Fast and light, competitive with PLY - Can generate a stand-alone parser - CYK parser, for highly ambiguous grammars - EBNF grammar - Unicode fully supported - Python 2 & 3 compatible - Automatic line & column tracking - Standard library of terminals (strings, numbers, names, etc.) - Import grammars from Nearley.js - Extensive test suite - And much more! 
''', classifiers=[ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: General", "Topic :: Text Processing :: Linguistic", "License :: OSI Approved :: MIT License", ], ) lark-0.8.1/tests/000077500000000000000000000000001361215331400136125ustar00rootroot00000000000000lark-0.8.1/tests/__init__.py000066400000000000000000000000001361215331400157110ustar00rootroot00000000000000lark-0.8.1/tests/__main__.py000066400000000000000000000016001361215331400157010ustar00rootroot00000000000000from __future__ import absolute_import, print_function import unittest import logging from .test_trees import TestTrees from .test_tools import TestStandalone from .test_reconstructor import TestReconstructor try: from .test_nearley.test_nearley import TestNearley except ImportError: logging.warning("Warning: Skipping tests for Nearley grammar imports (js2py required)") # from .test_selectors import TestSelectors # from .test_grammars import TestPythonG, TestConfigG from .test_parser import ( TestLalrStandard, TestEarleyStandard, TestCykStandard, TestLalrContextual, TestEarleyDynamic, TestLalrCustom, # TestFullEarleyStandard, TestFullEarleyDynamic, TestFullEarleyDynamic_complete, TestParsers, ) logging.basicConfig(level=logging.INFO) if __name__ == '__main__': unittest.main() lark-0.8.1/tests/grammars/000077500000000000000000000000001361215331400154235ustar00rootroot00000000000000lark-0.8.1/tests/grammars/ab.lark000066400000000000000000000001251361215331400166560ustar00rootroot00000000000000startab: expr expr: A B | A expr B A: "a" B: "b" %import common.WS %ignore WS lark-0.8.1/tests/grammars/leading_underscore_grammar.lark000066400000000000000000000000431361215331400236350ustar00rootroot00000000000000A: "A" _SEP: "x" _a: A c: _a _SEPlark-0.8.1/tests/grammars/test.lark000066400000000000000000000000741361215331400172560ustar00rootroot00000000000000%import common.NUMBER %import common.WORD %import common.WS lark-0.8.1/tests/grammars/test_relative_import_of_nested_grammar.lark000066400000000000000000000001511361215331400262730ustar00rootroot00000000000000 start: rule_to_import %import .test_relative_import_of_nested_grammar__grammar_to_import.rule_to_importlark-0.8.1/tests/grammars/test_relative_import_of_nested_grammar__grammar_to_import.lark000066400000000000000000000001621361215331400322360ustar00rootroot00000000000000 rule_to_import: NESTED_TERMINAL %import .test_relative_import_of_nested_grammar__nested_grammar.NESTED_TERMINAL lark-0.8.1/tests/grammars/test_relative_import_of_nested_grammar__nested_grammar.lark000066400000000000000000000000251361215331400315020ustar00rootroot00000000000000NESTED_TERMINAL: "N" lark-0.8.1/tests/grammars/test_unicode.lark000066400000000000000000000000331361215331400207570ustar00rootroot00000000000000UNICODE : /[a-zØ-öø-ÿ]/lark-0.8.1/tests/grammars/three_rules_using_same_token.lark000066400000000000000000000000521361215331400242260ustar00rootroot00000000000000%import common.INT a: A b: A c: A A: 
"A"lark-0.8.1/tests/test_nearley/000077500000000000000000000000001361215331400163105ustar00rootroot00000000000000lark-0.8.1/tests/test_nearley/__init__.py000066400000000000000000000000001361215331400204070ustar00rootroot00000000000000lark-0.8.1/tests/test_nearley/grammars/000077500000000000000000000000001361215331400201215ustar00rootroot00000000000000lark-0.8.1/tests/test_nearley/grammars/include_unicode.ne000066400000000000000000000000411361215331400235710ustar00rootroot00000000000000@include "unicode.ne" main -> x lark-0.8.1/tests/test_nearley/grammars/unicode.ne000066400000000000000000000000131361215331400220650ustar00rootroot00000000000000x -> "±a" lark-0.8.1/tests/test_nearley/nearley/000077500000000000000000000000001361215331400177475ustar00rootroot00000000000000lark-0.8.1/tests/test_nearley/test_nearley.py000066400000000000000000000057541361215331400213730ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import import unittest import logging import os import codecs logging.basicConfig(level=logging.INFO) from lark.tools.nearley import create_code_for_nearley_grammar, main as nearley_tool_main TEST_PATH = os.path.abspath(os.path.dirname(__file__)) NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley') BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin') if not os.path.exists(NEARLEY_PATH): logging.warn("Nearley not installed. Skipping Nearley tests!") raise ImportError("Skipping Nearley tests!") import js2py # Ensures that js2py exists, to avoid failing tests class TestNearley(unittest.TestCase): def test_css(self): fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne') with open(fn) as f: grammar = f.read() code = create_code_for_nearley_grammar(grammar, 'csscolor', BUILTIN_PATH, os.path.dirname(fn)) d = {} exec (code, d) parse = d['parse'] c = parse('#a199ff') assert c['r'] == 161 assert c['g'] == 153 assert c['b'] == 255 c = parse('rgb(255, 70%, 3)') assert c['r'] == 255 assert c['g'] == 178 assert c['b'] == 3 def test_include(self): fn = os.path.join(NEARLEY_PATH, 'test/grammars/folder-test.ne') with open(fn) as f: grammar = f.read() code = create_code_for_nearley_grammar(grammar, 'main', BUILTIN_PATH, os.path.dirname(fn)) d = {} exec (code, d) parse = d['parse'] parse('a') parse('b') def test_multi_include(self): fn = os.path.join(NEARLEY_PATH, 'test/grammars/multi-include-test.ne') with open(fn) as f: grammar = f.read() code = create_code_for_nearley_grammar(grammar, 'main', BUILTIN_PATH, os.path.dirname(fn)) d = {} exec (code, d) parse = d['parse'] parse('a') parse('b') parse('c') def test_utf8(self): grammar = u'main -> "±a"' code = create_code_for_nearley_grammar(grammar, 'main', BUILTIN_PATH, './') d = {} exec (code, d) parse = d['parse'] parse(u'±a') def test_backslash(self): grammar = r'main -> "\""' code = create_code_for_nearley_grammar(grammar, 'main', BUILTIN_PATH, './') d = {} exec (code, d) parse = d['parse'] parse(u'"') def test_null(self): grammar = r'main -> "a" | null' code = create_code_for_nearley_grammar(grammar, 'main', BUILTIN_PATH, './') d = {} exec (code, d) parse = d['parse'] parse('a') parse('') def test_utf8_2(self): fn = os.path.join(TEST_PATH, 'grammars/unicode.ne') nearley_tool_main(fn, 'x', NEARLEY_PATH) def test_include_utf8(self): fn = os.path.join(TEST_PATH, 'grammars/include_unicode.ne') nearley_tool_main(fn, 'main', NEARLEY_PATH) if __name__ == '__main__': unittest.main() lark-0.8.1/tests/test_parser.py000066400000000000000000001642641361215331400165340ustar00rootroot00000000000000# -*- coding: utf-8 -*- from 
__future__ import absolute_import import unittest import logging import os import sys from copy import deepcopy try: from cStringIO import StringIO as cStringIO except ImportError: # Available only in Python 2.x, 3.x only has io.StringIO from below cStringIO = None from io import ( StringIO as uStringIO, open, ) logging.basicConfig(level=logging.INFO) from lark.lark import Lark from lark.exceptions import GrammarError, ParseError, UnexpectedToken, UnexpectedInput, UnexpectedCharacters from lark.tree import Tree from lark.visitors import Transformer, Transformer_InPlace, v_args from lark.grammar import Rule from lark.lexer import TerminalDef, Lexer, TraditionalLexer __path__ = os.path.dirname(__file__) def _read(n, *args): with open(os.path.join(__path__, n), *args) as f: return f.read() class TestParsers(unittest.TestCase): def test_same_ast(self): "Tests that Earley and LALR parsers produce equal trees" g = Lark(r"""start: "(" name_list ("," "*" NAME)? ")" name_list: NAME | name_list "," NAME NAME: /\w+/ """, parser='lalr') l = g.parse('(a,b,c,*x)') g = Lark(r"""start: "(" name_list ("," "*" NAME)? ")" name_list: NAME | name_list "," NAME NAME: /\w/+ """) l2 = g.parse('(a,b,c,*x)') assert l == l2, '%s != %s' % (l.pretty(), l2.pretty()) def test_infinite_recurse(self): g = """start: a a: a | "a" """ self.assertRaises(GrammarError, Lark, g, parser='lalr') # TODO: should it? shouldn't it? # l = Lark(g, parser='earley', lexer='dynamic') # self.assertRaises(ParseError, l.parse, 'a') def test_propagate_positions(self): g = Lark("""start: a a: "a" """, propagate_positions=True) r = g.parse('a') self.assertEqual( r.children[0].meta.line, 1 ) g = Lark("""start: x x: a a: "a" """, propagate_positions=True) r = g.parse('a') self.assertEqual( r.children[0].meta.line, 1 ) def test_expand1(self): g = Lark("""start: a ?a: b b: "x" """) r = g.parse('x') self.assertEqual( r.children[0].data, "b" ) g = Lark("""start: a ?a: b -> c b: "x" """) r = g.parse('x') self.assertEqual( r.children[0].data, "c" ) g = Lark("""start: a ?a: B -> c B: "x" """) self.assertEqual( r.children[0].data, "c" ) g = Lark("""start: a ?a: b b -> c b: "x" """) r = g.parse('xx') self.assertEqual( r.children[0].data, "c" ) def test_comment_in_rule_definition(self): g = Lark("""start: a a: "a" // A comment // Another comment | "b" // Still more c: "unrelated" """) r = g.parse('b') self.assertEqual( r.children[0].data, "a" ) def test_visit_tokens(self): class T(Transformer): def a(self, children): return children[0] + "!" def A(self, tok): return tok.update(value=tok.upper()) # Test regular g = """start: a a : A A: "x" """ p = Lark(g, parser='lalr') r = T(False).transform(p.parse("x")) self.assertEqual( r.children, ["x!"] ) r = T().transform(p.parse("x")) self.assertEqual( r.children, ["X!"] ) # Test internal transformer p = Lark(g, parser='lalr', transformer=T()) r = p.parse("x") self.assertEqual( r.children, ["X!"] ) def test_vargs_meta(self): @v_args(meta=True) class T1(Transformer): def a(self, children, meta): assert not children return meta.line def start(self, children, meta): return children @v_args(meta=True, inline=True) class T2(Transformer): def a(self, meta): return meta.line def start(self, meta, *res): return list(res) for T in (T1, T2): for internal in [False, True]: try: g = Lark(r"""start: a+ a : "x" _NL? 
_NL: /\n/+ """, parser='lalr', transformer=T() if internal else None, propagate_positions=True) except NotImplementedError: assert internal continue res = g.parse("xx\nx\nxxx\n\n\nxx") assert not internal res = T().transform(res) self.assertEqual(res, [1, 1, 2, 3, 3, 3, 6, 6]) def test_vargs_tree(self): tree = Lark(''' start: a a a !a: "A" ''').parse('AAA') tree_copy = deepcopy(tree) @v_args(tree=True) class T(Transformer): def a(self, tree): return 1 def start(self, tree): return tree.children res = T().transform(tree) self.assertEqual(res, [1, 1, 1]) self.assertEqual(tree, tree_copy) def test_embedded_transformer(self): class T(Transformer): def a(self, children): return "" def b(self, children): return "" def c(self, children): return "" # Test regular g = Lark("""start: a a : "x" """, parser='lalr') r = T().transform(g.parse("x")) self.assertEqual( r.children, [""] ) g = Lark("""start: a a : "x" """, parser='lalr', transformer=T()) r = g.parse("x") self.assertEqual( r.children, [""] ) # Test Expand1 g = Lark("""start: a ?a : b b : "x" """, parser='lalr') r = T().transform(g.parse("x")) self.assertEqual( r.children, [""] ) g = Lark("""start: a ?a : b b : "x" """, parser='lalr', transformer=T()) r = g.parse("x") self.assertEqual( r.children, [""] ) # Test Expand1 -> Alias g = Lark("""start: a ?a : b b -> c b : "x" """, parser='lalr') r = T().transform(g.parse("xx")) self.assertEqual( r.children, [""] ) g = Lark("""start: a ?a : b b -> c b : "x" """, parser='lalr', transformer=T()) r = g.parse("xx") self.assertEqual( r.children, [""] ) def test_embedded_transformer_inplace(self): @v_args(tree=True) class T1(Transformer_InPlace): def a(self, tree): assert isinstance(tree, Tree), tree tree.children.append("tested") return tree def b(self, tree): return Tree(tree.data, tree.children + ['tested2']) @v_args(tree=True) class T2(Transformer): def a(self, tree): assert isinstance(tree, Tree), tree tree.children.append("tested") return tree def b(self, tree): return Tree(tree.data, tree.children + ['tested2']) class T3(Transformer): @v_args(tree=True) def a(self, tree): assert isinstance(tree, Tree) tree.children.append("tested") return tree @v_args(tree=True) def b(self, tree): return Tree(tree.data, tree.children + ['tested2']) for t in [T1(), T2(), T3()]: for internal in [False, True]: g = Lark("""start: a b a : "x" b : "y" """, parser='lalr', transformer=t if internal else None) r = g.parse("xy") if not internal: r = t.transform(r) a, b = r.children self.assertEqual(a.children, ["tested"]) self.assertEqual(b.children, ["tested2"]) def test_alias(self): Lark("""start: ["a"] "b" ["c"] "e" ["f"] ["g"] ["h"] "x" -> d """) def _make_full_earley_test(LEXER): def _Lark(grammar, **kwargs): return Lark(grammar, lexer=LEXER, parser='earley', propagate_positions=True, **kwargs) class _TestFullEarley(unittest.TestCase): def test_anon(self): # Fails an Earley implementation without special handling for empty rules, # or re-processing of already completed rules. 
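            # Added note (illustrative): the terminal B mixes a literal with a regexp
            # alternative, and the assertion below expects the whole input 'abc' to come
            # back as a single B token rather than being split into several matches.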
g = Lark(r"""start: B B: ("ab"|/[^b]/)+ """, lexer=LEXER) self.assertEqual( g.parse('abc').children[0], 'abc') def test_earley(self): g = Lark("""start: A "b" c A: "a"+ c: "abc" """, parser="earley", lexer=LEXER) x = g.parse('aaaababc') def test_earley2(self): grammar = """ start: statement+ statement: "r" | "c" /[a-z]/+ %ignore " " """ program = """c b r""" l = Lark(grammar, parser='earley', lexer=LEXER) l.parse(program) @unittest.skipIf(LEXER=='dynamic', "Only relevant for the dynamic_complete parser") def test_earley3(self): """Tests prioritization and disambiguation for pseudo-terminals (there should be only one result) By default, `+` should immitate regexp greedy-matching """ grammar = """ start: A A A: "a"+ """ l = Lark(grammar, parser='earley', lexer=LEXER) res = l.parse("aaa") self.assertEqual(set(res.children), {'aa', 'a'}) # XXX TODO fix Earley to maintain correct order # i.e. terminals it imitate greedy search for terminals, but lazy search for rules # self.assertEqual(res.children, ['aa', 'a']) def test_earley4(self): grammar = """ start: A A? A: "a"+ """ l = Lark(grammar, parser='earley', lexer=LEXER) res = l.parse("aaa") assert set(res.children) == {'aa', 'a'} or res.children == ['aaa'] # XXX TODO fix Earley to maintain correct order # i.e. terminals it imitate greedy search for terminals, but lazy search for rules # self.assertEqual(res.children, ['aaa']) def test_earley_repeating_empty(self): # This was a sneaky bug! grammar = """ !start: "a" empty empty "b" empty: empty2 empty2: """ parser = Lark(grammar, parser='earley', lexer=LEXER) res = parser.parse('ab') empty_tree = Tree('empty', [Tree('empty2', [])]) self.assertSequenceEqual(res.children, ['a', empty_tree, empty_tree, 'b']) @unittest.skipIf(LEXER=='standard', "Requires dynamic lexer") def test_earley_explicit_ambiguity(self): # This was a sneaky bug! grammar = """ start: a b | ab a: "a" b: "b" ab: "ab" """ parser = Lark(grammar, parser='earley', lexer=LEXER, ambiguity='explicit') ambig_tree = parser.parse('ab') self.assertEqual( ambig_tree.data, '_ambig') self.assertEqual( len(ambig_tree.children), 2) @unittest.skipIf(LEXER=='standard', "Requires dynamic lexer") def test_ambiguity1(self): grammar = """ start: cd+ "e" !cd: "c" | "d" | "cd" """ l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER) ambig_tree = l.parse('cde') assert ambig_tree.data == '_ambig', ambig_tree assert len(ambig_tree.children) == 2 @unittest.skipIf(LEXER=='standard', "Requires dynamic lexer") def test_ambiguity2(self): grammar = """ ANY: /[a-zA-Z0-9 ]+/ a.2: "A" b+ b.2: "B" c: ANY start: (a|c)* """ l = Lark(grammar, parser='earley', lexer=LEXER) res = l.parse('ABX') expected = Tree('start', [ Tree('a', [ Tree('b', []) ]), Tree('c', [ 'X' ]) ]) self.assertEqual(res, expected) def test_fruitflies_ambig(self): grammar = """ start: noun verb noun -> simple | noun verb "like" noun -> comparative noun: adj? 
NOUN verb: VERB adj: ADJ NOUN: "flies" | "bananas" | "fruit" VERB: "like" | "flies" ADJ: "fruit" %import common.WS %ignore WS """ parser = Lark(grammar, ambiguity='explicit', lexer=LEXER) tree = parser.parse('fruit flies like bananas') expected = Tree('_ambig', [ Tree('comparative', [ Tree('noun', ['fruit']), Tree('verb', ['flies']), Tree('noun', ['bananas']) ]), Tree('simple', [ Tree('noun', [Tree('adj', ['fruit']), 'flies']), Tree('verb', ['like']), Tree('noun', ['bananas']) ]) ]) # self.assertEqual(tree, expected) self.assertEqual(tree.data, expected.data) self.assertEqual(set(tree.children), set(expected.children)) @unittest.skipIf(LEXER!='dynamic_complete', "Only relevant for the dynamic_complete parser") def test_explicit_ambiguity2(self): grammar = r""" start: NAME+ NAME: /\w+/ %ignore " " """ text = """cat""" parser = _Lark(grammar, start='start', ambiguity='explicit') tree = parser.parse(text) self.assertEqual(tree.data, '_ambig') combinations = {tuple(str(s) for s in t.children) for t in tree.children} self.assertEqual(combinations, { ('cat',), ('ca', 't'), ('c', 'at'), ('c', 'a' ,'t') }) def test_term_ambig_resolve(self): grammar = r""" !start: NAME+ NAME: /\w+/ %ignore " " """ text = """foo bar""" parser = Lark(grammar) tree = parser.parse(text) self.assertEqual(tree.children, ['foo', 'bar']) # @unittest.skipIf(LEXER=='dynamic', "Not implemented in Dynamic Earley yet") # TODO # def test_not_all_derivations(self): # grammar = """ # start: cd+ "e" # !cd: "c" # | "d" # | "cd" # """ # l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER, earley__all_derivations=False) # x = l.parse('cde') # assert x.data != '_ambig', x # assert len(x.children) == 1 _NAME = "TestFullEarley" + LEXER.capitalize() _TestFullEarley.__name__ = _NAME globals()[_NAME] = _TestFullEarley class CustomLexer(Lexer): """ Purpose of this custom lexer is to test the integration, so it uses the traditionalparser as implementation without custom lexing behaviour. 
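    Added note (illustrative): such a lexer is plugged in by passing the class itself,
    e.g. Lark(grammar, parser='lalr', lexer=CustomLexer), which is what _make_parser_test
    below does when LEXER == 'custom'.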
""" def __init__(self, lexer_conf): self.lexer = TraditionalLexer(lexer_conf.tokens, ignore=lexer_conf.ignore, user_callbacks=lexer_conf.callbacks) def lex(self, *args, **kwargs): return self.lexer.lex(*args, **kwargs) def _make_parser_test(LEXER, PARSER): lexer_class_or_name = CustomLexer if LEXER == 'custom' else LEXER def _Lark(grammar, **kwargs): return Lark(grammar, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs) def _Lark_open(gfilename, **kwargs): return Lark.open(gfilename, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs) class _TestParser(unittest.TestCase): def test_basic1(self): g = _Lark("""start: a+ b a* "b" a* b: "b" a: "a" """) r = g.parse('aaabaab') self.assertEqual( ''.join(x.data for x in r.children), 'aaabaa' ) r = g.parse('aaabaaba') self.assertEqual( ''.join(x.data for x in r.children), 'aaabaaa' ) self.assertRaises(ParseError, g.parse, 'aaabaa') def test_basic2(self): # Multiple parsers and colliding tokens g = _Lark("""start: B A B: "12" A: "1" """) g2 = _Lark("""start: B A B: "12" A: "2" """) x = g.parse('121') assert x.data == 'start' and x.children == ['12', '1'], x x = g2.parse('122') assert x.data == 'start' and x.children == ['12', '2'], x @unittest.skipIf(cStringIO is None, "cStringIO not available") def test_stringio_bytes(self): """Verify that a Lark can be created from file-like objects other than Python's standard 'file' object""" _Lark(cStringIO(b'start: a+ b a* "b" a*\n b: "b"\n a: "a" ')) def test_stringio_unicode(self): """Verify that a Lark can be created from file-like objects other than Python's standard 'file' object""" _Lark(uStringIO(u'start: a+ b a* "b" a*\n b: "b"\n a: "a" ')) def test_unicode(self): g = _Lark(u"""start: UNIA UNIB UNIA UNIA: /\xa3/ UNIB: /\u0101/ """) g.parse(u'\xa3\u0101\u00a3') def test_unicode2(self): g = _Lark(r"""start: UNIA UNIB UNIA UNIC UNIA: /\xa3/ UNIB: "a\u0101b\ " UNIC: /a?\u0101c\n/ """) g.parse(u'\xa3a\u0101b\\ \u00a3\u0101c\n') def test_unicode3(self): g = _Lark(r"""start: UNIA UNIB UNIA UNIC UNIA: /\xa3/ UNIB: "\u0101" UNIC: /\u0203/ /\n/ """) g.parse(u'\xa3\u0101\u00a3\u0203\n') def test_hex_escape(self): g = _Lark(r"""start: A B C A: "\x01" B: /\x02/ C: "\xABCD" """) g.parse('\x01\x02\xABCD') def test_unicode_literal_range_escape(self): g = _Lark(r"""start: A+ A: "\u0061".."\u0063" """) g.parse('abc') def test_hex_literal_range_escape(self): g = _Lark(r"""start: A+ A: "\x01".."\x03" """) g.parse('\x01\x02\x03') @unittest.skipIf(PARSER == 'cyk', "Takes forever") def test_stack_for_ebnf(self): """Verify that stack depth isn't an issue for EBNF grammars""" g = _Lark(r"""start: a+ a : "a" """) g.parse("a" * (sys.getrecursionlimit()*2 )) def test_expand1_lists_with_one_item(self): g = _Lark(r"""start: list ?list: item+ item : A A: "a" """) r = g.parse("a") # because 'list' is an expand-if-contains-one rule and we only provided one element it should have expanded to 'item' self.assertSequenceEqual([subtree.data for subtree in r.children], ('item',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) def test_expand1_lists_with_one_item_2(self): g = _Lark(r"""start: list ?list: item+ "!" 
item : A A: "a" """) r = g.parse("a!") # because 'list' is an expand-if-contains-one rule and we only provided one element it should have expanded to 'item' self.assertSequenceEqual([subtree.data for subtree in r.children], ('item',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) def test_dont_expand1_lists_with_multiple_items(self): g = _Lark(r"""start: list ?list: item+ item : A A: "a" """) r = g.parse("aa") # because 'list' is an expand-if-contains-one rule and we've provided more than one element it should *not* have expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) # Sanity check: verify that 'list' contains the two 'item's we've given it [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ('item', 'item')) def test_dont_expand1_lists_with_multiple_items_2(self): g = _Lark(r"""start: list ?list: item+ "!" item : A A: "a" """) r = g.parse("aa!") # because 'list' is an expand-if-contains-one rule and we've provided more than one element it should *not* have expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) # Sanity check: verify that 'list' contains the two 'item's we've given it [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ('item', 'item')) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_empty_expand1_list(self): g = _Lark(r"""start: list ?list: item* item : A A: "a" """) r = g.parse("") # because 'list' is an expand-if-contains-one rule and we've provided less than one element (i.e. none) it should *not* have expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) # Sanity check: verify that 'list' contains no 'item's as we've given it none [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ()) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_empty_expand1_list_2(self): g = _Lark(r"""start: list ?list: item* "!"? item : A A: "a" """) r = g.parse("") # because 'list' is an expand-if-contains-one rule and we've provided less than one element (i.e. 
none) it should *not* have expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # regardless of the amount of items: there should be only *one* child in 'start' because 'list' isn't an expand-all rule self.assertEqual(len(r.children), 1) # Sanity check: verify that 'list' contains no 'item's as we've given it none [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ()) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_empty_flatten_list(self): g = _Lark(r"""start: list list: | item "," list item : A A: "a" """) r = g.parse("") # Because 'list' is a flatten rule it's top-level element should *never* be expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # Sanity check: verify that 'list' contains no 'item's as we've given it none [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ()) @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)") def test_single_item_flatten_list(self): g = _Lark(r"""start: list list: | item "," list item : A A: "a" """) r = g.parse("a,") # Because 'list' is a flatten rule it's top-level element should *never* be expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # Sanity check: verify that 'list' contains exactly the one 'item' we've given it [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ('item',)) @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)") def test_multiple_item_flatten_list(self): g = _Lark(r"""start: list #list: | item "," list item : A A: "a" """) r = g.parse("a,a,") # Because 'list' is a flatten rule it's top-level element should *never* be expanded self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',)) # Sanity check: verify that 'list' contains exactly the two 'item's we've given it [list] = r.children self.assertSequenceEqual([item.data for item in list.children], ('item', 'item')) @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)") def test_recurse_flatten(self): """Verify that stack depth doesn't get exceeded on recursive rules marked for flattening.""" g = _Lark(r"""start: a | start a a : A A : "a" """) # Force PLY to write to the debug log, but prevent writing it to the terminal (uses repr() on the half-built # STree data structures, which uses recursion). g.parse("a" * (sys.getrecursionlimit() // 4)) def test_token_collision(self): g = _Lark(r"""start: "Hello" NAME NAME: /\w/+ %ignore " " """) x = g.parse('Hello World') self.assertSequenceEqual(x.children, ['World']) x = g.parse('Hello HelloWorld') self.assertSequenceEqual(x.children, ['HelloWorld']) def test_token_collision_WS(self): g = _Lark(r"""start: "Hello" NAME NAME: /\w/+ %import common.WS %ignore WS """) x = g.parse('Hello World') self.assertSequenceEqual(x.children, ['World']) x = g.parse('Hello HelloWorld') self.assertSequenceEqual(x.children, ['HelloWorld']) def test_token_collision2(self): g = _Lark(""" !start: "starts" %import common.LCASE_LETTER """) x = g.parse("starts") self.assertSequenceEqual(x.children, ['starts']) # def test_string_priority(self): # g = _Lark("""start: (A | /a?bb/)+ # A: "a" """) # x = g.parse('abb') # self.assertEqual(len(x.children), 2) # # This parse raises an exception because the lexer will always try to consume # # "a" first and will never match the regular expression # # This behavior is subject to change!! 
# # Thie won't happen with ambiguity handling. # g = _Lark("""start: (A | /a?ab/)+ # A: "a" """) # self.assertRaises(LexError, g.parse, 'aab') def test_undefined_rule(self): self.assertRaises(GrammarError, _Lark, """start: a""") def test_undefined_token(self): self.assertRaises(GrammarError, _Lark, """start: A""") def test_rule_collision(self): g = _Lark("""start: "a"+ "b" | "a"+ """) x = g.parse('aaaa') x = g.parse('aaaab') def test_rule_collision2(self): g = _Lark("""start: "a"* "b" | "a"+ """) x = g.parse('aaaa') x = g.parse('aaaab') x = g.parse('b') def test_token_not_anon(self): """Tests that "a" is matched as an anonymous token, and not A. """ g = _Lark("""start: "a" A: "a" """) x = g.parse('a') self.assertEqual(len(x.children), 0, '"a" should be considered anonymous') g = _Lark("""start: "a" A A: "a" """) x = g.parse('aa') self.assertEqual(len(x.children), 1, 'only "a" should be considered anonymous') self.assertEqual(x.children[0].type, "A") g = _Lark("""start: /a/ A: /a/ """) x = g.parse('a') self.assertEqual(len(x.children), 1) self.assertEqual(x.children[0].type, "A", "A isn't associated with /a/") @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_maybe(self): g = _Lark("""start: ["a"] """) x = g.parse('a') x = g.parse('') def test_start(self): g = _Lark("""a: "a" a? """, start='a') x = g.parse('a') x = g.parse('aa') x = g.parse('aaa') def test_alias(self): g = _Lark("""start: "a" -> b """) x = g.parse('a') self.assertEqual(x.data, "b") def test_token_ebnf(self): g = _Lark("""start: A A: "a"* ("b"? "c".."e")+ """) x = g.parse('abcde') x = g.parse('dd') def test_backslash(self): g = _Lark(r"""start: "\\" "a" """) x = g.parse(r'\a') g = _Lark(r"""start: /\\/ /a/ """) x = g.parse(r'\a') def test_backslash2(self): g = _Lark(r"""start: "\"" "-" """) x = g.parse('"-') g = _Lark(r"""start: /\// /-/ """) x = g.parse('/-') def test_special_chars(self): g = _Lark(r"""start: "\n" """) x = g.parse('\n') g = _Lark(r"""start: /\n/ """) x = g.parse('\n') # def test_token_recurse(self): # g = _Lark("""start: A # A: B # B: A # """) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_empty(self): # Fails an Earley implementation without special handling for empty rules, # or re-processing of already completed rules. g = _Lark(r"""start: _empty a "B" a: _empty "A" _empty: """) x = g.parse('AB') def test_regex_quote(self): g = r""" start: SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING SINGLE_QUOTED_STRING : /'[^']*'/ DOUBLE_QUOTED_STRING : /"[^"]*"/ """ g = _Lark(g) self.assertEqual( g.parse('"hello"').children, ['"hello"']) self.assertEqual( g.parse("'hello'").children, ["'hello'"]) def test_lexer_token_limit(self): "Python has a stupid limit of 100 groups in a regular expression. Test that we handle this limitation" tokens = {'A%d'%i:'"%d"'%i for i in range(300)} g = _Lark("""start: %s %s""" % (' '.join(tokens), '\n'.join("%s: %s"%x for x in tokens.items()))) def test_float_without_lexer(self): expected_error = UnexpectedCharacters if LEXER.startswith('dynamic') else UnexpectedToken if PARSER == 'cyk': expected_error = ParseError g = _Lark("""start: ["+"|"-"] float float: digit* "." digit+ exp? 
| digit+ exp exp: ("e"|"E") ["+"|"-"] digit+ digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" """) g.parse("1.2") g.parse("-.2e9") g.parse("+2e-9") self.assertRaises( expected_error, g.parse, "+2e-9e") def test_keep_all_tokens(self): l = _Lark("""start: "a"+ """, keep_all_tokens=True) tree = l.parse('aaa') self.assertEqual(tree.children, ['a', 'a', 'a']) def test_token_flags(self): l = _Lark("""!start: "a"i+ """ ) tree = l.parse('aA') self.assertEqual(tree.children, ['a', 'A']) l = _Lark("""!start: /a/i+ """ ) tree = l.parse('aA') self.assertEqual(tree.children, ['a', 'A']) # g = """!start: "a"i "a" # """ # self.assertRaises(GrammarError, _Lark, g) # g = """!start: /a/i /a/ # """ # self.assertRaises(GrammarError, _Lark, g) g = """start: NAME "," "a" NAME: /[a-z_]/i /[a-z0-9_]/i* """ l = _Lark(g) tree = l.parse('ab,a') self.assertEqual(tree.children, ['ab']) tree = l.parse('AB,a') self.assertEqual(tree.children, ['AB']) def test_token_flags3(self): l = _Lark("""!start: ABC+ ABC: "abc"i """ ) tree = l.parse('aBcAbC') self.assertEqual(tree.children, ['aBc', 'AbC']) def test_token_flags2(self): g = """!start: ("a"i | /a/ /b/?)+ """ l = _Lark(g) tree = l.parse('aA') self.assertEqual(tree.children, ['a', 'A']) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_twice_empty(self): g = """!start: ("A"?)? """ l = _Lark(g) tree = l.parse('A') self.assertEqual(tree.children, ['A']) tree = l.parse('') self.assertEqual(tree.children, []) def test_undefined_ignore(self): g = """!start: "A" %ignore B """ self.assertRaises( GrammarError, _Lark, g) def test_alias_in_terminal(self): g = """start: TERM TERM: "a" -> alias """ self.assertRaises( GrammarError, _Lark, g) def test_line_and_column(self): g = r"""!start: "A" bc "D" !bc: "B\nC" """ l = _Lark(g) a, bc, d = l.parse("AB\nCD").children self.assertEqual(a.line, 1) self.assertEqual(a.column, 1) bc ,= bc.children self.assertEqual(bc.line, 1) self.assertEqual(bc.column, 2) self.assertEqual(d.line, 2) self.assertEqual(d.column, 2) if LEXER != 'dynamic': self.assertEqual(a.end_line, 1) self.assertEqual(a.end_column, 2) self.assertEqual(bc.end_line, 2) self.assertEqual(bc.end_column, 2) self.assertEqual(d.end_line, 2) self.assertEqual(d.end_column, 3) def test_reduce_cycle(self): """Tests an edge-condition in the LALR parser, in which a transition state looks exactly like the end state. It seems that the correct solution is to explicitely distinguish finalization in the reduce() function. 
""" l = _Lark(""" term: A | term term A: "a" """, start='term') tree = l.parse("aa") self.assertEqual(len(tree.children), 2) @unittest.skipIf(LEXER != 'standard', "Only standard lexers care about token priority") def test_lexer_prioritization(self): "Tests effect of priority on result" grammar = """ start: A B | AB A.2: "a" B: "b" AB: "ab" """ l = _Lark(grammar) res = l.parse("ab") self.assertEqual(res.children, ['a', 'b']) self.assertNotEqual(res.children, ['ab']) grammar = """ start: A B | AB A: "a" B: "b" AB.3: "ab" """ l = _Lark(grammar) res = l.parse("ab") self.assertNotEqual(res.children, ['a', 'b']) self.assertEqual(res.children, ['ab']) grammar = """ start: A B | AB A: "a" B.-20: "b" AB.-10: "ab" """ l = _Lark(grammar) res = l.parse("ab") self.assertEqual(res.children, ['a', 'b']) grammar = """ start: A B | AB A.-99999999999999999999999: "a" B: "b" AB: "ab" """ l = _Lark(grammar) res = l.parse("ab") self.assertEqual(res.children, ['ab']) def test_import(self): grammar = """ start: NUMBER WORD %import common.NUMBER %import common.WORD %import common.WS %ignore WS """ l = _Lark(grammar) x = l.parse('12 elephants') self.assertEqual(x.children, ['12', 'elephants']) def test_import_rename(self): grammar = """ start: N W %import common.NUMBER -> N %import common.WORD -> W %import common.WS %ignore WS """ l = _Lark(grammar) x = l.parse('12 elephants') self.assertEqual(x.children, ['12', 'elephants']) def test_relative_import(self): l = _Lark_open('test_relative_import.lark', rel_to=__file__) x = l.parse('12 lions') self.assertEqual(x.children, ['12', 'lions']) def test_relative_import_unicode(self): l = _Lark_open('test_relative_import_unicode.lark', rel_to=__file__) x = l.parse(u'Ø') self.assertEqual(x.children, [u'Ø']) def test_relative_import_rename(self): l = _Lark_open('test_relative_import_rename.lark', rel_to=__file__) x = l.parse('12 lions') self.assertEqual(x.children, ['12', 'lions']) def test_relative_rule_import(self): l = _Lark_open('test_relative_rule_import.lark', rel_to=__file__) x = l.parse('xaabby') self.assertEqual(x.children, [ 'x', Tree('expr', ['a', Tree('expr', ['a', 'b']), 'b']), 'y']) def test_relative_rule_import_drop_ignore(self): # %ignore rules are dropped on import l = _Lark_open('test_relative_rule_import_drop_ignore.lark', rel_to=__file__) self.assertRaises((ParseError, UnexpectedInput), l.parse, 'xa abby') def test_relative_rule_import_subrule(self): l = _Lark_open('test_relative_rule_import_subrule.lark', rel_to=__file__) x = l.parse('xaabby') self.assertEqual(x.children, [ 'x', Tree('startab', [ Tree('grammars__ab__expr', [ 'a', Tree('grammars__ab__expr', ['a', 'b']), 'b', ]), ]), 'y']) def test_relative_rule_import_subrule_no_conflict(self): l = _Lark_open( 'test_relative_rule_import_subrule_no_conflict.lark', rel_to=__file__) x = l.parse('xaby') self.assertEqual(x.children, [Tree('expr', [ 'x', Tree('startab', [ Tree('grammars__ab__expr', ['a', 'b']), ]), 'y'])]) self.assertRaises((ParseError, UnexpectedInput), l.parse, 'xaxabyby') def test_relative_rule_import_rename(self): l = _Lark_open('test_relative_rule_import_rename.lark', rel_to=__file__) x = l.parse('xaabby') self.assertEqual(x.children, [ 'x', Tree('ab', ['a', Tree('ab', ['a', 'b']), 'b']), 'y']) def test_multi_import(self): grammar = """ start: NUMBER WORD %import common (NUMBER, WORD, WS) %ignore WS """ l = _Lark(grammar) x = l.parse('12 toucans') self.assertEqual(x.children, ['12', 'toucans']) def test_relative_multi_import(self): l = _Lark_open("test_relative_multi_import.lark", 
rel_to=__file__) x = l.parse('12 capybaras') self.assertEqual(x.children, ['12', 'capybaras']) def test_relative_import_preserves_leading_underscore(self): l = _Lark_open("test_relative_import_preserves_leading_underscore.lark", rel_to=__file__) x = l.parse('Ax') self.assertEqual(next(x.find_data('c')).children, ['A']) def test_relative_import_of_nested_grammar(self): l = _Lark_open("grammars/test_relative_import_of_nested_grammar.lark", rel_to=__file__) x = l.parse('N') self.assertEqual(next(x.find_data('rule_to_import')).children, ['N']) def test_relative_import_rules_dependencies_imported_only_once(self): l = _Lark_open("test_relative_import_rules_dependencies_imported_only_once.lark", rel_to=__file__) x = l.parse('AAA') self.assertEqual(next(x.find_data('a')).children, ['A']) self.assertEqual(next(x.find_data('b')).children, ['A']) self.assertEqual(next(x.find_data('d')).children, ['A']) def test_import_errors(self): grammar = """ start: NUMBER WORD %import .grammars.bad_test.NUMBER """ self.assertRaises(IOError, _Lark, grammar) grammar = """ start: NUMBER WORD %import bad_test.NUMBER """ self.assertRaises(IOError, _Lark, grammar) @unittest.skipIf(PARSER != 'earley', "Currently only Earley supports priority in rules") def test_earley_prioritization(self): "Tests effect of priority on result" grammar = """ start: a | b a.1: "a" b.2: "a" """ # l = Lark(grammar, parser='earley', lexer='standard') l = _Lark(grammar) res = l.parse("a") self.assertEqual(res.children[0].data, 'b') grammar = """ start: a | b a.2: "a" b.1: "a" """ l = _Lark(grammar) # l = Lark(grammar, parser='earley', lexer='standard') res = l.parse("a") self.assertEqual(res.children[0].data, 'a') @unittest.skipIf(PARSER != 'earley', "Currently only Earley supports priority in rules") def test_earley_prioritization_sum(self): "Tests effect of priority on result" grammar = """ start: ab_ b_ a_ | indirection indirection: a_ bb_ a_ a_: "a" b_: "b" ab_: "ab" bb_.1: "bb" """ l = Lark(grammar, priority="invert") res = l.parse('abba') self.assertEqual(''.join(child.data for child in res.children), 'ab_b_a_') grammar = """ start: ab_ b_ a_ | indirection indirection: a_ bb_ a_ a_: "a" b_: "b" ab_.1: "ab" bb_: "bb" """ l = Lark(grammar, priority="invert") res = l.parse('abba') self.assertEqual(''.join(child.data for child in res.children), 'indirection') grammar = """ start: ab_ b_ a_ | indirection indirection: a_ bb_ a_ a_.2: "a" b_.1: "b" ab_.3: "ab" bb_.3: "bb" """ l = Lark(grammar, priority="invert") res = l.parse('abba') self.assertEqual(''.join(child.data for child in res.children), 'ab_b_a_') grammar = """ start: ab_ b_ a_ | indirection indirection: a_ bb_ a_ a_.1: "a" b_.1: "b" ab_.4: "ab" bb_.3: "bb" """ l = Lark(grammar, priority="invert") res = l.parse('abba') self.assertEqual(''.join(child.data for child in res.children), 'indirection') def test_utf8(self): g = u"""start: a a: "±a" """ l = _Lark(g) self.assertEqual(l.parse(u'±a'), Tree('start', [Tree('a', [])])) g = u"""start: A A: "±a" """ l = _Lark(g) self.assertEqual(l.parse(u'±a'), Tree('start', [u'\xb1a'])) @unittest.skipIf(PARSER == 'cyk', "No empty rules") def test_ignore(self): grammar = r""" COMMENT: /(!|(\/\/))[^\n]*/ %ignore COMMENT %import common.WS -> _WS %import common.INT start: "INT"i _WS+ INT _WS* """ parser = _Lark(grammar) tree = parser.parse("int 1 ! This is a comment\n") self.assertEqual(tree.children, ['1']) tree = parser.parse("int 1 ! This is a comment") # A trailing ignore token can be tricky! 
self.assertEqual(tree.children, ['1']) parser = _Lark(r""" start : "a"* %ignore "b" """) tree = parser.parse("bb") self.assertEqual(tree.children, []) def test_regex_escaping(self): g = _Lark("start: /[ab]/") g.parse('a') g.parse('b') self.assertRaises( UnexpectedInput, g.parse, 'c') _Lark(r'start: /\w/').parse('a') g = _Lark(r'start: /\\w/') self.assertRaises( UnexpectedInput, g.parse, 'a') g.parse(r'\w') _Lark(r'start: /\[/').parse('[') _Lark(r'start: /\//').parse('/') _Lark(r'start: /\\/').parse('\\') _Lark(r'start: /\[ab]/').parse('[ab]') _Lark(r'start: /\\[ab]/').parse('\\a') _Lark(r'start: /\t/').parse('\t') _Lark(r'start: /\\t/').parse('\\t') _Lark(r'start: /\\\t/').parse('\\\t') _Lark(r'start: "\t"').parse('\t') _Lark(r'start: "\\t"').parse('\\t') _Lark(r'start: "\\\t"').parse('\\\t') def test_ranged_repeat_rules(self): g = u"""!start: "A"~3 """ l = _Lark(g) self.assertEqual(l.parse(u'AAA'), Tree('start', ["A", "A", "A"])) self.assertRaises(ParseError, l.parse, u'AA') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAA') g = u"""!start: "A"~0..2 """ if PARSER != 'cyk': # XXX CYK currently doesn't support empty grammars l = _Lark(g) self.assertEqual(l.parse(u''), Tree('start', [])) self.assertEqual(l.parse(u'A'), Tree('start', ['A'])) self.assertEqual(l.parse(u'AA'), Tree('start', ['A', 'A'])) self.assertRaises((UnexpectedToken, UnexpectedInput), l.parse, u'AAA') g = u"""!start: "A"~3..2 """ self.assertRaises(GrammarError, _Lark, g) g = u"""!start: "A"~2..3 "B"~2 """ l = _Lark(g) self.assertEqual(l.parse(u'AABB'), Tree('start', ['A', 'A', 'B', 'B'])) self.assertEqual(l.parse(u'AAABB'), Tree('start', ['A', 'A', 'A', 'B', 'B'])) self.assertRaises(ParseError, l.parse, u'AAAB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAABBB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'ABB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') def test_ranged_repeat_terms(self): g = u"""!start: AAA AAA: "A"~3 """ l = _Lark(g) self.assertEqual(l.parse(u'AAA'), Tree('start', ["AAA"])) self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AA') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAA') g = u"""!start: AABB CC AABB: "A"~0..2 "B"~2 CC: "C"~1..2 """ l = _Lark(g) self.assertEqual(l.parse(u'AABBCC'), Tree('start', ['AABB', 'CC'])) self.assertEqual(l.parse(u'BBC'), Tree('start', ['BB', 'C'])) self.assertEqual(l.parse(u'ABBCC'), Tree('start', ['ABB', 'CC'])) self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAABBB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'ABB') self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') @unittest.skipIf(PARSER=='earley', "Priority not handled correctly right now") # TODO XXX def test_priority_vs_embedded(self): g = """ A.2: "a" WORD: ("a".."z")+ start: (A | WORD)+ """ l = _Lark(g) t = l.parse('abc') self.assertEqual(t.children, ['a', 'bc']) self.assertEqual(t.children[0].type, 'A') def test_line_counting(self): p = _Lark("start: /[^x]+/") text = 'hello\nworld' t = p.parse(text) tok = t.children[0] self.assertEqual(tok, text) self.assertEqual(tok.line, 1) self.assertEqual(tok.column, 1) if _LEXER != 'dynamic': self.assertEqual(tok.end_line, 2) self.assertEqual(tok.end_column, 6) @unittest.skipIf(PARSER=='cyk', "Empty rules") def test_empty_end(self): p = _Lark(""" start: b c d b: "B" c: | "C" d: | "D" """) res = p.parse('B') self.assertEqual(len(res.children), 3) 
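# --- Editor's aside (illustrative sketch, not part of the original test file) ---
# The ranged-repeat tests above exercise the "X"~n and "X"~min..max grammar
# operators through the _Lark test helper. Below is a minimal sketch of the same
# grammar against the public lark.Lark API, assuming the default (Earley) parser;
# the grammar text is taken from test_ranged_repeat_rules, while the surrounding
# script (names, prints) is invented for illustration only.
from lark import Lark
from lark.exceptions import ParseError, UnexpectedInput

_repeat_demo = Lark(u'''!start: "A"~2..3 "B"~2 ''')   # two or three A's, then exactly two B's

print(_repeat_demo.parse(u'AABB'))    # -> Tree('start', ['A', 'A', 'B', 'B']), as asserted above
print(_repeat_demo.parse(u'AAABB'))   # three A's are also accepted

try:
    _repeat_demo.parse(u'AAAB')       # only one B: rejected, as the test expects
except (ParseError, UnexpectedInput) as e:
    print('rejected:', type(e).__name__)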
@unittest.skipIf(PARSER=='cyk', "Empty rules") def test_maybe_placeholders(self): # Anonymous tokens shouldn't count p = _Lark("""start: ["a"] ["b"] ["c"] """, maybe_placeholders=True) self.assertEqual(p.parse("").children, []) # All invisible constructs shouldn't count p = _Lark("""start: [A] ["b"] [_c] ["e" "f" _c] A: "a" _c: "c" """, maybe_placeholders=True) self.assertEqual(p.parse("").children, [None]) self.assertEqual(p.parse("c").children, [None]) self.assertEqual(p.parse("aefc").children, ['a']) # ? shouldn't apply p = _Lark("""!start: ["a"] "b"? ["c"] """, maybe_placeholders=True) self.assertEqual(p.parse("").children, [None, None]) self.assertEqual(p.parse("b").children, [None, 'b', None]) p = _Lark("""!start: ["a"] ["b"] ["c"] """, maybe_placeholders=True) self.assertEqual(p.parse("").children, [None, None, None]) self.assertEqual(p.parse("a").children, ['a', None, None]) self.assertEqual(p.parse("b").children, [None, 'b', None]) self.assertEqual(p.parse("c").children, [None, None, 'c']) self.assertEqual(p.parse("ab").children, ['a', 'b', None]) self.assertEqual(p.parse("ac").children, ['a', None, 'c']) self.assertEqual(p.parse("bc").children, [None, 'b', 'c']) self.assertEqual(p.parse("abc").children, ['a', 'b', 'c']) p = _Lark("""!start: (["a"] "b" ["c"])+ """, maybe_placeholders=True) self.assertEqual(p.parse("b").children, [None, 'b', None]) self.assertEqual(p.parse("bb").children, [None, 'b', None, None, 'b', None]) self.assertEqual(p.parse("abbc").children, ['a', 'b', None, None, 'b', 'c']) self.assertEqual(p.parse("babbcabcb").children, [None, 'b', None, 'a', 'b', None, None, 'b', 'c', 'a', 'b', 'c', None, 'b', None]) p = _Lark("""!start: ["a"] ["c"] "b"+ ["a"] ["d"] """, maybe_placeholders=True) self.assertEqual(p.parse("bb").children, [None, None, 'b', 'b', None, None]) self.assertEqual(p.parse("bd").children, [None, None, 'b', None, 'd']) self.assertEqual(p.parse("abba").children, ['a', None, 'b', 'b', 'a', None]) self.assertEqual(p.parse("cbbbb").children, [None, 'c', 'b', 'b', 'b', 'b', None, None]) def test_escaped_string(self): "Tests common.ESCAPED_STRING" grammar = r""" start: ESCAPED_STRING+ %import common (WS_INLINE, ESCAPED_STRING) %ignore WS_INLINE """ parser = _Lark(grammar) parser.parse(r'"\\" "b" "c"') parser.parse(r'"That" "And a \"b"') def test_meddling_unused(self): "Unless 'unused' is removed, LALR analysis will fail on reduce-reduce collision" grammar = """ start: EKS* x x: EKS unused: x* EKS: "x" """ parser = _Lark(grammar) @unittest.skipIf(PARSER!='lalr' or LEXER=='custom', "Serialize currently only works for LALR parsers without custom lexers (though it should be easy to extend)") def test_serialize(self): grammar = """ start: _ANY b "C" _ANY: /./ b: "B" """ parser = _Lark(grammar) d = parser.serialize() parser2 = Lark.deserialize(d, {}, {}) self.assertEqual(parser2.parse('ABC'), Tree('start', [Tree('b', [])]) ) namespace = {'Rule': Rule, 'TerminalDef': TerminalDef} d, m = parser.memo_serialize(namespace.values()) parser3 = Lark.deserialize(d, namespace, m) self.assertEqual(parser3.parse('ABC'), Tree('start', [Tree('b', [])]) ) def test_multi_start(self): parser = _Lark(''' a: "x" "a"? b: "x" "b"? 
''', start=['a', 'b']) self.assertEqual(parser.parse('xa', 'a'), Tree('a', [])) self.assertEqual(parser.parse('xb', 'b'), Tree('b', [])) def test_lexer_detect_newline_tokens(self): # Detect newlines in regular tokens g = _Lark(r"""start: "go" tail* !tail : SA "@" | SB "@" | SC "@" | SD "@" SA : "a" /\n/ SB : /b./s SC : "c" /[^a-z]/ SD : "d" /\s/ """) a,b,c,d = [x.children[1] for x in g.parse('goa\n@b\n@c\n@d\n@').children] self.assertEqual(a.line, 2) self.assertEqual(b.line, 3) self.assertEqual(c.line, 4) self.assertEqual(d.line, 5) # Detect newlines in ignored tokens for re in ['/\\n/', '/[^a-z]/', '/\\s/']: g = _Lark('''!start: "a" "a" %ignore {}'''.format(re)) a, b = g.parse('a\na').children self.assertEqual(a.line, 1) self.assertEqual(b.line, 2) _NAME = "Test" + PARSER.capitalize() + LEXER.capitalize() _TestParser.__name__ = _NAME globals()[_NAME] = _TestParser # Note: You still have to import them in __main__ for the tests to run _TO_TEST = [ ('standard', 'earley'), ('standard', 'cyk'), ('dynamic', 'earley'), ('dynamic_complete', 'earley'), ('standard', 'lalr'), ('contextual', 'lalr'), ('custom', 'lalr'), # (None, 'earley'), ] for _LEXER, _PARSER in _TO_TEST: _make_parser_test(_LEXER, _PARSER) for _LEXER in ('dynamic', 'dynamic_complete'): _make_full_earley_test(_LEXER) if __name__ == '__main__': unittest.main() lark-0.8.1/tests/test_reconstructor.py000066400000000000000000000054001361215331400201360ustar00rootroot00000000000000import json import unittest from unittest import TestCase from lark import Lark from lark.reconstruct import Reconstructor common = """ %import common (WS_INLINE, NUMBER, WORD) %ignore WS_INLINE """ def _remove_ws(s): return s.replace(' ', '').replace('\n','') class TestReconstructor(TestCase): def assert_reconstruct(self, grammar, code): parser = Lark(grammar, parser='lalr', maybe_placeholders=False) tree = parser.parse(code) new = Reconstructor(parser).reconstruct(tree) self.assertEqual(_remove_ws(code), _remove_ws(new)) def test_starred_rule(self): g = """ start: item* item: NL | rule rule: WORD ":" NUMBER NL: /(\\r?\\n)+\\s*/ """ + common code = """ Elephants: 12 """ self.assert_reconstruct(g, code) def test_starred_group(self): g = """ start: (rule | NL)* rule: WORD ":" NUMBER NL: /(\\r?\\n)+\\s*/ """ + common code = """ Elephants: 12 """ self.assert_reconstruct(g, code) def test_alias(self): g = """ start: line* line: NL | rule | "hello" -> hi rule: WORD ":" NUMBER NL: /(\\r?\\n)+\\s*/ """ + common code = """ Elephants: 12 hello """ self.assert_reconstruct(g, code) def test_json_example(self): test_json = ''' { "empty_object" : {}, "empty_array" : [], "booleans" : { "YES" : true, "NO" : false }, "numbers" : [ 0, 1, -2, 3.3, 4.4e5, 6.6e-7 ], "strings" : [ "This", [ "And" , "That", "And a \\"b" ] ], "nothing" : null } ''' json_grammar = r""" ?start: value ?value: object | array | string | SIGNED_NUMBER -> number | "true" -> true | "false" -> false | "null" -> null array : "[" [value ("," value)*] "]" object : "{" [pair ("," pair)*] "}" pair : string ":" value string : ESCAPED_STRING %import common.ESCAPED_STRING %import common.SIGNED_NUMBER %import common.WS %ignore WS """ json_parser = Lark(json_grammar, parser='lalr', maybe_placeholders=False) tree = json_parser.parse(test_json) new_json = Reconstructor(json_parser).reconstruct(tree) self.assertEqual(json.loads(new_json), json.loads(test_json)) if __name__ == '__main__': unittest.main() 
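# --- Editor's aside (illustrative sketch, not a file from the original archive) ---
# tests/test_reconstructor.py above round-trips source text: parse it, feed the
# tree to Reconstructor, and compare the output with the input after stripping
# whitespace. The fragment below shows that round-trip in isolation. The grammar
# and input are adapted from test_starred_rule; the parser options mirror
# assert_reconstruct (parser='lalr', maybe_placeholders=False), and the variable
# names are invented for illustration.
from lark import Lark
from lark.reconstruct import Reconstructor

_grammar = r"""
start: item*
item: NL
    | rule
rule: WORD ":" NUMBER
NL: /(\r?\n)+\s*/

%import common (WS_INLINE, NUMBER, WORD)
%ignore WS_INLINE
"""

_parser = Lark(_grammar, parser='lalr', maybe_placeholders=False)
_tree = _parser.parse("Elephants: 12\n")
print(Reconstructor(_parser).reconstruct(_tree))   # equivalent to the input, up to whitespace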
lark-0.8.1/tests/test_relative_import.lark000066400000000000000000000001441361215331400207300ustar00rootroot00000000000000start: NUMBER WORD %import .grammars.test.NUMBER %import common.WORD %import common.WS %ignore WS lark-0.8.1/tests/test_relative_import_preserves_leading_underscore.lark000066400000000000000000000000701361215331400267400ustar00rootroot00000000000000start: c %import .grammars.leading_underscore_grammar.clark-0.8.1/tests/test_relative_import_rename.lark000066400000000000000000000001441361215331400222570ustar00rootroot00000000000000start: N WORD %import .grammars.test.NUMBER -> N %import common.WORD %import common.WS %ignore WS lark-0.8.1/tests/test_relative_import_rules_dependencies_imported_only_once.lark000066400000000000000000000002461361215331400306230ustar00rootroot00000000000000%import .grammars.three_rules_using_same_token.a %import .grammars.three_rules_using_same_token.b %import .grammars.three_rules_using_same_token.c -> d start: a b d lark-0.8.1/tests/test_relative_import_unicode.lark000066400000000000000000000000661361215331400224410ustar00rootroot00000000000000start: UNICODE %import .grammars.test_unicode.UNICODElark-0.8.1/tests/test_relative_multi_import.lark000066400000000000000000000001111361215331400221340ustar00rootroot00000000000000start: NUMBER WORD %import .grammars.test (NUMBER, WORD, WS) %ignore WS lark-0.8.1/tests/test_relative_rule_import.lark000066400000000000000000000000731361215331400217600ustar00rootroot00000000000000start: X expr Y X: "x" Y: "y" %import .grammars.ab.expr lark-0.8.1/tests/test_relative_rule_import_drop_ignore.lark000066400000000000000000000000731361215331400243470ustar00rootroot00000000000000start: X expr Y X: "x" Y: "y" %import .grammars.ab.expr lark-0.8.1/tests/test_relative_rule_import_rename.lark000066400000000000000000000000771361215331400233130ustar00rootroot00000000000000start: X ab Y X: "x" Y: "y" %import .grammars.ab.expr -> ab lark-0.8.1/tests/test_relative_rule_import_subrule.lark000066400000000000000000000001011361215331400235110ustar00rootroot00000000000000start: X startab Y X: "x" Y: "y" %import .grammars.ab.startab lark-0.8.1/tests/test_relative_rule_import_subrule_no_conflict.lark000066400000000000000000000001151361215331400260730ustar00rootroot00000000000000start: expr expr: X startab Y X: "x" Y: "y" %import .grammars.ab.startab lark-0.8.1/tests/test_tools.py000066400000000000000000000050331361215331400163640ustar00rootroot00000000000000from __future__ import absolute_import import sys from unittest import TestCase, main from lark.tree import Tree from lark.tools import standalone try: from StringIO import StringIO except ImportError: from io import StringIO class TestStandalone(TestCase): def setUp(self): pass def _create_standalone(self, grammar): code_buf = StringIO() temp = sys.stdout sys.stdout = code_buf standalone.main(StringIO(grammar), 'start') sys.stdout = temp code = code_buf.getvalue() context = {} exec(code, context) return context def test_simple(self): grammar = """ start: NUMBER WORD %import common.NUMBER %import common.WORD %import common.WS %ignore WS """ context = self._create_standalone(grammar) _Lark = context['Lark_StandAlone'] l = _Lark() x = l.parse('12 elephants') self.assertEqual(x.children, ['12', 'elephants']) x = l.parse('16 candles') self.assertEqual(x.children, ['16', 'candles']) def test_contextual(self): grammar = """ start: a b a: "A" "B" b: "AB" """ context = self._create_standalone(grammar) _Lark = context['Lark_StandAlone'] l = _Lark() x = l.parse('ABAB') class 
T(context['Transformer']): def a(self, items): return 'a' def b(self, items): return 'b' start = list x = T().transform(x) self.assertEqual(x, ['a', 'b']) l2 = _Lark(transformer=T()) x = l2.parse('ABAB') self.assertEqual(x, ['a', 'b']) def test_postlex(self): from lark.indenter import Indenter class MyIndenter(Indenter): NL_type = '_NEWLINE' OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE'] CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE'] INDENT_type = '_INDENT' DEDENT_type = '_DEDENT' tab_len = 8 grammar = r""" start: "(" ")" _NEWLINE _NEWLINE: /\n/ """ context = self._create_standalone(grammar) _Lark = context['Lark_StandAlone'] l = _Lark(postlex=MyIndenter()) x = l.parse('()\n') self.assertEqual(x, Tree('start', [])) l = _Lark(postlex=MyIndenter()) x = l.parse('(\n)\n') self.assertEqual(x, Tree('start', [])) if __name__ == '__main__': main() lark-0.8.1/tests/test_trees.py000066400000000000000000000145021361215331400163470ustar00rootroot00000000000000from __future__ import absolute_import import unittest from unittest import TestCase import copy import pickle import functools from lark.tree import Tree from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard class TestTrees(TestCase): def setUp(self): self.tree1 = Tree('a', [Tree(x, y) for x, y in zip('bcd', 'xyz')]) def test_deepcopy(self): assert self.tree1 == copy.deepcopy(self.tree1) def test_pickle(self): s = copy.deepcopy(self.tree1) data = pickle.dumps(s) assert pickle.loads(data) == s def test_iter_subtrees(self): expected = [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'), Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])] nodes = list(self.tree1.iter_subtrees()) self.assertEqual(nodes, expected) def test_iter_subtrees_topdown(self): expected = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]), Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')] nodes = list(self.tree1.iter_subtrees_topdown()) self.assertEqual(nodes, expected) def test_visitor(self): class Visitor1(Visitor): def __init__(self): self.nodes=[] def __default__(self,tree): self.nodes.append(tree) class Visitor1_Recursive(Visitor_Recursive): def __init__(self): self.nodes=[] def __default__(self,tree): self.nodes.append(tree) visitor1=Visitor1() visitor1_recursive=Visitor1_Recursive() expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]), Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')] expected_botton_up= [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'), Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])] visitor1.visit(self.tree1) self.assertEqual(visitor1.nodes,expected_botton_up) visitor1_recursive.visit(self.tree1) self.assertEqual(visitor1_recursive.nodes,expected_botton_up) visitor1.nodes=[] visitor1_recursive.nodes=[] visitor1.visit_topdown(self.tree1) self.assertEqual(visitor1.nodes,expected_top_down) visitor1_recursive.visit_topdown(self.tree1) self.assertEqual(visitor1_recursive.nodes,expected_top_down) def test_interp(self): t = Tree('a', [Tree('b', []), Tree('c', []), 'd']) class Interp1(Interpreter): def a(self, tree): return self.visit_children(tree) + ['e'] def b(self, tree): return 'B' def c(self, tree): return 'C' self.assertEqual(Interp1().visit(t), list('BCde')) class Interp2(Interpreter): @visit_children_decor def a(self, values): return values + ['e'] def b(self, tree): return 'B' def c(self, tree): return 'C' self.assertEqual(Interp2().visit(t), list('BCde')) class Interp3(Interpreter): def b(self, tree): return 'B' def c(self, tree): return 
'C' self.assertEqual(Interp3().visit(t), list('BCd')) def test_transformer(self): t = Tree('add', [Tree('sub', [Tree('i', ['3']), Tree('f', ['1.1'])]), Tree('i', ['1'])]) class T(Transformer): i = v_args(inline=True)(int) f = v_args(inline=True)(float) sub = lambda self, values: values[0] - values[1] def add(self, values): return sum(values) res = T().transform(t) self.assertEqual(res, 2.9) @v_args(inline=True) class T(Transformer): i = int f = float sub = lambda self, a, b: a-b def add(self, a, b): return a + b res = T().transform(t) self.assertEqual(res, 2.9) @v_args(inline=True) class T(Transformer): i = int f = float from operator import sub, add res = T().transform(t) self.assertEqual(res, 2.9) def test_vargs(self): @v_args() class MyTransformer(Transformer): @staticmethod def integer(args): return 1 # some code here @classmethod def integer2(cls, args): return 2 # some code here hello = staticmethod(lambda args: 'hello') x = MyTransformer().transform( Tree('integer', [2])) self.assertEqual(x, 1) x = MyTransformer().transform( Tree('integer2', [2])) self.assertEqual(x, 2) x = MyTransformer().transform( Tree('hello', [2])) self.assertEqual(x, 'hello') def test_vargs_override(self): t = Tree('add', [Tree('sub', [Tree('i', ['3']), Tree('f', ['1.1'])]), Tree('i', ['1'])]) @v_args(inline=True) class T(Transformer): i = int f = float sub = lambda self, a, b: a-b not_a_method = {'other': 'stuff'} @v_args(inline=False) def add(self, values): return sum(values) res = T().transform(t) self.assertEqual(res, 2.9) def test_partial(self): tree = Tree("start", [Tree("a", ["test1"]), Tree("b", ["test2"])]) def test(prefix, s, postfix): return prefix + s.upper() + postfix @v_args(inline=True) class T(Transformer): a = functools.partial(test, "@", postfix="!") b = functools.partial(lambda s: s + "!") res = T().transform(tree) assert res.children == ["@TEST1!", "test2!"] def test_discard(self): class MyTransformer(Transformer): def a(self, args): return 1 # some code here def b(cls, args): raise Discard() t = Tree('root', [ Tree('b', []), Tree('a', []), Tree('b', []), Tree('c', []), Tree('b', []), ]) t2 = Tree('root', [1, Tree('c', [])]) x = MyTransformer().transform( t ) self.assertEqual(x, t2) if __name__ == '__main__': unittest.main() lark-0.8.1/tox.ini000066400000000000000000000006601361215331400137650ustar00rootroot00000000000000[tox] envlist = py27, py34, py35, py36, py37, pypy, pypy3 skip_missing_interpreters=true [travis] 2.7 = py27 3.4 = py34 3.5 = py35 3.6 = py36 3.7 = py37 pypy = pypy pypy3 = pypy3 [testenv] whitelist_externals = git deps = -rnearley-requirements.txt # to always force recreation and avoid unexpected side effects recreate=True commands= git submodule sync -q git submodule update --init python -m tests {posargs}
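# --- Editor's aside (illustrative sketch, not a file from the original archive) ---
# tests/test_trees.py above checks Transformer together with @v_args(inline=True)
# by evaluating a hand-built arithmetic tree. The snippet below repeats that
# pattern outside the unittest harness; the tree shape and the expected value
# (2.9) come from test_transformer, while the class name is invented.
from lark.tree import Tree
from lark.visitors import Transformer, v_args

@v_args(inline=True)              # pass each node's children as positional arguments
class _CalcDemo(Transformer):
    i = int                       # 'i' nodes hold a single numeric string
    f = float
    sub = lambda self, a, b: a - b
    def add(self, a, b):
        return a + b

_t = Tree('add', [Tree('sub', [Tree('i', ['3']), Tree('f', ['1.1'])]),
                  Tree('i', ['1'])])
assert abs(_CalcDemo().transform(_t) - 2.9) < 1e-9   # same value the test asserts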