funcparserlib-1.0.0/LICENSE

Copyright © 2009/2021 Andrey Vlasovskikh

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

funcparserlib-1.0.0/README.md

Funcparserlib
=============

Recursive descent parsing library for Python based on functional combinators.

[![PyPI](https://img.shields.io/pypi/v/funcparserlib)](https://pypi.org/project/funcparserlib/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/funcparserlib)](https://pypi.org/project/funcparserlib/)


Description
-----------

The primary focus of `funcparserlib` is **parsing little languages** or **external DSLs** (domain specific languages).

Parsers made with `funcparserlib` are pure-Python LL(\*) parsers.
This means that it's **very easy to write parsers** without thinking about lookaheads and other hardcore parsing stuff. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with `funcparserlib` is **at least twice as fast as PyParsing**, a very popular library for Python.

The source code of `funcparserlib` is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It features the longest parsed prefix error reporting, as well as a tiny lexer generator for token position tracking.

The idea of parser combinators used in `funcparserlib` comes from the [Introduction to Functional Programming](https://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/) course. We have converted it from ML into Python.


Installation
------------

You can install `funcparserlib` from [PyPI](https://pypi.org/project/funcparserlib/):

```shell
$ pip install funcparserlib
```

There are no dependencies on other libraries.


Documentation
-------------

* [Getting Started](https://funcparserlib.pirx.ru/getting-started/)
    * Your **starting point** with `funcparserlib`
* [API Reference](https://funcparserlib.pirx.ru/api/)
    * Learn the details of the API

There are several examples available in the `tests/` directory:

* [GraphViz DOT parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/dot.py)
* [JSON parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/json.py)

See also [the changelog](https://funcparserlib.pirx.ru/changes/).


Example
-------

Let's consider a little language of **numeric expressions** with a syntax similar to Python expressions.
Here are some expression strings in this language:

```
0
1 + 2 + 3
-1 + 2 ** 32
3.1415926 * (2 + 7.18281828e-1) * 42
```

Here is **the complete source code** of the tokenizer and the parser for this language written using `funcparserlib`:

```python
from typing import List, Tuple, Union
from dataclasses import dataclass

from funcparserlib.lexer import make_tokenizer, TokenSpec, Token
from funcparserlib.parser import tok, Parser, many, forward_decl, finished


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def tokenize(s: str) -> List[Token]:
    specs = [
        TokenSpec("whitespace", r"\s+"),
        TokenSpec("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
        TokenSpec("int", r"[+\-]?\d+"),
        TokenSpec("op", r"(\*\*)|[+\-*/()]"),
    ]
    tokenizer = make_tokenizer(specs)
    return [t for t in tokenizer(s) if t.type != "whitespace"]


def parse(tokens: List[Token]) -> Expr:
    int_num = tok("int") >> int
    float_num = tok("float") >> float
    number = int_num | float_num
    expr: Parser[Token, Expr] = forward_decl()
    parenthesized = -op("(") + expr + -op(")")
    primary = number | parenthesized
    power = primary + many(op("**") + primary) >> to_expr
    term = power + many((op("*") | op("/")) + power) >> to_expr
    sum = term + many((op("+") | op("-")) + term) >> to_expr
    expr.define(sum)
    document = expr + -finished
    return document.parse(tokens)


def op(name: str) -> Parser[Token, str]:
    return tok("op", name)


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result
```

Now, consider this numeric expression: `3.1415926 * (2 + 7.18281828e-1) * 42`.
Let's `tokenize()` it using the tokenizer we've created with `funcparserlib.lexer`:

```
[
    Token('float', '3.1415926'),
    Token('op', '*'),
    Token('op', '('),
    Token('int', '2'),
    Token('op', '+'),
    Token('float', '7.18281828e-1'),
    Token('op', ')'),
    Token('op', '*'),
    Token('int', '42'),
]
```

Let's `parse()` these tokens into an expression tree using our parser created with `funcparserlib.parser`:

```
BinaryExpr(
    op='*',
    left=BinaryExpr(
        op='*',
        left=3.1415926,
        right=BinaryExpr(op='+', left=2, right=0.718281828),
    ),
    right=42,
)
```

Learn how to write this parser using `funcparserlib` in the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide!


Used By
-------

Some open-source projects that use `funcparserlib` as an explicit dependency:

* [Hy](https://github.com/hylang/hy), a Lisp dialect that's embedded in Python
    * 4.2K stars, version `>= 1.0.0a0`, Python 3.7+
* [Splash](https://github.com/scrapinghub/splash), a JavaScript rendering service with HTTP API, by Scrapinghub
    * 3.6K stars, version `*`, Python 3 in Docker
* [graphite-beacon](https://github.com/klen/graphite-beacon), a simple alerting system for Graphite metrics
    * 459 stars, version `==0.3.6`, Python 2 and 3
* [blockdiag](https://github.com/blockdiag/blockdiag), generates block-diagram image file from spec-text file
    * 148 stars, version `>= 1.0.0a0`, Python 3.7+
* [kll](https://github.com/kiibohd/kll), Keyboard Layout Language (KLL) compiler
    * 109 stars, copied source code, Python 3.5+


Next
----

Read the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide to start learning `funcparserlib`.
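
As a possible follow-up to the example above, here is a minimal sketch of evaluating the resulting expression tree. It is not part of `funcparserlib`: the `evaluate()` function and the `_OPERATORS` table are hypothetical names introduced for illustration, and `BinaryExpr` is redefined here only so that the snippet runs on its own without the library installed.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass
class BinaryExpr:
    # Same shape as the BinaryExpr dataclass in the parser example
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]

# Hypothetical mapping from the operator tokens of the example grammar
# to the Python functions that implement them
_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}


def evaluate(expr: Expr) -> Union[int, float]:
    """Recursively compute the value of an expression tree."""
    if isinstance(expr, BinaryExpr):
        func = _OPERATORS[expr.op]
        return func(evaluate(expr.left), evaluate(expr.right))
    # Leaves of the tree are plain int or float values
    return expr


# The tree shown above for `3.1415926 * (2 + 7.18281828e-1) * 42`
tree = BinaryExpr(
    op="*",
    left=BinaryExpr(
        op="*",
        left=3.1415926,
        right=BinaryExpr(op="+", left=2, right=0.718281828),
    ),
    right=42,
)
value = evaluate(tree)
```

With the library installed, the same function should work on the tree returned by `parse(tokenize(...))` from the example, since it only relies on the `op`, `left`, and `right` fields.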
funcparserlib-1.0.0/funcparserlib/__init__.py

funcparserlib-1.0.0/funcparserlib/lexer.py

# -*- coding: utf-8 -*-

# Copyright © 2009/2021 Andrey Vlasovskikh
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all copies
# or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

from __future__ import unicode_literals

__all__ = ["make_tokenizer", "TokenSpec", "Token", "LexerError"]

import re


class LexerError(Exception):
    def __init__(self, place, msg):
        self.place = place
        self.msg = msg

    def __str__(self):
        s = "cannot tokenize data"
        line, pos = self.place
        return '%s: %d,%d: "%s"' % (s, line, pos, self.msg)


class TokenSpec(object):
    """A token specification for generating a lexer via `make_tokenizer()`."""

    def __init__(self, type, pattern, flags=0):
        """Initialize a `TokenSpec` object.

        Parameters:
            type (str): User-defined type of the token (e.g. `"name"`, `"number"`,
                `"operator"`)
            pattern (str): Regexp for matching this token type
            flags (int, optional): Regexp flags, the second argument of `re.compile()`
        """
        self.type = type
        self.pattern = pattern
        self.flags = flags

    def __repr__(self):
        return "TokenSpec(%r, %r, %r)" % (self.type, self.pattern, self.flags)


class Token(object):
    """A token object that represents a substring of certain type in your text.

    You can compare tokens for equality using the `==` operator. Tokens also define
    custom `repr()` and `str()`.

    Attributes:
        type (str): User-defined type of the token (e.g.
            `"name"`, `"number"`, `"operator"`)
        value (str): Text value of the token
        start (Optional[Tuple[int, int]]): Start position (_line_, _column_)
        end (Optional[Tuple[int, int]]): End position (_line_, _column_)
    """

    def __init__(self, type, value, start=None, end=None):
        """Initialize a `Token` object."""
        self.type = type
        self.value = value
        self.start = start
        self.end = end

    def __repr__(self):
        return "Token(%r, %r)" % (self.type, self.value)

    def __eq__(self, other):
        # FIXME: Case sensitivity is assumed here
        if other is None:
            return False
        else:
            return self.type == other.type and self.value == other.value

    def _pos_str(self):
        if self.start is None or self.end is None:
            return ""
        else:
            sl, sp = self.start
            el, ep = self.end
            return "%d,%d-%d,%d:" % (sl, sp, el, ep)

    def __str__(self):
        s = "%s %s '%s'" % (self._pos_str(), self.type, self.value)
        return s.strip()

    @property
    def name(self):
        return self.value

    def pformat(self):
        return "%s %s '%s'" % (
            self._pos_str().ljust(20),  # noqa
            self.type.ljust(14),
            self.value,
        )


def make_tokenizer(specs):
    # noinspection GrazieInspection
    """Make a function that tokenizes text based on the regexp specs.

    Type: `(Sequence[TokenSpec | Tuple]) -> Callable[[str], Iterable[Token]]`

    A token spec is a `TokenSpec` instance.

    !!! Note

        For legacy reasons, a token spec may also be a tuple of (_type_, _args_), where
        _type_ sets the value of `Token.type` for the token, and _args_ are the
        positional arguments for `re.compile()`: either just (_pattern_,) or
        (_pattern_, _flags_).

    It returns a tokenizer function that takes a string and returns an iterable of
    `Token` objects, or raises `LexerError` if it cannot tokenize the string according
    to its token specs.

    Examples:

    ```pycon
    >>> tokenize = make_tokenizer([
    ...     TokenSpec("space", r"\\s+"),
    ...     TokenSpec("id", r"\\w+"),
    ...     TokenSpec("op", r"[,!]"),
    ... ])
    >>> text = "Hello, World!"
    >>> [t for t in tokenize(text) if t.type != "space"]  # noqa
    [Token('id', 'Hello'), Token('op', ','), Token('id', 'World'), Token('op', '!')]
    >>> text = "Bye?"
    >>> list(tokenize(text))
    Traceback (most recent call last):
        ...
    lexer.LexerError: cannot tokenize data: 1,4: "Bye?"

    ```
    """
    compiled = []
    for spec in specs:
        if isinstance(spec, TokenSpec):
            c = spec.type, re.compile(spec.pattern, spec.flags)
        else:
            name, args = spec
            c = name, re.compile(*args)
        compiled.append(c)

    def match_specs(s, i, position):
        line, pos = position
        for type, regexp in compiled:
            m = regexp.match(s, i)
            if m is not None:
                value = m.group()
                nls = value.count("\n")
                n_line = line + nls
                if nls == 0:
                    n_pos = pos + len(value)
                else:
                    n_pos = len(value) - value.rfind("\n") - 1
                return Token(type, value, (line, pos + 1), (n_line, n_pos))
        else:
            err_line = s.splitlines()[line - 1]
            raise LexerError((line, pos + 1), err_line)

    def f(s):
        length = len(s)
        line, pos = 1, 0
        i = 0
        while i < length:
            t = match_specs(s, i, (line, pos))
            yield t
            line, pos = t.end
            i += len(t.value)

    return f


# This is an example of token specs. See also [this article][1] for a
# discussion of searching for multiline comments using regexps (including `*?`).
#
# [1]: http://ostermiller.org/findcomment.html
_example_token_specs = [
    TokenSpec("COMMENT", r"\(\*(.|[\r\n])*?\*\)", re.MULTILINE),
    TokenSpec("COMMENT", r"\{(.|[\r\n])*?\}", re.MULTILINE),
    TokenSpec("COMMENT", r"//.*"),
    TokenSpec("NL", r"[\r\n]+"),
    TokenSpec("SPACE", r"[ \t\r\n]+"),
    TokenSpec("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),
    TokenSpec("REAL", r"[0-9]+\.[0-9]*([Ee][+\-]?[0-9]+)*"),
    TokenSpec("INT", r"[0-9]+"),
    TokenSpec("INT", r"\$[0-9A-Fa-f]+"),
    TokenSpec("OP", r"(\.\.)|(<>)|(<=)|(>=)|(:=)|[;,=\(\):\[\]\.+\-<>\*/@\^]"),
    TokenSpec("STRING", r"'([^']|(''))*'"),
    TokenSpec("CHAR", r"#[0-9]+"),
    TokenSpec("CHAR", r"#\$[0-9A-Fa-f]+"),
]

# tokenize = make_tokenizer(_example_token_specs)

funcparserlib-1.0.0/funcparserlib/lexer.pyi

from typing import Tuple, Optional, Callable, Iterable, Text, Sequence

_Place = Tuple[int, int]
_Spec = Tuple[Text, Tuple]

class Token:
    type: Text
    value: Text
    start: Optional[_Place]
    end: Optional[_Place]
    name: Text
    def __init__(
        self,
        type: Text,
        value: Text,
        start: Optional[_Place] = ...,
        end: Optional[_Place] = ...,
    ) -> None: ...
    def pformat(self) -> Text: ...

class TokenSpec:
    # The runtime class in lexer.py stores this attribute as `type`, not `name`
    type: Text
    pattern: Text
    flags: int
    def __init__(self, type: Text, pattern: Text, flags: int = ...) -> None: ...

def make_tokenizer(
    specs: Sequence[TokenSpec | _Spec],
) -> Callable[[Text], Iterable[Token]]: ...

class LexerError(Exception):
    place: Tuple[int, int]
    msg: Text
    def __init__(self, place: _Place, msg: Text) -> None: ...

funcparserlib-1.0.0/funcparserlib/parser.py

# -*- coding: utf-8 -*-

# Copyright © 2009/2021 Andrey Vlasovskikh
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all copies
# or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

"""Functional parsing combinators.

Parsing combinators define an internal domain-specific language (DSL) for describing
the parsing rules of a grammar. The DSL allows you to start with a few primitive
parsers, then combine your parsers to get more complex ones, and finally cover
the whole grammar you want to parse.

The structure of the language: * Class `Parser` * All the primitives and combinators of the language return `Parser` objects * It defines the main `Parser.parse(tokens)` method * Primitive parsers * `tok(type, value)`, `a(value)`, `some(pred)`, `forward_decl()`, `finished` * Parser combinators * `p1 + p2`, `p1 | p2`, `p >> f`, `-p`, `maybe(p)`, `many(p)`, `oneplus(p)`, `skip(p)` * Abstraction * Use regular Python variables `p = ... # Expression of type Parser` to define new rules (non-terminals) of your grammar Every time you apply one of the combinators, you get a new `Parser` object. In other words, the set of `Parser` objects is closed under the means of combination. !!! Note We took the parsing combinators language from the book [Introduction to Functional Programming][1] and translated it from ML into Python. [1]: https://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/ """ from __future__ import unicode_literals __all__ = [ "some", "a", "tok", "many", "pure", "finished", "maybe", "skip", "oneplus", "forward_decl", "NoParseError", "Parser", ] import sys import logging import warnings from funcparserlib.lexer import Token log = logging.getLogger("funcparserlib") debug = False if sys.version_info < (3,): string_types = (str, unicode) # noqa else: string_types = str class Parser(object): """A parser object that can parse a sequence of tokens or can be combined with other parsers using `+`, `|`, `>>`, `many()`, and other parsing combinators. Type: `Parser[A, B]` The generic variables in the type are: `A` — the type of the tokens in the sequence to parse,`B` — the type of the parsed value. In order to define a parser for your grammar: 1. You start with primitive parsers by calling `a(value)`, `some(pred)`, `forward_decl()`, `finished` 2. You use parsing combinators `p1 + p2`, `p1 | p2`, `p >> f`, `many(p)`, and others to combine parsers into a more complex parser 3. 
       You can assign complex parsers to variables to define names that correspond to
       the rules of your grammar

    !!! Note

        The constructor `Parser.__init__()` is considered **internal** and may be
        changed in future versions. Use primitive parsers and parsing combinators to
        construct new parsers.
    """

    def __init__(self, p):
        """Wrap the parser function `p` into a `Parser` object."""
        self.name = ""
        self.define(p)

    def named(self, name):
        # noinspection GrazieInspection
        """Specify the name of the parser for easier debugging.

        Type: `(str) -> Parser[A, B]`

        This name is used in the debug-level parsing log. You can also get it via the
        `Parser.name` attribute.

        Examples:

        ```pycon
        >>> expr = (a("x") + a("y")).named("expr")
        >>> expr.name
        'expr'

        ```

        ```pycon
        >>> expr = a("x") + a("y")
        >>> expr.name
        "('x', 'y')"

        ```

        !!! Note

            You can enable the parsing log this way:

            ```python
            import logging
            logging.basicConfig(level=logging.DEBUG)
            import funcparserlib.parser
            funcparserlib.parser.debug = True
            ```

            The way to enable the parsing log may be changed in future versions.
        """
        self.name = name
        return self

    def define(self, p):
        """Define the parser created earlier as a forward declaration.

        Type: `(Parser[A, B]) -> None`

        Use `p = forward_decl()` in combination with `p.define(...)` to define recursive
        parsers.

        See the examples in the docs for `forward_decl()`.
        """
        f = getattr(p, "run", p)
        if debug:
            setattr(self, "_run", f)
        else:
            setattr(self, "run", f)
        self.named(getattr(p, "name", p.__doc__))

    def run(self, tokens, s):
        """Run the parser against the tokens with the specified parsing state.

        Type: `(Sequence[A], State) -> Tuple[B, State]`

        The parsing state includes the current position in the sequence being parsed,
        and the position of the rightmost token that has been consumed while parsing for
        better error messages.

        If the parser fails to parse the tokens, it raises `NoParseError`.

        !!! Warning

            This method is **internal** and may be changed in future versions.
Use `Parser.parse(tokens)` instead and let the parser object take care of updating the parsing state. """ if debug: log.debug("trying %s" % self.name) return self._run(tokens, s) # noqa def _run(self, tokens, s): raise NotImplementedError("you must define() a parser") def parse(self, tokens): """Parse the sequence of tokens and return the parsed value. Type: `(Sequence[A]) -> B` It takes a sequence of tokens of arbitrary type `A` and returns the parsed value of arbitrary type `B`. If the parser fails to parse the tokens, it raises `NoParseError`. !!! Note Although `Parser.parse()` can parse sequences of any objects (including `str` which is a sequence of `str` chars), **the recommended way** is parsing sequences of `Token` objects. You **should** use a regexp-based tokenizer `make_tokenizer()` defined in `funcparserlib.lexer` to convert your text into a sequence of `Token` objects before parsing it. You will get more readable parsing error messages (as `Token` objects contain their position in the source file) and good separation of the lexical and syntactic levels of the grammar. """ try: (tree, _) = self.run(tokens, State(0, 0, None)) return tree except NoParseError as e: max = e.state.max if len(tokens) > max: t = tokens[max] if isinstance(t, Token): if t.start is None or t.end is None: loc = "" else: s_line, s_pos = t.start e_line, e_pos = t.end loc = "%d,%d-%d,%d: " % (s_line, s_pos, e_line, e_pos) msg = "%s%s: %r" % (loc, e.msg, t.value) elif isinstance(t, string_types): msg = "%s: %r" % (e.msg, t) else: msg = "%s: %s" % (e.msg, t) else: msg = "got unexpected end of input" if e.state.parser is not None: msg = "%s, expected: %s" % (msg, e.state.parser.name) e.msg = msg raise def __add__(self, other): """Sequential combination of parsers. It runs this parser, then the other parser. The return value of the resulting parser is a tuple of each parsed value in the sum of parsers. We merge all parsing results of `p1 + p2 + ... + pN` into a single tuple. 
It means that the parsing result may be a 2-tuple, a 3-tuple, a 4-tuple, etc. of parsed values. You avoid this by transforming the parsed pair into a new value using the `>>` combinator. You can also skip some parsing results in the resulting parsers by using `-p` or `skip(p)` for some parsers in your sum of parsers. It means that the parsing result might be a single value, not a tuple of parsed values. See the docs for `Parser.__neg__()` for more examples. Overloaded types (lots of them to provide stricter checking for the quite dynamic return type of this method): * `(self: Parser[A, B], _IgnoredParser[A]) -> Parser[A, B]` * `(self: Parser[A, B], Parser[A, C]) -> _TupleParser[A, Tuple[B, C]]` * `(self: _TupleParser[A, B], _IgnoredParser[A]) -> _TupleParser[A, B]` * `(self: _TupleParser[A, B], Parser[A, Any]) -> Parser[A, Any]` * `(self: _IgnoredParser[A], _IgnoredParser[A]) -> _IgnoredParser[A]` * `(self: _IgnoredParser[A], Parser[A, C]) -> Parser[A, C]` Examples: ```pycon >>> expr = a("x") + a("y") >>> expr.parse("xy") ('x', 'y') ``` ```pycon >>> expr = a("x") + a("y") + a("z") >>> expr.parse("xyz") ('x', 'y', 'z') ``` ```pycon >>> expr = a("x") + a("y") >>> expr.parse("xz") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: 'z', expected: 'y' ``` """ def magic(v1, v2): if isinstance(v1, _Tuple): return _Tuple(v1 + (v2,)) else: return _Tuple((v1, v2)) @_TupleParser def _add(tokens, s): (v1, s2) = self.run(tokens, s) (v2, s3) = other.run(tokens, s2) return magic(v1, v2), s3 @Parser def ignored_right(tokens, s): v, s2 = self.run(tokens, s) _, s3 = other.run(tokens, s2) return v, s3 name = "(%s, %s)" % (self.name, other.name) if isinstance(other, _IgnoredParser): return ignored_right.named(name) else: return _add.named(name) def __or__(self, other): """Choice combination of parsers. It runs this parser and returns its result. If the parser fails, it runs the other parser. 
Examples: ```pycon >>> expr = a("x") | a("y") >>> expr.parse("x") 'x' >>> expr.parse("y") 'y' >>> expr.parse("z") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: 'z', expected: 'x' or 'y' ``` """ @Parser def _or(tokens, s): try: return self.run(tokens, s) except NoParseError as e: state = e.state try: return other.run(tokens, State(s.pos, state.max, state.parser)) except NoParseError as e: if s.pos == e.state.max: e.state = State(e.state.pos, e.state.max, _or) raise _or.name = "%s or %s" % (self.name, other.name) return _or def __rshift__(self, f): """Transform the parsing result by applying the specified function. Type: `(Callable[[B], C]) -> Parser[A, C]` You can use it for transforming the parsed value into another value before including it into the parse tree (the AST). Examples: ```pycon >>> def make_canonical_name(s): ... return s.lower() >>> expr = (a("D") | a("d")) >> make_canonical_name >>> expr.parse("D") 'd' >>> expr.parse("d") 'd' ``` """ @Parser def _shift(tokens, s): (v, s2) = self.run(tokens, s) return f(v), s2 return _shift.named(self.name) def bind(self, f): """Bind the parser to a monadic function that returns a new parser. Type: `(Callable[[B], Parser[A, C]]) -> Parser[A, C]` Also known as `>>=` in Haskell. !!! Note You can parse any context-free grammar without resorting to `bind`. Due to its poor performance please use it only when you really need it. """ @Parser def _bind(tokens, s): (v, s2) = self.run(tokens, s) return f(v).run(tokens, s2) _bind.name = "(%s >>=)" % (self.name,) return _bind def __neg__(self): """Return a parser that parses the same tokens, but its parsing result is ignored by the sequential `+` combinator. Type: `(Parser[A, B]) -> _IgnoredParser[A]` You can use it for throwing away elements of concrete syntax (e.g. `","`, `";"`). 
Examples: ```pycon >>> expr = -a("x") + a("y") >>> expr.parse("xy") 'y' ``` ```pycon >>> expr = a("x") + -a("y") >>> expr.parse("xy") 'x' ``` ```pycon >>> expr = a("x") + -a("y") + a("z") >>> expr.parse("xyz") ('x', 'z') ``` ```pycon >>> expr = -a("x") + a("y") + -a("z") >>> expr.parse("xyz") 'y' ``` ```pycon >>> expr = -a("x") + a("y") >>> expr.parse("yz") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: 'y', expected: 'x' ``` ```pycon >>> expr = a("x") + -a("y") >>> expr.parse("xz") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: 'z', expected: 'y' ``` !!! Note You **should not** pass the resulting parser to any combinators other than `+`. You **should** have at least one non-skipped value in your `p1 + p2 + ... + pN`. The parsed value of `-p` is an **internal** `_Ignored` object, not intended for actual use. """ return _IgnoredParser(self) def __class_getitem__(cls, key): return cls class State(object): """Parsing state that is maintained basically for error reporting. It consists of the current position `pos` in the sequence being parsed, and the position `max` of the rightmost token that has been consumed while parsing. 
""" def __init__(self, pos, max, parser=None): self.pos = pos self.max = max self.parser = parser def __str__(self): return str((self.pos, self.max)) def __repr__(self): return "State(%r, %r)" % (self.pos, self.max) class NoParseError(Exception): def __init__(self, msg, state): self.msg = msg self.state = state def __str__(self): return self.msg class _Tuple(tuple): pass class _TupleParser(Parser): pass class _Ignored(object): def __init__(self, value): self.value = value def __repr__(self): return "_Ignored(%s)" % repr(self.value) def __eq__(self, other): return isinstance(other, _Ignored) and self.value == other.value @Parser def finished(tokens, s): """A parser that throws an exception if there are any unparsed tokens left in the sequence.""" if s.pos >= len(tokens): return None, s else: s2 = State(s.pos, s.max, finished if s.pos == s.max else s.parser) raise NoParseError("got unexpected token", s2) finished.name = "end of input" def many(p): """Return a parser that applies the parser `p` as many times as it succeeds at parsing the tokens. Return a parser that infinitely applies the parser `p` to the input sequence of tokens as long as it successfully parses them. The parsed value is a list of the sequentially parsed values. Examples: ```pycon >>> expr = many(a("x")) >>> expr.parse("x") ['x'] >>> expr.parse("xx") ['x', 'x'] >>> expr.parse("xxxy") # noqa ['x', 'x', 'x'] >>> expr.parse("y") [] ``` """ @Parser def _many(tokens, s): res = [] try: while True: (v, s) = p.run(tokens, s) res.append(v) except NoParseError as e: s2 = State(s.pos, e.state.max, e.state.parser) if debug: log.debug( "*matched* %d instances of %s, new state = %s" % (len(res), _many.name, s2) ) return res, s2 _many.name = "{ %s }" % p.name return _many def some(pred): """Return a parser that parses a token if it satisfies the predicate `pred`. 
Type: `(Callable[[A], bool]) -> Parser[A, A]` Examples: ```pycon >>> expr = some(lambda s: s.isalpha()).named('alpha') >>> expr.parse("x") 'x' >>> expr.parse("y") 'y' >>> expr.parse("1") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: '1', expected: alpha ``` !!! Warning The `some()` combinator is quite slow and may be changed or removed in future versions. If you need a parser for a token by its type (e.g. any identifier) and maybe its value, use `tok(type[, value])` instead. You should use `make_tokenizer()` from `funcparserlib.lexer` to tokenize your text first. """ @Parser def _some(tokens, s): if s.pos >= len(tokens): s2 = State(s.pos, s.max, _some if s.pos == s.max else s.parser) raise NoParseError("got unexpected end of input", s2) else: t = tokens[s.pos] if pred(t): pos = s.pos + 1 s2 = State(pos, max(pos, s.max), s.parser) if debug: log.debug("*matched* %r, new state = %s" % (t, s2)) return t, s2 else: s2 = State(s.pos, s.max, _some if s.pos == s.max else s.parser) if debug: log.debug( "failed %r, state = %s, expected = %s" % (t, s2, s2.parser.name) ) raise NoParseError("got unexpected token", s2) _some.name = "some(...)" return _some def a(value): """Return a parser that parses a token if it's equal to `value`. Type: `(A) -> Parser[A, A]` Examples: ```pycon >>> expr = a("x") >>> expr.parse("x") 'x' >>> expr.parse("y") Traceback (most recent call last): ... parser.NoParseError: got unexpected token: 'y', expected: 'x' ``` !!! Note Although `Parser.parse()` can parse sequences of any objects (including `str` which is a sequence of `str` chars), **the recommended way** is parsing sequences of `Token` objects. You **should** use a regexp-based tokenizer `make_tokenizer()` defined in `funcparserlib.lexer` to convert your text into a sequence of `Token` objects before parsing it. 
You will get more readable parsing error messages (as `Token` objects contain their position in the source file) and good separation of the lexical and syntactic levels of the grammar. """ name = getattr(value, "name", value) return some(lambda t: t == value).named(repr(name)) def tok(type, value=None): """Return a parser that parses a `Token` and returns the string value of the token. Type: `(str, Optional[str]) -> Parser[Token, str]` You can match any token of the specified `type` or you can match a specific token by its `type` and `value`. Examples: ```pycon >>> expr = tok("expr") >>> expr.parse([Token("expr", "foo")]) 'foo' >>> expr.parse([Token("expr", "bar")]) 'bar' >>> expr.parse([Token("op", "=")]) Traceback (most recent call last): ... parser.NoParseError: got unexpected token: '=', expected: expr ``` ```pycon >>> expr = tok("op", "=") >>> expr.parse([Token("op", "=")]) '=' >>> expr.parse([Token("op", "+")]) Traceback (most recent call last): ... parser.NoParseError: got unexpected token: '+', expected: '=' ``` !!! Note In order to convert your text to parse into a sequence of `Token` objects, use a regexp-based tokenizer `make_tokenizer()` defined in `funcparserlib.lexer`. You will get more readable parsing error messages (as `Token` objects contain their position in the source file) and good separation of the lexical and syntactic levels of the grammar. """ if value is not None: p = a(Token(type, value)) else: p = some(lambda t: t.type == type).named(type) return (p >> (lambda t: t.value)).named(p.name) def pure(x): """Wrap any object into a parser. Type: `(A) -> Parser[A, A]` A pure parser doesn't touch the tokens sequence, it just returns its pure `x` value. Also known as `return` in Haskell. """ @Parser def _pure(_, s): return x, s _pure.name = "(pure %r)" % (x,) return _pure def maybe(p): """Return a parser that returns `None` if the parser `p` fails. 
    Examples:

    ```pycon
    >>> expr = maybe(a("x"))
    >>> expr.parse("x")
    'x'
    >>> expr.parse("y") is None
    True

    ```
    """
    return (p | pure(None)).named("[ %s ]" % (p.name,))


def skip(p):
    """An alias for `-p`.

    See also the docs for `Parser.__neg__()`.
    """
    return -p


class _IgnoredParser(Parser):
    def __init__(self, p):
        super(_IgnoredParser, self).__init__(p)
        run = self._run if debug else self.run

        def ignored(tokens, s):
            v, s2 = run(tokens, s)
            return v if isinstance(v, _Ignored) else _Ignored(v), s2

        self.define(ignored)
        self.name = getattr(p, "name", p.__doc__)

    def __add__(self, other):
        def ignored_left(tokens, s):
            _, s2 = self.run(tokens, s)
            v, s3 = other.run(tokens, s2)
            return v, s3

        if isinstance(other, _IgnoredParser):
            return _IgnoredParser(ignored_left).named(
                "(%s, %s)" % (self.name, other.name)
            )
        else:
            return Parser(ignored_left).named("(%s, %s)" % (self.name, other.name))


def oneplus(p):
    """Return a parser that applies the parser `p` one or more times.

    A similar parser combinator `many(p)` means apply `p` zero or more times, whereas
    `oneplus(p)` means apply `p` one or more times.

    Examples:

    ```pycon
    >>> expr = oneplus(a("x"))
    >>> expr.parse("x")
    ['x']
    >>> expr.parse("xx")
    ['x', 'x']
    >>> expr.parse("y")
    Traceback (most recent call last):
        ...
    parser.NoParseError: got unexpected token: 'y', expected: 'x'

    ```
    """

    @Parser
    def _oneplus(tokens, s):
        (v1, s2) = p.run(tokens, s)
        (v2, s3) = many(p).run(tokens, s2)
        return [v1] + v2, s3

    _oneplus.name = "(%s, { %s })" % (p.name, p.name)
    return _oneplus


def with_forward_decls(suspension):
    warnings.warn(
        "Use forward_decl() instead:\n"
        "\n"
        "    p = forward_decl()\n"
        "    ...\n"
        "    p.define(parser_value)\n",
        DeprecationWarning,
    )

    @Parser
    def f(tokens, s):
        return suspension().run(tokens, s)

    return f


def forward_decl():
    """Return an undefined parser that can be used as a forward declaration.

    Type: `Parser[Any, Any]`

    Use `p = forward_decl()` in combination with `p.define(...)` to define recursive
    parsers.
    Examples:

    ```pycon
    >>> expr = forward_decl()
    >>> expr.define(a("x") + maybe(expr) + a("y"))
    >>> expr.parse("xxyy")  # noqa
    ('x', ('x', None, 'y'), 'y')
    >>> expr.parse("xxy")
    Traceback (most recent call last):
        ...
    parser.NoParseError: got unexpected end of input, expected: 'y'

    ```

    !!! Note

        If you care about static types, you should add a type hint for your forward
        declaration, so that your type checker can check types in `p.define(...)` later:

        ```python
        p: Parser[str, int] = forward_decl()
        p.define(a("x"))  # Type checker error
        p.define(a("1") >> int)  # OK
        ```
    """

    @Parser
    def f(_tokens, _s):
        raise NotImplementedError("you must define() a forward_decl somewhere")

    f.name = "forward_decl()"
    return f


if __name__ == "__main__":
    import doctest

    doctest.testmod()
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483566.7757902 funcparserlib-1.0.0/funcparserlib/parser.pyi0000644000000000000000000000517514233721657016175 0ustar00
from typing import (
    Optional,
    Generic,
    TypeVar,
    Union,
    Callable,
    Tuple,
    Sequence,
    Any,
    List,
    Text,
    overload,
)
from funcparserlib.lexer import Token

_A = TypeVar("_A")
_B = TypeVar("_B")
_C = TypeVar("_C")
_D = TypeVar("_D")

class State:
    pos: int
    max: int
    parser: Union[Parser, _ParserCallable, None]
    def __init__(
        self,
        pos: int,
        max: int,
        parser: Union[Parser, _ParserCallable, None] = ...,
    ) -> None: ...

_ParserCallable = Callable[[_A, State], Tuple[_B, State]]

class Parser(Generic[_A, _B]):
    name: Text
    def __init__(self, p: Union[Parser[_A, _B], _ParserCallable]) -> None: ...
    def named(self, name: Text) -> Parser[_A, _B]: ...
    def define(self, p: Union[Parser[_A, _B], _ParserCallable]) -> None: ...
    def run(self, tokens: Sequence[_A], s: State) -> Tuple[_B, State]: ...
    def parse(self, tokens: Sequence[_A]) -> _B: ...
    @overload
    def __add__(  # type: ignore[misc]
        self, other: _IgnoredParser[_A]
    ) -> Parser[_A, _B]: ...
    @overload
    def __add__(self, other: Parser[_A, _C]) -> _TupleParser[_A, Tuple[_B, _C]]: ...
    def __or__(self, other: Parser[_A, _C]) -> Parser[_A, Union[_B, _C]]: ...
    def __rshift__(self, f: Callable[[_B], _C]) -> Parser[_A, _C]: ...
    def bind(self, f: Callable[[_B], Parser[_A, _C]]) -> Parser[_A, _C]: ...
    def __neg__(self) -> _IgnoredParser[_A]: ...

class _Ignored:
    value: Any
    def __init__(self, value: Any) -> None: ...

class _IgnoredParser(Parser[_A, _Ignored]):
    @overload  # type: ignore[override]
    def __add__(self, other: _IgnoredParser[_A]) -> _IgnoredParser[_A]: ...
    @overload  # type: ignore[override]
    def __add__(self, other: Parser[_A, _C]) -> Parser[_A, _C]: ...

class _TupleParser(Parser[_A, _B]):
    @overload  # type: ignore[override]
    def __add__(self, other: _IgnoredParser[_A]) -> _TupleParser[_A, _B]: ...
    @overload
    def __add__(self, other: Parser[_A, Any]) -> Parser[_A, Any]: ...

finished: Parser[Any, None]

def many(p: Parser[_A, _B]) -> Parser[_A, List[_B]]: ...
def some(pred: Callable[[_A], bool]) -> Parser[_A, _A]: ...
def a(value: _A) -> Parser[_A, _A]: ...
def tok(type: Text, value: Optional[Text] = ...) -> Parser[Token, Text]: ...
def pure(x: _A) -> Parser[_A, _A]: ...
def maybe(p: Parser[_A, _B]) -> Parser[_A, Optional[_B]]: ...
def skip(p: Parser[_A, Any]) -> _IgnoredParser[_A]: ...
def oneplus(p: Parser[_A, _B]) -> Parser[_A, List[_B]]: ...
def forward_decl() -> Parser[Any, Any]: ...

class NoParseError(Exception):
    msg: Text
    state: State
    def __init__(self, msg: Text, state: State) -> None: ...
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483566.7757902 funcparserlib-1.0.0/funcparserlib/py.typed0000644000000000000000000000000014233721657015633 0ustar00
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483566.7757902 funcparserlib-1.0.0/funcparserlib/util.py0000644000000000000000000000513414233721657015500 0ustar00
# -*- coding: utf-8 -*-

# Copyright © 2009/2021 Andrey Vlasovskikh
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all copies
# or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

from __future__ import unicode_literals


def pretty_tree(x, kids, show):
    """Return a pseudo-graphic tree representation of the object `x` similar to the
    `tree` command in Unix.

    Type: `(T, Callable[[T], List[T]], Callable[[T], str]) -> str`

    It applies the parameter `show` (which is a function of type `(T) -> str`) to get a
    textual representation of the objects to show.

    It applies the parameter `kids` (which is a function of type `(T) -> List[T]`) to
    list the children of the object to show.

    Examples:

    ```pycon
    >>> print(pretty_tree(
    ...     ["foo", ["bar", "baz"], "quux"],
    ...     lambda obj: obj if isinstance(obj, list) else [],
    ...     lambda obj: "[]" if isinstance(obj, list) else str(obj),
    ... ))
    []
    |-- foo
    |-- []
    |   |-- bar
    |   `-- baz
    `-- quux

    ```
    """
    (MID, END, CONT, LAST, ROOT) = ("|-- ", "`-- ", "|   ", "    ", "")

    def rec(obj, indent, sym):
        line = indent + sym + show(obj)
        obj_kids = kids(obj)
        if len(obj_kids) == 0:
            return line
        else:
            if sym == MID:
                next_indent = indent + CONT
            elif sym == ROOT:
                next_indent = indent + ROOT
            else:
                next_indent = indent + LAST
            chars = [MID] * (len(obj_kids) - 1) + [END]
            lines = [rec(kid, next_indent, sym) for kid, sym in zip(obj_kids, chars)]
            return "\n".join([line] + lines)

    return rec(x, "", ROOT)
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483566.7757902 funcparserlib-1.0.0/funcparserlib/util.pyi0000644000000000000000000000025414233721657015647 0ustar00
from typing import TypeVar, Callable, List, Text

_A = TypeVar("_A")

def pretty_tree(
    x: _A, kids: Callable[[_A], List[_A]], show: Callable[[_A], Text]
) -> Text: ...
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483566.7757902 funcparserlib-1.0.0/pyproject.toml0000644000000000000000000000261214233721657014224 0ustar00
[tool.poetry]
name = "funcparserlib"
version = "1.0.0"
description = "Recursive descent parsing library based on functional combinators"
authors = ["Andrey Vlasovskikh <andrey.vlasovskikh@gmail.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://funcparserlib.pirx.ru"
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
    "Programming Language :: Python",
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
]

[tool.poetry.dependencies]
python = "~2.7 || ^3.7"

[tool.poetry.dev-dependencies]
six = "^1.15.0"
typing = {version = "^3.7.4", python = "~2.7"}
mypy = {version = "~0.942", python = "^3.7"}
pre-commit = {version = "^2.11.1", python = "^3.7"}
black = {extras = ["d"], version = ">=22.3.0", python = "^3.7"}
flake8 = {version = "^3.9.2", python = "^3.7"}
tox = {version = "^3.23.0", python = "^3.7"}
mkdocs = {version = "^1.3.0", python = "^3.7"}
mkdocs-material = {version = "^8.2.9", python = "^3.7"}
mkdocstrings = {extras = ["python-legacy"], version = "^0.18.1", python = "^3.7"}

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483579.2097456 funcparserlib-1.0.0/setup.py0000644000000000000000000001500214233721673013015 0ustar00
# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['funcparserlib']

package_data = \
{'': ['*']}

setup_kwargs = {
    'name': 'funcparserlib',
    'version': '1.0.0',
    'description': 'Recursive descent parsing library based on functional combinators',
    'long_description': 'Funcparserlib\n=============\n\nRecursive descent parsing library for Python based on functional combinators.\n\n[![PyPI](https://img.shields.io/pypi/v/funcparserlib)](https://pypi.org/project/funcparserlib/)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/funcparserlib)](https://pypi.org/project/funcparserlib/)\n\n\nDescription\n-----------\n\nThe primary focus of `funcparserlib` is **parsing little languages** or **external DSLs** (domain specific languages).\n\nParsers made with `funcparserlib` are pure-Python LL(\\*) parsers. It means that it\'s **very easy to write parsers** without thinking about lookaheads and other hardcore parsing stuff. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with `funcparserlib` is **at least twice as fast as PyParsing**, a very popular library for Python.\n\nThe source code of `funcparserlib` is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It features the longest parsed prefix error reporting, as well as a tiny lexer generator for token position tracking.\n\nThe idea of parser combinators used in `funcparserlib` comes from the [Introduction to Functional Programming](https://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/) course. 
We have converted it from ML into Python.\n\n\nInstallation\n------------\n\nYou can install `funcparserlib` from [PyPI](https://pypi.org/project/funcparserlib/):\n\n```shell\n$ pip install funcparserlib\n```\n\nThere are no dependencies on other libraries.\n\n\nDocumentation\n-------------\n\n* [Getting Started](https://funcparserlib.pirx.ru/getting-started/)\n * Your **starting point** with `funcparserlib`\n* [API Reference](https://funcparserlib.pirx.ru/api/)\n * Learn the details of the API\n\nThere are several examples available in the `tests/` directory:\n\n* [GraphViz DOT parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/dot.py)\n* [JSON parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/json.py)\n\nSee also [the changelog](https://funcparserlib.pirx.ru/changes/).\n\n\nExample\n-------\n\nLet\'s consider a little language of **numeric expressions** with a syntax similar to Python expressions. Here are some expression strings in this language:\n\n```\n0\n1 + 2 + 3\n-1 + 2 ** 32\n3.1415926 * (2 + 7.18281828e-1) * 42\n```\n\n\nHere is **the complete source code** of the tokenizer and the parser for this language written using `funcparserlib`:\n\n```python\nfrom typing import List, Tuple, Union\nfrom dataclasses import dataclass\n\nfrom funcparserlib.lexer import make_tokenizer, TokenSpec, Token\nfrom funcparserlib.parser import tok, Parser, many, forward_decl, finished\n\n\n@dataclass\nclass BinaryExpr:\n op: str\n left: "Expr"\n right: "Expr"\n\n\nExpr = Union[BinaryExpr, int, float]\n\n\ndef tokenize(s: str) -> List[Token]:\n specs = [\n TokenSpec("whitespace", r"\\s+"),\n TokenSpec("float", r"[+\\-]?\\d+\\.\\d*([Ee][+\\-]?\\d+)*"),\n TokenSpec("int", r"[+\\-]?\\d+"),\n TokenSpec("op", r"(\\*\\*)|[+\\-*/()]"),\n ]\n tokenizer = make_tokenizer(specs)\n return [t for t in tokenizer(s) if t.type != "whitespace"]\n\n\ndef parse(tokens: List[Token]) -> Expr:\n int_num = tok("int") >> int\n float_num = tok("float") >> 
float\n number = int_num | float_num\n\n expr: Parser[Token, Expr] = forward_decl()\n parenthesized = -op("(") + expr + -op(")")\n primary = number | parenthesized\n power = primary + many(op("**") + primary) >> to_expr\n term = power + many((op("*") | op("/")) + power) >> to_expr\n sum = term + many((op("+") | op("-")) + term) >> to_expr\n expr.define(sum)\n\n document = expr + -finished\n\n return document.parse(tokens)\n\n\ndef op(name: str) -> Parser[Token, str]:\n return tok("op", name)\n\n\ndef to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:\n first, rest = args\n result = first\n for op, expr in rest:\n result = BinaryExpr(op, result, expr)\n return result\n```\n\nNow, consider this numeric expression: `3.1415926 * (2 + 7.18281828e-1) * 42`.\n\nLet\'s `tokenize()` it using the tokenizer we\'ve created with `funcparserlib.lexer`:\n\n```\n[\n Token(\'float\', \'3.1415926\'),\n Token(\'op\', \'*\'),\n Token(\'op\', \'(\'),\n Token(\'int\', \'2\'),\n Token(\'op\', \'+\'),\n Token(\'float\', \'7.18281828e-1\'),\n Token(\'op\', \')\'),\n Token(\'op\', \'*\'),\n Token(\'int\', \'42\'),\n]\n```\n\nLet\'s `parse()` these tokens into an expression tree using our parser created with `funcparserlib.parser`:\n\n```\nBinaryExpr(\n op=\'*\',\n left=BinaryExpr(\n op=\'*\',\n left=3.1415926,\n right=BinaryExpr(op=\'+\', left=2, right=0.718281828),\n ),\n right=42,\n)\n```\n\nLearn how to write this parser using `funcparserlib` in the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide!\n\n\nUsed By\n-------\n\nSome open-source projects that use `funcparserlib` as an explicit dependency:\n\n* [Hy](https://github.com/hylang/hy), a Lisp dialect that\'s embedded in Python\n * 4.2K stars, version `>= 1.0.0a0`, Python 3.7+\n* [Splash](https://github.com/scrapinghub/splash), a JavaScript rendering service with HTTP API, by Scrapinghub\n * 3.6K stars, version `*`. 
Python 3 in Docker\n* [graphite-beacon](https://github.com/klen/graphite-beacon), a simple alerting system for Graphite metrics\n * 459 stars, version `==0.3.6`, Python 2 and 3\n* [blockdiag](https://github.com/blockdiag/blockdiag), generates block-diagram image file from spec-text file\n * 148 stars, version `>= 1.0.0a0`, Python 3.7+\n* [kll](https://github.com/kiibohd/kll), Keyboard Layout Language (KLL) compiler\n * 109 stars, copied source code, Python 3.5+\n\n\nNext\n----\n\nRead the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide to start learning `funcparserlib`.\n',
    'author': 'Andrey Vlasovskikh',
    'author_email': 'andrey.vlasovskikh@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://funcparserlib.pirx.ru',
    'packages': packages,
    'package_data': package_data,
    'python_requires': '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*',
}


setup(**setup_kwargs)
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1651483579.2108488 funcparserlib-1.0.0/PKG-INFO0000644000000000000000000001510114233721673012400 0ustar00
Metadata-Version: 2.1
Name: funcparserlib
Version: 1.0.0
Summary: Recursive descent parsing library based on functional combinators
Home-page: https://funcparserlib.pirx.ru
License: MIT
Author: Andrey Vlasovskikh
Author-email: andrey.vlasovskikh@gmail.com
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown

Funcparserlib
=============

Recursive descent parsing library for Python based on functional combinators.

[![PyPI](https://img.shields.io/pypi/v/funcparserlib)](https://pypi.org/project/funcparserlib/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/funcparserlib)](https://pypi.org/project/funcparserlib/)


Description
-----------

The primary focus of `funcparserlib` is **parsing little languages** or **external DSLs** (domain specific languages).

Parsers made with `funcparserlib` are pure-Python LL(\*) parsers. It means that it's **very easy to write parsers** without thinking about lookaheads and other hardcore parsing stuff. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with `funcparserlib` is **at least twice as fast as PyParsing**, a very popular library for Python.

The source code of `funcparserlib` is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It features the longest parsed prefix error reporting, as well as a tiny lexer generator for token position tracking.

The idea of parser combinators used in `funcparserlib` comes from the [Introduction to Functional Programming](https://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/) course. We have converted it from ML into Python.


Installation
------------

You can install `funcparserlib` from [PyPI](https://pypi.org/project/funcparserlib/):

```shell
$ pip install funcparserlib
```

There are no dependencies on other libraries.


Documentation
-------------

* [Getting Started](https://funcparserlib.pirx.ru/getting-started/)
    * Your **starting point** with `funcparserlib`
* [API Reference](https://funcparserlib.pirx.ru/api/)
    * Learn the details of the API

There are several examples available in the `tests/` directory:

* [GraphViz DOT parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/dot.py)
* [JSON parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/json.py)

See also [the changelog](https://funcparserlib.pirx.ru/changes/).


Example
-------

Let's consider a little language of **numeric expressions** with a syntax similar to Python expressions. Here are some expression strings in this language:

```
0
1 + 2 + 3
-1 + 2 ** 32
3.1415926 * (2 + 7.18281828e-1) * 42
```


Here is **the complete source code** of the tokenizer and the parser for this language written using `funcparserlib`:

```python
from typing import List, Tuple, Union
from dataclasses import dataclass

from funcparserlib.lexer import make_tokenizer, TokenSpec, Token
from funcparserlib.parser import tok, Parser, many, forward_decl, finished


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def tokenize(s: str) -> List[Token]:
    specs = [
        TokenSpec("whitespace", r"\s+"),
        TokenSpec("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
        TokenSpec("int", r"[+\-]?\d+"),
        TokenSpec("op", r"(\*\*)|[+\-*/()]"),
    ]
    tokenizer = make_tokenizer(specs)
    return [t for t in tokenizer(s) if t.type != "whitespace"]


def parse(tokens: List[Token]) -> Expr:
    int_num = tok("int") >> int
    float_num = tok("float") >> float
    number = int_num | float_num

    expr: Parser[Token, Expr] = forward_decl()
    parenthesized = -op("(") + expr + -op(")")
    primary = number | parenthesized
    power = primary + many(op("**") + primary) >> to_expr
    term = power + many((op("*") | op("/")) + power) >> to_expr
    sum = term + many((op("+") | op("-")) + term) >> to_expr
    expr.define(sum)

    document = expr + -finished

    return document.parse(tokens)


def op(name: str) -> Parser[Token, str]:
    return tok("op", name)


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result
```

Now, consider this numeric expression: `3.1415926 * (2 + 7.18281828e-1) * 42`.

Let's `tokenize()` it using the tokenizer we've created with `funcparserlib.lexer`:

```
[
    Token('float', '3.1415926'),
    Token('op', '*'),
    Token('op', '('),
    Token('int', '2'),
    Token('op', '+'),
    Token('float', '7.18281828e-1'),
    Token('op', ')'),
    Token('op', '*'),
    Token('int', '42'),
]
```

Let's `parse()` these tokens into an expression tree using our parser created with `funcparserlib.parser`:

```
BinaryExpr(
    op='*',
    left=BinaryExpr(
        op='*',
        left=3.1415926,
        right=BinaryExpr(op='+', left=2, right=0.718281828),
    ),
    right=42,
)
```

Learn how to write this parser using `funcparserlib` in the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide!


Used By
-------

Some open-source projects that use `funcparserlib` as an explicit dependency:

* [Hy](https://github.com/hylang/hy), a Lisp dialect that's embedded in Python
    * 4.2K stars, version `>= 1.0.0a0`, Python 3.7+
* [Splash](https://github.com/scrapinghub/splash), a JavaScript rendering service with HTTP API, by Scrapinghub
    * 3.6K stars, version `*`. Python 3 in Docker
* [graphite-beacon](https://github.com/klen/graphite-beacon), a simple alerting system for Graphite metrics
    * 459 stars, version `==0.3.6`, Python 2 and 3
* [blockdiag](https://github.com/blockdiag/blockdiag), generates block-diagram image file from spec-text file
    * 148 stars, version `>= 1.0.0a0`, Python 3.7+
* [kll](https://github.com/kiibohd/kll), Keyboard Layout Language (KLL) compiler
    * 109 stars, copied source code, Python 3.5+


Next
----

Read the [Getting Started](https://funcparserlib.pirx.ru/getting-started/) guide to start learning `funcparserlib`.
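
If you want a feel for how the `to_expr` helper from the example shapes the result before reading the guide, here is a minimal standalone sketch. It copies `BinaryExpr` and `to_expr` from the example and runs the fold on a hand-written value, so it does not import `funcparserlib` at all. The `parsed` tuple is an assumption: it stands in for what a `term + many((op("+") | op("-")) + term)` parser would hand to `to_expr` for `1 + 2 + 3`.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    # Fold (first, [(op, operand), ...]) into a left-nested tree.
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result


# Hypothetical parser output for `1 + 2 + 3`, written by hand for illustration.
parsed = (1, [("+", 2), ("+", 3)])
tree = to_expr(parsed)
print(tree)
```

Each `(op, operand)` pair wraps the tree built so far as the left child, so operators of the same precedence level group to the left: `(1 + 2) + 3`. An empty `rest` list returns `first` unchanged, which is why a lone number parses to a plain `int` or `float`.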