././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1611782043.1266093 rply-0.7.8/0000755000175000017500000000000000000000000013621 5ustar00alexgaynoralexgaynor././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/LICENSE0000644000175000017500000000277700000000000014643 0ustar00alexgaynoralexgaynorCopyright (c) Alex Gaynor and individual contributors. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of rply nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/MANIFEST.in0000644000175000017500000000004300000000000015354 0ustar00alexgaynoralexgaynorinclude README.rst include LICENSE ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1611782043.1266093 rply-0.7.8/PKG-INFO0000644000175000017500000001203600000000000014720 0ustar00alexgaynoralexgaynorMetadata-Version: 1.0 Name: rply Version: 0.7.8 Summary: A pure Python Lex/Yacc that works with RPython Home-page: UNKNOWN Author: Alex Gaynor Author-email: alex.gaynor@gmail.com License: BSD 3-Clause License Description: RPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? ------------------------- In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project. If you don't intend to use your parser from RPython, and just want a cool pure Python parser you can ignore all the box stuff and just return whatever you like from each production method. Error handling -------------- By default, when a parsing error is encountered, an ``rply.ParsingError`` is raised, it has a method ``getsourcepos()``, which returns an ``rply.token.SourcePosition`` object. You may also provide an error handler, which, at the moment, must raise an exception. It receives the ``Token`` object that the parser errored on. .. code:: python pg = ParserGenerator(...) @pg.error def error_handler(token): raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype()) Python compatibility -------------------- RPly is tested and known to work under Python 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards. Links ----- * `Source code and issue tracker `_ * `PyPI releases `_ * `Talk at PyCon US 2013: So you want to write an interpreter? `_ .. _`online`: https://rply.readthedocs.io/ Platform: UNKNOWN ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/README.rst0000644000175000017500000000752500000000000015321 0ustar00alexgaynoralexgaynorRPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? ------------------------- In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project. If you don't intend to use your parser from RPython, and just want a cool pure Python parser you can ignore all the box stuff and just return whatever you like from each production method. Error handling -------------- By default, when a parsing error is encountered, an ``rply.ParsingError`` is raised, it has a method ``getsourcepos()``, which returns an ``rply.token.SourcePosition`` object. You may also provide an error handler, which, at the moment, must raise an exception. It receives the ``Token`` object that the parser errored on. .. code:: python pg = ParserGenerator(...) @pg.error def error_handler(token): raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype()) Python compatibility -------------------- RPly is tested and known to work under Python 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards. Links ----- * `Source code and issue tracker `_ * `PyPI releases `_ * `Talk at PyCon US 2013: So you want to write an interpreter? `_ .. _`online`: https://rply.readthedocs.io/ ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1611782043.1236095 rply-0.7.8/rply/0000755000175000017500000000000000000000000014607 5ustar00alexgaynoralexgaynor././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781917.0 rply-0.7.8/rply/__init__.py0000644000175000017500000000045200000000000016721 0ustar00alexgaynoralexgaynorfrom rply.errors import LexingError, ParsingError from rply.lexergenerator import LexerGenerator from rply.parsergenerator import ParserGenerator from rply.token import Token __version__ = '0.7.8' __all__ = [ "LexerGenerator", "LexingError", "ParserGenerator", "ParsingError", "Token", ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/errors.py0000644000175000017500000000201300000000000016471 0ustar00alexgaynoralexgaynorclass ParserGeneratorError(Exception): pass class LexingError(Exception): """ Raised by a Lexer, if no rule matches. """ def __init__(self, message, source_pos): self.message = message self.source_pos = source_pos def getsourcepos(self): """ Returns the position in the source, at which this error occurred. """ return self.source_pos def __repr__(self): return 'LexingError(%r, %r)' % (self.message, self.source_pos) class ParsingError(Exception): """ Raised by a Parser, if no production rule can be applied. """ def __init__(self, message, source_pos): self.message = message self.source_pos = source_pos def getsourcepos(self): """ Returns the position in the source, at which this error occurred. """ return self.source_pos def __repr__(self): return 'ParsingError(%r, %r)' % (self.message, self.source_pos) class ParserGeneratorWarning(Warning): pass ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/grammar.py0000644000175000017500000001574000000000000016616 0ustar00alexgaynoralexgaynorfrom rply.errors import ParserGeneratorError from rply.utils import iteritems def rightmost_terminal(symbols, terminals): for sym in reversed(symbols): if sym in terminals: return sym return None class Grammar(object): def __init__(self, terminals): # A list of all the productions self.productions = [None] # A dictionary mapping the names of non-terminals to a list of all # productions of that nonterminal self.prod_names = {} # A dictionary mapping the names of terminals to a list of the rules # where they are used self.terminals = dict((t, []) for t in terminals) self.terminals["error"] = [] # A dictionary mapping names of nonterminals to a list of rule numbers # where they are used self.nonterminals = {} self.first = {} self.follow = {} self.precedence = {} self.start = None def add_production(self, prod_name, syms, func, precedence): if prod_name in self.terminals: raise ParserGeneratorError("Illegal rule name %r" % prod_name) if precedence is None: precname = rightmost_terminal(syms, self.terminals) prod_prec = self.precedence.get(precname, ("right", 0)) else: try: prod_prec = self.precedence[precedence] except KeyError: raise ParserGeneratorError( "Precedence %r doesn't exist" % precedence ) pnumber = len(self.productions) self.nonterminals.setdefault(prod_name, []) for t in syms: if t in self.terminals: self.terminals[t].append(pnumber) else: self.nonterminals.setdefault(t, []).append(pnumber) p = Production(pnumber, prod_name, syms, prod_prec, func) self.productions.append(p) self.prod_names.setdefault(prod_name, []).append(p) def set_precedence(self, term, assoc, level): if term in self.precedence: raise ParserGeneratorError( "Precedence already specified for %s" % term ) if assoc not in ["left", "right", "nonassoc"]: raise ParserGeneratorError( "Precedence must be one of left, right, nonassoc; not %s" % ( assoc ) ) self.precedence[term] = (assoc, level) def set_start(self): start = self.productions[1].name self.productions[0] = Production(0, "S'", [start], ("right", 0), None) self.nonterminals[start].append(0) self.start = start def unused_terminals(self): return [ t for t, prods in iteritems(self.terminals) if not prods and t != "error" ] def unused_productions(self): return [p for p, prods in iteritems(self.nonterminals) if not prods] def build_lritems(self): """ Walks the list of productions and builds a complete set of the LR items. """ for p in self.productions: lastlri = p i = 0 lr_items = [] while True: if i > p.getlength(): lri = None else: try: before = p.prod[i - 1] except IndexError: before = None try: after = self.prod_names[p.prod[i]] except (IndexError, KeyError): after = [] lri = LRItem(p, i, before, after) lastlri.lr_next = lri if lri is None: break lr_items.append(lri) lastlri = lri i += 1 p.lr_items = lr_items def _first(self, beta): result = [] for x in beta: x_produces_empty = False for f in self.first[x]: if f == "": x_produces_empty = True else: if f not in result: result.append(f) if not x_produces_empty: break else: result.append("") return result def compute_first(self): for t in self.terminals: self.first[t] = [t] self.first["$end"] = ["$end"] for n in self.nonterminals: self.first[n] = [] changed = True while changed: changed = False for n in self.nonterminals: for p in self.prod_names[n]: for f in self._first(p.prod): if f not in self.first[n]: self.first[n].append(f) changed = True def compute_follow(self): for k in self.nonterminals: self.follow[k] = [] start = self.start self.follow[start] = ["$end"] added = True while added: added = False for p in self.productions[1:]: for i, B in enumerate(p.prod): if B in self.nonterminals: fst = self._first(p.prod[i + 1:]) has_empty = False for f in fst: if f != "" and f not in self.follow[B]: self.follow[B].append(f) added = True if f == "": has_empty = True if has_empty or i == (len(p.prod) - 1): for f in self.follow[p.name]: if f not in self.follow[B]: self.follow[B].append(f) added = True class Production(object): def __init__(self, num, name, prod, precedence, func): self.name = name self.prod = prod self.number = num self.func = func self.prec = precedence self.unique_syms = [] for s in self.prod: if s not in self.unique_syms: self.unique_syms.append(s) self.lr_items = [] self.lr_next = None self.lr0_added = 0 self.reduced = 0 def __repr__(self): return "Production(%s -> %s)" % (self.name, " ".join(self.prod)) def getlength(self): return len(self.prod) class LRItem(object): def __init__(self, p, n, before, after): self.name = p.name self.prod = p.prod[:] self.prod.insert(n, ".") self.number = p.number self.lr_index = n self.lookaheads = {} self.unique_syms = p.unique_syms self.lr_before = before self.lr_after = after def __repr__(self): return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod)) def getlength(self): return len(self.prod) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/lexer.py0000644000175000017500000000334100000000000016301 0ustar00alexgaynoralexgaynorfrom rply.errors import LexingError from rply.token import SourcePosition, Token class Lexer(object): def __init__(self, rules, ignore_rules): self.rules = rules self.ignore_rules = ignore_rules def lex(self, s): return LexerStream(self, s) class LexerStream(object): def __init__(self, lexer, s): self.lexer = lexer self.s = s self.idx = 0 self._lineno = 1 self._colno = 1 def __iter__(self): return self def _update_pos(self, match): self.idx = match.end self._lineno += self.s.count("\n", match.start, match.end) last_nl = self.s.rfind("\n", 0, match.start) if last_nl < 0: return match.start + 1 else: return match.start - last_nl def next(self): while True: if self.idx >= len(self.s): raise StopIteration for rule in self.lexer.ignore_rules: match = rule.matches(self.s, self.idx) if match: self._update_pos(match) break else: break for rule in self.lexer.rules: match = rule.matches(self.s, self.idx) if match: lineno = self._lineno self._colno = self._update_pos(match) source_pos = SourcePosition(match.start, lineno, self._colno) token = Token( rule.name, self.s[match.start:match.end], source_pos ) return token else: raise LexingError(None, SourcePosition( self.idx, self._lineno, self._colno)) def __next__(self): return self.next() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/lexergenerator.py0000644000175000017500000000613500000000000020214 0ustar00alexgaynoralexgaynorimport re try: import rpython from rpython.rlib.objectmodel import we_are_translated from rpython.rlib.rsre import rsre_core from rpython.rlib.rsre.rpy import get_code except ImportError: rpython = None def we_are_translated(): return False from rply.lexer import Lexer class Rule(object): _attrs_ = ['name', 'flags', '_pattern'] def __init__(self, name, pattern, flags=0): self.name = name self.re = re.compile(pattern, flags=flags) if rpython: self._pattern = get_code(pattern, flags) def _freeze_(self): return True def matches(self, s, pos): if not we_are_translated(): m = self.re.match(s, pos) return Match(*m.span(0)) if m is not None else None else: assert pos >= 0 ctx = rsre_core.StrMatchContext(s, pos, len(s)) matched = rsre_core.match_context(ctx, self._pattern) if matched: return Match(ctx.match_start, ctx.match_end) else: return None class Match(object): _attrs_ = ["start", "end"] def __init__(self, start, end): self.start = start self.end = end class LexerGenerator(object): r""" A LexerGenerator represents a set of rules that match pieces of text that should either be turned into tokens or ignored by the lexer. Rules are added using the :meth:`add` and :meth:`ignore` methods: >>> from rply import LexerGenerator >>> lg = LexerGenerator() >>> lg.add('NUMBER', r'\d+') >>> lg.add('ADD', r'\+') >>> lg.ignore(r'\s+') The rules are passed to :func:`re.compile`. If you need additional flags, e.g. :const:`re.DOTALL`, you can pass them to :meth:`add` and :meth:`ignore` as an additional optional parameter: >>> import re >>> lg.add('ALL', r'.*', flags=re.DOTALL) You can then build a lexer with which you can lex a string to produce an iterator yielding tokens: >>> lexer = lg.build() >>> iterator = lexer.lex('1 + 1') >>> iterator.next() Token('NUMBER', '1') >>> iterator.next() Token('ADD', '+') >>> iterator.next() Token('NUMBER', '1') >>> iterator.next() Traceback (most recent call last): ... StopIteration """ def __init__(self): self.rules = [] self.ignore_rules = [] def add(self, name, pattern, flags=0): """ Adds a rule with the given `name` and `pattern`. In case of ambiguity, the first rule added wins. """ self.rules.append(Rule(name, pattern, flags=flags)) def ignore(self, pattern, flags=0): """ Adds a rule whose matched value will be ignored. Ignored rules will be matched before regular ones. """ self.ignore_rules.append(Rule("", pattern, flags=flags)) def build(self): """ Returns a lexer instance, which provides a `lex` method that must be called with a string and returns an iterator yielding :class:`~rply.Token` instances. """ return Lexer(self.rules, self.ignore_rules) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/parser.py0000644000175000017500000000570600000000000016465 0ustar00alexgaynoralexgaynorfrom rply.errors import ParsingError class LRParser(object): def __init__(self, lr_table, error_handler): self.lr_table = lr_table self.error_handler = error_handler def parse(self, tokenizer, state=None): from rply.token import Token lookahead = None lookaheadstack = [] statestack = [0] symstack = [Token("$end", "$end")] current_state = 0 while True: if self.lr_table.default_reductions[current_state]: t = self.lr_table.default_reductions[current_state] current_state = self._reduce_production( t, symstack, statestack, state ) continue if lookahead is None: if lookaheadstack: lookahead = lookaheadstack.pop() else: try: lookahead = next(tokenizer) except StopIteration: lookahead = None if lookahead is None: lookahead = Token("$end", "$end") ltype = lookahead.gettokentype() if ltype in self.lr_table.lr_action[current_state]: t = self.lr_table.lr_action[current_state][ltype] if t > 0: statestack.append(t) current_state = t symstack.append(lookahead) lookahead = None continue elif t < 0: current_state = self._reduce_production( t, symstack, statestack, state ) continue else: n = symstack[-1] return n else: # TODO: actual error handling here if self.error_handler is not None: if state is None: self.error_handler(lookahead) else: self.error_handler(state, lookahead) raise AssertionError("For now, error_handler must raise.") else: raise ParsingError(None, lookahead.getsourcepos()) def _reduce_production(self, t, symstack, statestack, state): # reduce a symbol on the stack and emit a production p = self.lr_table.grammar.productions[-t] pname = p.name plen = p.getlength() start = len(symstack) + (-plen - 1) assert start >= 0 targ = symstack[start + 1:] start = len(symstack) + (-plen) assert start >= 0 del symstack[start:] del statestack[start:] if state is None: value = p.func(targ) else: value = p.func(state, targ) symstack.append(value) current_state = self.lr_table.lr_goto[statestack[-1]][pname] statestack.append(current_state) return current_state ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/parsergenerator.py0000644000175000017500000005364700000000000020403 0ustar00alexgaynoralexgaynorimport errno import hashlib import json import os import sys import tempfile import warnings from appdirs import AppDirs from rply.errors import ParserGeneratorError, ParserGeneratorWarning from rply.grammar import Grammar from rply.parser import LRParser from rply.utils import Counter, IdentityDict, iteritems, itervalues LARGE_VALUE = sys.maxsize class ParserGenerator(object): """ A ParserGenerator represents a set of production rules, that define a sequence of terminals and non-terminals to be replaced with a non-terminal, which can be turned into a parser. :param tokens: A list of token (non-terminal) names. :param precedence: A list of tuples defining the order of operation for avoiding ambiguity, consisting of a string defining associativity (left, right or nonassoc) and a list of token names with the same associativity and level of precedence. :param cache_id: A string specifying an ID for caching. """ VERSION = 1 def __init__(self, tokens, precedence=[], cache_id=None): self.tokens = tokens self.productions = [] self.precedence = precedence self.cache_id = cache_id self.error_handler = None def production(self, rule, precedence=None): """ A decorator that defines one or many production rules and registers the decorated function to be called with the terminals and non-terminals matched by those rules. A `rule` should consist of a name defining the non-terminal returned by the decorated function and one or more sequences of pipe-separated non-terminals and terminals that are supposed to be replaced:: replacing_non_terminal : TERMINAL1 non_term1 | TERMINAL2 non_term2 The name of the non-terminal replacing the sequence is on the left, separated from the sequence by a colon. The whitespace around the colon is required. Knowing this we can define productions:: pg = ParserGenerator(['NUMBER', 'ADD']) @pg.production('number : NUMBER') def expr_number(p): return BoxInt(int(p[0].getstr())) @pg.production('expr : number ADD number') def expr_add(p): return BoxInt(p[0].getint() + p[2].getint()) If a state was passed to the parser, the decorated function is additionally called with that state as first argument. """ parts = rule.split() production_name = parts[0] if parts[1] != ":": raise ParserGeneratorError("Expecting :") body = " ".join(parts[2:]) prods = body.split("|") def inner(func): for production in prods: syms = production.split() self.productions.append((production_name, syms, func, precedence)) return func return inner def error(self, func): """ Sets the error handler that is called with the state (if passed to the parser) and the token the parser errored on. Currently error handlers must raise an exception. If an error handler is not defined, a :exc:`rply.ParsingError` will be raised. """ self.error_handler = func return func def compute_grammar_hash(self, g): hasher = hashlib.sha1() hasher.update(g.start.encode()) hasher.update(json.dumps(sorted(g.terminals)).encode()) for term, (assoc, level) in sorted(iteritems(g.precedence)): hasher.update(term.encode()) hasher.update(assoc.encode()) hasher.update(bytes(level)) for p in g.productions: hasher.update(p.name.encode()) hasher.update(json.dumps(p.prec).encode()) hasher.update(json.dumps(p.prod).encode()) return hasher.hexdigest() def serialize_table(self, table): return { "lr_action": table.lr_action, "lr_goto": table.lr_goto, "sr_conflicts": table.sr_conflicts, "rr_conflicts": table.rr_conflicts, "default_reductions": table.default_reductions, "start": table.grammar.start, "terminals": sorted(table.grammar.terminals), "precedence": table.grammar.precedence, "productions": [ (p.name, p.prod, p.prec) for p in table.grammar.productions ], } def data_is_valid(self, g, data): if g.start != data["start"]: return False if sorted(g.terminals) != data["terminals"]: return False if sorted(g.precedence) != sorted(data["precedence"]): return False for key, (assoc, level) in iteritems(g.precedence): if data["precedence"][key] != [assoc, level]: return False if len(g.productions) != len(data["productions"]): return False for p, (name, prod, (assoc, level)) in zip(g.productions, data["productions"]): if p.name != name: return False if p.prod != prod: return False if p.prec != (assoc, level): return False return True def build(self): g = Grammar(self.tokens) for level, (assoc, terms) in enumerate(self.precedence, 1): for term in terms: g.set_precedence(term, assoc, level) for prod_name, syms, func, precedence in self.productions: g.add_production(prod_name, syms, func, precedence) g.set_start() for unused_term in g.unused_terminals(): warnings.warn( "Token %r is unused" % unused_term, ParserGeneratorWarning, stacklevel=2 ) for unused_prod in g.unused_productions(): warnings.warn( "Production %r is not reachable" % unused_prod, ParserGeneratorWarning, stacklevel=2 ) g.build_lritems() g.compute_first() g.compute_follow() table = None if self.cache_id is not None: cache_dir = AppDirs("rply").user_cache_dir cache_file = os.path.join( cache_dir, "%s-%s-%s.json" % ( self.cache_id, self.VERSION, self.compute_grammar_hash(g) ) ) if os.path.exists(cache_file): with open(cache_file) as f: data = json.load(f) if self.data_is_valid(g, data): table = LRTable.from_cache(g, data) if table is None: table = LRTable.from_grammar(g) if self.cache_id is not None: self._write_cache(cache_dir, cache_file, table) if table.sr_conflicts: warnings.warn( "%d shift/reduce conflict%s" % ( len(table.sr_conflicts), "s" if len(table.sr_conflicts) > 1 else "" ), ParserGeneratorWarning, stacklevel=2, ) if table.rr_conflicts: warnings.warn( "%d reduce/reduce conflict%s" % ( len(table.rr_conflicts), "s" if len(table.rr_conflicts) > 1 else "" ), ParserGeneratorWarning, stacklevel=2, ) return LRParser(table, self.error_handler) def _write_cache(self, cache_dir, cache_file, table): if not os.path.exists(cache_dir): try: os.makedirs(cache_dir, mode=0o0700) except OSError as e: if e.errno == errno.EROFS: return raise with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False, mode="w") as f: json.dump(self.serialize_table(table), f) os.rename(f.name, cache_file) def digraph(X, R, FP): N = dict.fromkeys(X, 0) stack = [] F = {} for x in X: if N[x] == 0: traverse(x, N, stack, F, X, R, FP) return F def traverse(x, N, stack, F, X, R, FP): stack.append(x) d = len(stack) N[x] = d F[x] = FP(x) rel = R(x) for y in rel: if N[y] == 0: traverse(y, N, stack, F, X, R, FP) N[x] = min(N[x], N[y]) for a in F.get(y, []): if a not in F[x]: F[x].append(a) if N[x] == d: N[stack[-1]] = LARGE_VALUE F[stack[-1]] = F[x] element = stack.pop() while element != x: N[stack[-1]] = LARGE_VALUE F[stack[-1]] = F[x] element = stack.pop() class LRTable(object): def __init__(self, grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts): self.grammar = grammar self.lr_action = lr_action self.lr_goto = lr_goto self.default_reductions = default_reductions self.sr_conflicts = sr_conflicts self.rr_conflicts = rr_conflicts @classmethod def from_cache(cls, grammar, data): lr_action = [ dict([(str(k), v) for k, v in iteritems(action)]) for action in data["lr_action"] ] lr_goto = [ dict([(str(k), v) for k, v in iteritems(goto)]) for goto in data["lr_goto"] ] return LRTable( grammar, lr_action, lr_goto, data["default_reductions"], data["sr_conflicts"], data["rr_conflicts"] ) @classmethod def from_grammar(cls, grammar): cidhash = IdentityDict() goto_cache = {} add_count = Counter() C = cls.lr0_items(grammar, add_count, cidhash, goto_cache) cls.add_lalr_lookaheads(grammar, C, add_count, cidhash, goto_cache) lr_action = [None] * len(C) lr_goto = [None] * len(C) sr_conflicts = [] rr_conflicts = [] for st, I in enumerate(C): st_action = {} st_actionp = {} st_goto = {} for p in I: if p.getlength() == p.lr_index + 1: if p.name == "S'": # Start symbol. Accept! st_action["$end"] = 0 st_actionp["$end"] = p else: laheads = p.lookaheads[st] for a in laheads: if a in st_action: r = st_action[a] if r > 0: sprec, slevel = grammar.productions[st_actionp[a].number].prec rprec, rlevel = grammar.precedence.get(a, ("right", 0)) if (slevel < rlevel) or (slevel == rlevel and rprec == "left"): st_action[a] = -p.number st_actionp[a] = p if not slevel and not rlevel: sr_conflicts.append((st, repr(a), "reduce")) grammar.productions[p.number].reduced += 1 elif not (slevel == rlevel and rprec == "nonassoc"): if not rlevel: sr_conflicts.append((st, repr(a), "shift")) elif r < 0: oldp = grammar.productions[-r] pp = grammar.productions[p.number] if oldp.number > pp.number: st_action[a] = -p.number st_actionp[a] = p chosenp, rejectp = pp, oldp grammar.productions[p.number].reduced += 1 grammar.productions[oldp.number].reduced -= 1 else: chosenp, rejectp = oldp, pp rr_conflicts.append((st, repr(chosenp), repr(rejectp))) else: raise ParserGeneratorError("Unknown conflict in state %d" % st) else: st_action[a] = -p.number st_actionp[a] = p grammar.productions[p.number].reduced += 1 else: i = p.lr_index a = p.prod[i + 1] if a in grammar.terminals: g = cls.lr0_goto(I, a, add_count, goto_cache) j = cidhash.get(g, -1) if j >= 0: if a in st_action: r = st_action[a] if r > 0: if r != j: raise ParserGeneratorError("Shift/shift conflict in state %d" % st) elif r < 0: rprec, rlevel = grammar.productions[st_actionp[a].number].prec sprec, slevel = grammar.precedence.get(a, ("right", 0)) if (slevel > rlevel) or (slevel == rlevel and rprec == "right"): grammar.productions[st_actionp[a].number].reduced -= 1 st_action[a] = j st_actionp[a] = p if not rlevel: sr_conflicts.append((st, repr(a), "shift")) elif not (slevel == rlevel and rprec == "nonassoc"): if not slevel and not rlevel: sr_conflicts.append((st, repr(a), "reduce")) else: raise ParserGeneratorError("Unknown conflict in state %d" % st) else: st_action[a] = j st_actionp[a] = p nkeys = set() for ii in I: for s in ii.unique_syms: if s in grammar.nonterminals: nkeys.add(s) for n in nkeys: g = cls.lr0_goto(I, n, add_count, goto_cache) j = cidhash.get(g, -1) if j >= 0: st_goto[n] = j lr_action[st] = st_action lr_goto[st] = st_goto default_reductions = [0] * len(lr_action) for state, actions in enumerate(lr_action): actions = set(itervalues(actions)) if len(actions) == 1 and next(iter(actions)) < 0: default_reductions[state] = next(iter(actions)) return LRTable(grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts) @classmethod def lr0_items(cls, grammar, add_count, cidhash, goto_cache): C = [cls.lr0_closure([grammar.productions[0].lr_next], add_count)] for i, I in enumerate(C): cidhash[I] = i i = 0 while i < len(C): I = C[i] i += 1 asyms = set() for ii in I: asyms.update(ii.unique_syms) for x in asyms: g = cls.lr0_goto(I, x, add_count, goto_cache) if not g: continue if g in cidhash: continue cidhash[g] = len(C) C.append(g) return C @classmethod def lr0_closure(cls, I, add_count): add_count.incr() J = I[:] added = True while added: added = False for j in J: for x in j.lr_after: if x.lr0_added == add_count.value: continue J.append(x.lr_next) x.lr0_added = add_count.value added = True return J @classmethod def lr0_goto(cls, I, x, add_count, goto_cache): s = goto_cache.setdefault(x, IdentityDict()) gs = [] for p in I: n = p.lr_next if n and n.lr_before == x: s1 = s.get(n) if not s1: s1 = {} s[n] = s1 gs.append(n) s = s1 g = s.get("$end") if not g: if gs: g = cls.lr0_closure(gs, add_count) s["$end"] = g else: s["$end"] = gs return g @classmethod def add_lalr_lookaheads(cls, grammar, C, add_count, cidhash, goto_cache): nullable = cls.compute_nullable_nonterminals(grammar) trans = cls.find_nonterminal_transitions(grammar, C) readsets = cls.compute_read_sets(grammar, C, trans, nullable, add_count, cidhash, goto_cache) lookd, included = cls.compute_lookback_includes(grammar, C, trans, nullable, add_count, cidhash, goto_cache) followsets = cls.compute_follow_sets(trans, readsets, included) cls.add_lookaheads(lookd, followsets) @classmethod def compute_nullable_nonterminals(cls, grammar): nullable = set() num_nullable = 0 while True: for p in grammar.productions[1:]: if p.getlength() == 0: nullable.add(p.name) continue for t in p.prod: if t not in nullable: break else: nullable.add(p.name) if len(nullable) == num_nullable: break num_nullable = len(nullable) return nullable @classmethod def find_nonterminal_transitions(cls, grammar, C): trans = [] for idx, state in enumerate(C): for p in state: if p.lr_index < p.getlength() - 1: t = (idx, p.prod[p.lr_index + 1]) if t[1] in grammar.nonterminals and t not in trans: trans.append(t) return trans @classmethod def compute_read_sets(cls, grammar, C, ntrans, nullable, add_count, cidhash, goto_cache): return digraph( ntrans, R=lambda x: cls.reads_relation(C, x, nullable, add_count, cidhash, goto_cache), FP=lambda x: cls.dr_relation(grammar, C, x, nullable, add_count, goto_cache) ) @classmethod def compute_follow_sets(cls, ntrans, readsets, includesets): return digraph( ntrans, R=lambda x: includesets.get(x, []), FP=lambda x: readsets[x], ) @classmethod def dr_relation(cls, grammar, C, trans, nullable, add_count, goto_cache): state, N = trans terms = [] g = cls.lr0_goto(C[state], N, add_count, goto_cache) for p in g: if p.lr_index < p.getlength() - 1: a = p.prod[p.lr_index + 1] if a in grammar.terminals and a not in terms: terms.append(a) if state == 0 and N == grammar.productions[0].prod[0]: terms.append("$end") return terms @classmethod def reads_relation(cls, C, trans, empty, add_count, cidhash, goto_cache): rel = [] state, N = trans g = cls.lr0_goto(C[state], N, add_count, goto_cache) j = cidhash.get(g, -1) for p in g: if p.lr_index < p.getlength() - 1: a = p.prod[p.lr_index + 1] if a in empty: rel.append((j, a)) return rel @classmethod def compute_lookback_includes(cls, grammar, C, trans, nullable, add_count, cidhash, goto_cache): lookdict = {} includedict = {} dtrans = dict.fromkeys(trans, 1) for state, N in trans: lookb = [] includes = [] for p in C[state]: if p.name != N: continue lr_index = p.lr_index j = state while lr_index < p.getlength() - 1: lr_index += 1 t = p.prod[lr_index] if (j, t) in dtrans: li = lr_index + 1 while li < p.getlength(): if p.prod[li] in grammar.terminals: break if p.prod[li] not in nullable: break li += 1 else: includes.append((j, t)) g = cls.lr0_goto(C[j], t, add_count, goto_cache) j = cidhash.get(g, -1) for r in C[j]: if r.name != p.name: continue if r.getlength() != p.getlength(): continue i = 0 while i < r.lr_index: if r.prod[i] != p.prod[i + 1]: break i += 1 else: lookb.append((j, r)) for i in includes: includedict.setdefault(i, []).append((state, N)) lookdict[state, N] = lookb return lookdict, includedict @classmethod def add_lookaheads(cls, lookbacks, followset): for trans, lb in iteritems(lookbacks): for state, p in lb: f = followset.get(trans, []) laheads = p.lookaheads.setdefault(state, []) for a in f: if a not in laheads: laheads.append(a) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/token.py0000644000175000017500000000440500000000000016304 0ustar00alexgaynoralexgaynorclass BaseBox(object): """ A base class for polymorphic boxes that wrap parser results. Simply use this as a base class for anything you return in a production function of a parser. This is necessary because RPython unlike Python expects functions to always return objects of the same type. """ _attrs_ = [] class Token(BaseBox): """ Represents a syntactically relevant piece of text. :param name: A string describing the kind of text represented. :param value: The actual text represented. :param source_pos: A :class:`SourcePosition` object representing the position of the first character in the source from which this token was generated. """ def __init__(self, name, value, source_pos=None): self.name = name self.value = value self.source_pos = source_pos def __repr__(self): return "Token(%r, %r)" % (self.name, self.value) def __eq__(self, other): if not isinstance(other, Token): return NotImplemented return self.name == other.name and self.value == other.value def gettokentype(self): """ Returns the type or name of the token. """ return self.name def getsourcepos(self): """ Returns a :class:`SourcePosition` instance, describing the position of this token's first character in the source. """ return self.source_pos def getstr(self): """ Returns the string represented by this token. """ return self.value class SourcePosition(object): """ Represents the position of a character in some source string. :param idx: The index of the character in the source. :param lineno: The number of the line in which the character occurs. :param colno: The number of the column in which the character occurs. The values passed to this object can be retrieved using the identically named attributes. """ def __init__(self, idx, lineno, colno): self.idx = idx self.lineno = lineno self.colno = colno def __repr__(self): return "SourcePosition(idx={0}, lineno={1}, colno={2})".format( self.idx, self.lineno, self.colno ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781877.0 rply-0.7.8/rply/utils.py0000644000175000017500000000230400000000000016320 0ustar00alexgaynoralexgaynorimport sys if sys.version_info >= (3, 3): from collections.abc import MutableMapping else: from collections import MutableMapping class IdentityDict(MutableMapping): def __init__(self): self._contents = {} self._keepalive = [] def __getitem__(self, key): return self._contents[id(key)][1] def __setitem__(self, key, value): idx = len(self._keepalive) self._keepalive.append(key) self._contents[id(key)] = key, value, idx def __delitem__(self, key): del self._contents[id(key)] for idx, obj in enumerate(self._keepalive): if obj is key: del self._keepalive[idx] break def __len__(self): return len(self._contents) def __iter__(self): for key, _, _ in itervalues(self._contents): yield key class Counter(object): def __init__(self): self.value = 0 def incr(self): self.value += 1 if sys.version_info >= (3,): def itervalues(d): return d.values() def iteritems(d): return d.items() else: def itervalues(d): return d.itervalues() def iteritems(d): return d.iteritems() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1611782043.1266093 rply-0.7.8/rply.egg-info/0000755000175000017500000000000000000000000016301 5ustar00alexgaynoralexgaynor././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611782042.0 rply-0.7.8/rply.egg-info/PKG-INFO0000644000175000017500000001203600000000000017400 0ustar00alexgaynoralexgaynorMetadata-Version: 1.0 Name: rply Version: 0.7.8 Summary: A pure Python Lex/Yacc that works with RPython Home-page: UNKNOWN Author: Alex Gaynor Author-email: alex.gaynor@gmail.com License: BSD 3-Clause License Description: RPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? ------------------------- In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project. If you don't intend to use your parser from RPython, and just want a cool pure Python parser you can ignore all the box stuff and just return whatever you like from each production method. Error handling -------------- By default, when a parsing error is encountered, an ``rply.ParsingError`` is raised, it has a method ``getsourcepos()``, which returns an ``rply.token.SourcePosition`` object. You may also provide an error handler, which, at the moment, must raise an exception. It receives the ``Token`` object that the parser errored on. .. code:: python pg = ParserGenerator(...) @pg.error def error_handler(token): raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype()) Python compatibility -------------------- RPly is tested and known to work under Python 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards. Links ----- * `Source code and issue tracker `_ * `PyPI releases `_ * `Talk at PyCon US 2013: So you want to write an interpreter? `_ .. _`online`: https://rply.readthedocs.io/ Platform: UNKNOWN ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611782042.0 rply-0.7.8/rply.egg-info/SOURCES.txt0000644000175000017500000000052400000000000020166 0ustar00alexgaynoralexgaynorLICENSE MANIFEST.in README.rst setup.cfg setup.py rply/__init__.py rply/errors.py rply/grammar.py rply/lexer.py rply/lexergenerator.py rply/parser.py rply/parsergenerator.py rply/token.py rply/utils.py rply.egg-info/PKG-INFO rply.egg-info/SOURCES.txt rply.egg-info/dependency_links.txt rply.egg-info/requires.txt rply.egg-info/top_level.txt././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611782042.0 rply-0.7.8/rply.egg-info/dependency_links.txt0000644000175000017500000000000100000000000022347 0ustar00alexgaynoralexgaynor ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611782042.0 rply-0.7.8/rply.egg-info/requires.txt0000644000175000017500000000001000000000000020670 0ustar00alexgaynoralexgaynorappdirs ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611782042.0 rply-0.7.8/rply.egg-info/top_level.txt0000644000175000017500000000000500000000000021026 0ustar00alexgaynoralexgaynorrply ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1611782043.1276095 rply-0.7.8/setup.cfg0000644000175000017500000000015000000000000015436 0ustar00alexgaynoralexgaynor[metadata] license = BSD 3-Clause License [wheel] universal = 1 [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1611781925.0 rply-0.7.8/setup.py0000644000175000017500000000062500000000000015336 0ustar00alexgaynoralexgaynorfrom setuptools import setup with open("README.rst") as f: readme = f.read() setup( name="rply", description="A pure Python Lex/Yacc that works with RPython", long_description=readme, # duplicated in docs/conf.py and rply/__init__.py version="0.7.8", author="Alex Gaynor", author_email="alex.gaynor@gmail.com", packages=["rply"], install_requires=["appdirs"], )