rply-0.7.7/0000755000076500000240000000000013421457110014144 5ustar alex_gaynorstaff00000000000000rply-0.7.7/PKG-INFO0000644000076500000240000001204313421457110015241 0ustar alex_gaynorstaff00000000000000Metadata-Version: 1.0 Name: rply Version: 0.7.7 Summary: A pure Python Lex/Yacc that works with RPython Home-page: UNKNOWN Author: Alex Gaynor Author-email: alex.gaynor@gmail.com License: BSD 3-Clause License Description: RPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? ------------------------- In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project. If you don't intend to use your parser from RPython, and just want a cool pure Python parser you can ignore all the box stuff and just return whatever you like from each production method. Error handling -------------- By default, when a parsing error is encountered, an ``rply.ParsingError`` is raised, it has a method ``getsourcepos()``, which returns an ``rply.token.SourcePosition`` object. You may also provide an error handler, which, at the moment, must raise an exception. It receives the ``Token`` object that the parser errored on. .. code:: python pg = ParserGenerator(...) 
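# The @pg.error decorator below registers the error handler; it receives
# the offending Token and, for now, must itself raise an exception.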
@pg.error def error_handler(token): raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype()) Python compatibility -------------------- RPly is tested and known to work under Python 2.6, 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards. Links ----- * `Source code and issue tracker `_ * `PyPI releases `_ * `Talk at PyCon US 2013: So you want to write an interpreter? `_ .. _`online`: https://rply.readthedocs.io/ Platform: UNKNOWN rply-0.7.7/LICENSE0000644000076500000240000000277712571304150015166 0ustar alex_gaynorstaff00000000000000Copyright (c) Alex Gaynor and individual contributors. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of rply nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. rply-0.7.7/rply/0000755000076500000240000000000013421457110015132 5ustar alex_gaynorstaff00000000000000rply-0.7.7/rply/token.py0000644000076500000240000000440512571304150016627 0ustar alex_gaynorstaff00000000000000class BaseBox(object): """ A base class for polymorphic boxes that wrap parser results. Simply use this as a base class for anything you return in a production function of a parser. This is necessary because RPython unlike Python expects functions to always return objects of the same type. """ _attrs_ = [] class Token(BaseBox): """ Represents a syntactically relevant piece of text. :param name: A string describing the kind of text represented. :param value: The actual text represented. :param source_pos: A :class:`SourcePosition` object representing the position of the first character in the source from which this token was generated. """ def __init__(self, name, value, source_pos=None): self.name = name self.value = value self.source_pos = source_pos def __repr__(self): return "Token(%r, %r)" % (self.name, self.value) def __eq__(self, other): if not isinstance(other, Token): return NotImplemented return self.name == other.name and self.value == other.value def gettokentype(self): """ Returns the type or name of the token. """ return self.name def getsourcepos(self): """ Returns a :class:`SourcePosition` instance, describing the position of this token's first character in the source. 
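Returns ``None`` when the token was constructed without position
information (``source_pos`` defaults to ``None``).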
""" return self.source_pos def getstr(self): """ Returns the string represented by this token. """ return self.value class SourcePosition(object): """ Represents the position of a character in some source string. :param idx: The index of the character in the source. :param lineno: The number of the line in which the character occurs. :param colno: The number of the column in which the character occurs. The values passed to this object can be retrieved using the identically named attributes. """ def __init__(self, idx, lineno, colno): self.idx = idx self.lineno = lineno self.colno = colno def __repr__(self): return "SourcePosition(idx={0}, lineno={1}, colno={2})".format( self.idx, self.lineno, self.colno ) rply-0.7.7/rply/__init__.py0000644000076500000240000000045213421456650017254 0ustar alex_gaynorstaff00000000000000from rply.errors import LexingError, ParsingError from rply.lexergenerator import LexerGenerator from rply.parsergenerator import ParserGenerator from rply.token import Token __version__ = '0.7.7' __all__ = [ "LexerGenerator", "LexingError", "ParserGenerator", "ParsingError", "Token", ] rply-0.7.7/rply/grammar.py0000644000076500000240000001574012571304150017141 0ustar alex_gaynorstaff00000000000000from rply.errors import ParserGeneratorError from rply.utils import iteritems def rightmost_terminal(symbols, terminals): for sym in reversed(symbols): if sym in terminals: return sym return None class Grammar(object): def __init__(self, terminals): # A list of all the productions self.productions = [None] # A dictionary mapping the names of non-terminals to a list of all # productions of that nonterminal self.prod_names = {} # A dictionary mapping the names of terminals to a list of the rules # where they are used self.terminals = dict((t, []) for t in terminals) self.terminals["error"] = [] # A dictionary mapping names of nonterminals to a list of rule numbers # where they are used self.nonterminals = {} self.first = {} self.follow = {} self.precedence = {} self.start = None def add_production(self, prod_name, syms, func, precedence): if prod_name in self.terminals: raise ParserGeneratorError("Illegal rule name %r" % prod_name) if precedence is None: precname = rightmost_terminal(syms, self.terminals) prod_prec = self.precedence.get(precname, ("right", 0)) else: try: prod_prec = self.precedence[precedence] except KeyError: raise ParserGeneratorError( "Precedence %r doesn't exist" % precedence ) pnumber = len(self.productions) self.nonterminals.setdefault(prod_name, []) for t in syms: if t in self.terminals: self.terminals[t].append(pnumber) else: self.nonterminals.setdefault(t, []).append(pnumber) p = Production(pnumber, prod_name, syms, prod_prec, func) self.productions.append(p) self.prod_names.setdefault(prod_name, []).append(p) def set_precedence(self, term, assoc, level): if term in self.precedence: raise ParserGeneratorError( "Precedence already specified for %s" % term ) if assoc not in ["left", "right", "nonassoc"]: raise ParserGeneratorError( "Precedence must be one of left, right, nonassoc; not %s" % ( assoc ) ) self.precedence[term] = (assoc, level) def set_start(self): start = self.productions[1].name self.productions[0] = Production(0, "S'", [start], ("right", 0), None) self.nonterminals[start].append(0) self.start = start def unused_terminals(self): return [ t for t, prods in iteritems(self.terminals) if not prods and t != "error" ] def unused_productions(self): return [p for p, prods in iteritems(self.nonterminals) if not prods] def build_lritems(self): """ Walks the 
list of productions and builds a complete set of the LR items. """ for p in self.productions: lastlri = p i = 0 lr_items = [] while True: if i > p.getlength(): lri = None else: try: before = p.prod[i - 1] except IndexError: before = None try: after = self.prod_names[p.prod[i]] except (IndexError, KeyError): after = [] lri = LRItem(p, i, before, after) lastlri.lr_next = lri if lri is None: break lr_items.append(lri) lastlri = lri i += 1 p.lr_items = lr_items def _first(self, beta): result = [] for x in beta: x_produces_empty = False for f in self.first[x]: if f == "": x_produces_empty = True else: if f not in result: result.append(f) if not x_produces_empty: break else: result.append("") return result def compute_first(self): for t in self.terminals: self.first[t] = [t] self.first["$end"] = ["$end"] for n in self.nonterminals: self.first[n] = [] changed = True while changed: changed = False for n in self.nonterminals: for p in self.prod_names[n]: for f in self._first(p.prod): if f not in self.first[n]: self.first[n].append(f) changed = True def compute_follow(self): for k in self.nonterminals: self.follow[k] = [] start = self.start self.follow[start] = ["$end"] added = True while added: added = False for p in self.productions[1:]: for i, B in enumerate(p.prod): if B in self.nonterminals: fst = self._first(p.prod[i + 1:]) has_empty = False for f in fst: if f != "" and f not in self.follow[B]: self.follow[B].append(f) added = True if f == "": has_empty = True if has_empty or i == (len(p.prod) - 1): for f in self.follow[p.name]: if f not in self.follow[B]: self.follow[B].append(f) added = True class Production(object): def __init__(self, num, name, prod, precedence, func): self.name = name self.prod = prod self.number = num self.func = func self.prec = precedence self.unique_syms = [] for s in self.prod: if s not in self.unique_syms: self.unique_syms.append(s) self.lr_items = [] self.lr_next = None self.lr0_added = 0 self.reduced = 0 def __repr__(self): return "Production(%s -> %s)" % (self.name, " ".join(self.prod)) def getlength(self): return len(self.prod) class LRItem(object): def __init__(self, p, n, before, after): self.name = p.name self.prod = p.prod[:] self.prod.insert(n, ".") self.number = p.number self.lr_index = n self.lookaheads = {} self.unique_syms = p.unique_syms self.lr_before = before self.lr_after = after def __repr__(self): return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod)) def getlength(self): return len(self.prod) rply-0.7.7/rply/parser.py0000644000076500000240000000570612571304150017010 0ustar alex_gaynorstaff00000000000000from rply.errors import ParsingError class LRParser(object): def __init__(self, lr_table, error_handler): self.lr_table = lr_table self.error_handler = error_handler def parse(self, tokenizer, state=None): from rply.token import Token lookahead = None lookaheadstack = [] statestack = [0] symstack = [Token("$end", "$end")] current_state = 0 while True: if self.lr_table.default_reductions[current_state]: t = self.lr_table.default_reductions[current_state] current_state = self._reduce_production( t, symstack, statestack, state ) continue if lookahead is None: if lookaheadstack: lookahead = lookaheadstack.pop() else: try: lookahead = next(tokenizer) except StopIteration: lookahead = None if lookahead is None: lookahead = Token("$end", "$end") ltype = lookahead.gettokentype() if ltype in self.lr_table.lr_action[current_state]: t = self.lr_table.lr_action[current_state][ltype] if t > 0: statestack.append(t) current_state = t 
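# t > 0 means "shift": the new state was pushed above; now the lookahead
# token itself is consumed onto the symbol stack.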
symstack.append(lookahead) lookahead = None continue elif t < 0: current_state = self._reduce_production( t, symstack, statestack, state ) continue else: n = symstack[-1] return n else: # TODO: actual error handling here if self.error_handler is not None: if state is None: self.error_handler(lookahead) else: self.error_handler(state, lookahead) raise AssertionError("For now, error_handler must raise.") else: raise ParsingError(None, lookahead.getsourcepos()) def _reduce_production(self, t, symstack, statestack, state): # reduce a symbol on the stack and emit a production p = self.lr_table.grammar.productions[-t] pname = p.name plen = p.getlength() start = len(symstack) + (-plen - 1) assert start >= 0 targ = symstack[start + 1:] start = len(symstack) + (-plen) assert start >= 0 del symstack[start:] del statestack[start:] if state is None: value = p.func(targ) else: value = p.func(state, targ) symstack.append(value) current_state = self.lr_table.lr_goto[statestack[-1]][pname] statestack.append(current_state) return current_state rply-0.7.7/rply/utils.py0000644000076500000240000000230413421456515016653 0ustar alex_gaynorstaff00000000000000import sys if sys.version_info >= (3, 3): from collections.abc import MutableMapping else: from collections import MutableMapping class IdentityDict(MutableMapping): def __init__(self): self._contents = {} self._keepalive = [] def __getitem__(self, key): return self._contents[id(key)][1] def __setitem__(self, key, value): idx = len(self._keepalive) self._keepalive.append(key) self._contents[id(key)] = key, value, idx def __delitem__(self, key): del self._contents[id(key)] for idx, obj in enumerate(self._keepalive): if obj is key: del self._keepalive[idx] break def __len__(self): return len(self._contents) def __iter__(self): for key, _, _ in itervalues(self._contents): yield key class Counter(object): def __init__(self): self.value = 0 def incr(self): self.value += 1 if sys.version_info >= (3,): def itervalues(d): return d.values() def iteritems(d): return d.items() else: def itervalues(d): return d.itervalues() def iteritems(d): return d.iteritems() rply-0.7.7/rply/lexer.py0000644000076500000240000000323112663451224016631 0ustar alex_gaynorstaff00000000000000from rply.errors import LexingError from rply.token import SourcePosition, Token class Lexer(object): def __init__(self, rules, ignore_rules): self.rules = rules self.ignore_rules = ignore_rules def lex(self, s): return LexerStream(self, s) class LexerStream(object): def __init__(self, lexer, s): self.lexer = lexer self.s = s self.idx = 0 self._lineno = 1 def __iter__(self): return self def _update_pos(self, match): self.idx = match.end self._lineno += self.s.count("\n", match.start, match.end) last_nl = self.s.rfind("\n", 0, match.start) if last_nl < 0: return match.start + 1 else: return match.start - last_nl def next(self): while True: if self.idx >= len(self.s): raise StopIteration for rule in self.lexer.ignore_rules: match = rule.matches(self.s, self.idx) if match: self._update_pos(match) break else: break for rule in self.lexer.rules: match = rule.matches(self.s, self.idx) if match: lineno = self._lineno colno = self._update_pos(match) source_pos = SourcePosition(match.start, lineno, colno) token = Token( rule.name, self.s[match.start:match.end], source_pos ) return token else: raise LexingError(None, SourcePosition(self.idx, -1, -1)) def __next__(self): return self.next() rply-0.7.7/rply/errors.py0000644000076500000240000000201313421456515017024 0ustar alex_gaynorstaff00000000000000class 
ParserGeneratorError(Exception): pass class LexingError(Exception): """ Raised by a Lexer, if no rule matches. """ def __init__(self, message, source_pos): self.message = message self.source_pos = source_pos def getsourcepos(self): """ Returns the position in the source, at which this error occurred. """ return self.source_pos def __repr__(self): return 'LexingError(%r, %r)' % (self.message, self.source_pos) class ParsingError(Exception): """ Raised by a Parser, if no production rule can be applied. """ def __init__(self, message, source_pos): self.message = message self.source_pos = source_pos def getsourcepos(self): """ Returns the position in the source, at which this error occurred. """ return self.source_pos def __repr__(self): return 'ParsingError(%r, %r)' % (self.message, self.source_pos) class ParserGeneratorWarning(Warning): pass rply-0.7.7/rply/parsergenerator.py0000644000076500000240000005335713302276472020734 0ustar alex_gaynorstaff00000000000000import errno import hashlib import json import os import sys import tempfile import warnings from appdirs import AppDirs from rply.errors import ParserGeneratorError, ParserGeneratorWarning from rply.grammar import Grammar from rply.parser import LRParser from rply.utils import Counter, IdentityDict, iteritems, itervalues LARGE_VALUE = sys.maxsize class ParserGenerator(object): """ A ParserGenerator represents a set of production rules, that define a sequence of terminals and non-terminals to be replaced with a non-terminal, which can be turned into a parser. :param tokens: A list of token (non-terminal) names. :param precedence: A list of tuples defining the order of operation for avoiding ambiguity, consisting of a string defining associativity (left, right or nonassoc) and a list of token names with the same associativity and level of precedence. :param cache_id: A string specifying an ID for caching. """ VERSION = 1 def __init__(self, tokens, precedence=[], cache_id=None): self.tokens = tokens self.productions = [] self.precedence = precedence self.cache_id = cache_id self.error_handler = None def production(self, rule, precedence=None): """ A decorator that defines a production rule and registers the decorated function to be called with the terminals and non-terminals matched by that rule. A `rule` should consist of a name defining the non-terminal returned by the decorated function and a sequence of non-terminals and terminals that are supposed to be replaced:: replacing_non_terminal : ATERMINAL non_terminal The name of the non-terminal replacing the sequence is on the left, separated from the sequence by a colon. The whitespace around the colon is required. Knowing this we can define productions:: pg = ParserGenerator(['NUMBER', 'ADD']) @pg.production('number : NUMBER') def expr_number(p): return BoxInt(int(p[0].getstr())) @pg.production('expr : number ADD number') def expr_add(p): return BoxInt(p[0].getint() + p[2].getint()) If a state was passed to the parser, the decorated function is additionally called with that state as first argument. """ parts = rule.split() production_name = parts[0] if parts[1] != ":": raise ParserGeneratorError("Expecting :") syms = parts[2:] def inner(func): self.productions.append((production_name, syms, func, precedence)) return func return inner def error(self, func): """ Sets the error handler that is called with the state (if passed to the parser) and the token the parser errored on. Currently error handlers must raise an exception. 
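For example::

    @pg.error
    def error_handler(token):
        raise ValueError(
            "Ran into a %s where it wasn't expected" % token.gettokentype()
        )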
If an error handler is not defined, a :exc:`rply.ParsingError` will be raised. """ self.error_handler = func return func def compute_grammar_hash(self, g): hasher = hashlib.sha1() hasher.update(g.start.encode()) hasher.update(json.dumps(sorted(g.terminals)).encode()) for term, (assoc, level) in sorted(iteritems(g.precedence)): hasher.update(term.encode()) hasher.update(assoc.encode()) hasher.update(bytes(level)) for p in g.productions: hasher.update(p.name.encode()) hasher.update(json.dumps(p.prec).encode()) hasher.update(json.dumps(p.prod).encode()) return hasher.hexdigest() def serialize_table(self, table): return { "lr_action": table.lr_action, "lr_goto": table.lr_goto, "sr_conflicts": table.sr_conflicts, "rr_conflicts": table.rr_conflicts, "default_reductions": table.default_reductions, "start": table.grammar.start, "terminals": sorted(table.grammar.terminals), "precedence": table.grammar.precedence, "productions": [ (p.name, p.prod, p.prec) for p in table.grammar.productions ], } def data_is_valid(self, g, data): if g.start != data["start"]: return False if sorted(g.terminals) != data["terminals"]: return False if sorted(g.precedence) != sorted(data["precedence"]): return False for key, (assoc, level) in iteritems(g.precedence): if data["precedence"][key] != [assoc, level]: return False if len(g.productions) != len(data["productions"]): return False for p, (name, prod, (assoc, level)) in zip(g.productions, data["productions"]): if p.name != name: return False if p.prod != prod: return False if p.prec != (assoc, level): return False return True def build(self): g = Grammar(self.tokens) for level, (assoc, terms) in enumerate(self.precedence, 1): for term in terms: g.set_precedence(term, assoc, level) for prod_name, syms, func, precedence in self.productions: g.add_production(prod_name, syms, func, precedence) g.set_start() for unused_term in g.unused_terminals(): warnings.warn( "Token %r is unused" % unused_term, ParserGeneratorWarning, stacklevel=2 ) for unused_prod in g.unused_productions(): warnings.warn( "Production %r is not reachable" % unused_prod, ParserGeneratorWarning, stacklevel=2 ) g.build_lritems() g.compute_first() g.compute_follow() table = None if self.cache_id is not None: cache_dir = AppDirs("rply").user_cache_dir cache_file = os.path.join( cache_dir, "%s-%s-%s.json" % ( self.cache_id, self.VERSION, self.compute_grammar_hash(g) ) ) if os.path.exists(cache_file): with open(cache_file) as f: data = json.load(f) if self.data_is_valid(g, data): table = LRTable.from_cache(g, data) if table is None: table = LRTable.from_grammar(g) if self.cache_id is not None: self._write_cache(cache_dir, cache_file, table) if table.sr_conflicts: warnings.warn( "%d shift/reduce conflict%s" % ( len(table.sr_conflicts), "s" if len(table.sr_conflicts) > 1 else "" ), ParserGeneratorWarning, stacklevel=2, ) if table.rr_conflicts: warnings.warn( "%d reduce/reduce conflict%s" % ( len(table.rr_conflicts), "s" if len(table.rr_conflicts) > 1 else "" ), ParserGeneratorWarning, stacklevel=2, ) return LRParser(table, self.error_handler) def _write_cache(self, cache_dir, cache_file, table): if not os.path.exists(cache_dir): try: os.makedirs(cache_dir, mode=0o0700) except OSError as e: if e.errno == errno.EROFS: return raise with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False, mode="w") as f: json.dump(self.serialize_table(table), f) os.rename(f.name, cache_file) def digraph(X, R, FP): N = dict.fromkeys(X, 0) stack = [] F = {} for x in X: if N[x] == 0: traverse(x, N, stack, F, X, R, FP) return F 
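# `digraph` and `traverse` implement the relation-closure ("digraph")
# algorithm of DeRemer & Pennello: compute F(x) = FP(x) joined with F(y)
# for every y with x R y, in a single depth-first pass that collapses
# strongly connected components (N[x] = LARGE_VALUE marks members of a
# finished component, which all share the same result set).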
def traverse(x, N, stack, F, X, R, FP): stack.append(x) d = len(stack) N[x] = d F[x] = FP(x) rel = R(x) for y in rel: if N[y] == 0: traverse(y, N, stack, F, X, R, FP) N[x] = min(N[x], N[y]) for a in F.get(y, []): if a not in F[x]: F[x].append(a) if N[x] == d: N[stack[-1]] = LARGE_VALUE F[stack[-1]] = F[x] element = stack.pop() while element != x: N[stack[-1]] = LARGE_VALUE F[stack[-1]] = F[x] element = stack.pop() class LRTable(object): def __init__(self, grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts): self.grammar = grammar self.lr_action = lr_action self.lr_goto = lr_goto self.default_reductions = default_reductions self.sr_conflicts = sr_conflicts self.rr_conflicts = rr_conflicts @classmethod def from_cache(cls, grammar, data): lr_action = [ dict([(str(k), v) for k, v in iteritems(action)]) for action in data["lr_action"] ] lr_goto = [ dict([(str(k), v) for k, v in iteritems(goto)]) for goto in data["lr_goto"] ] return LRTable( grammar, lr_action, lr_goto, data["default_reductions"], data["sr_conflicts"], data["rr_conflicts"] ) @classmethod def from_grammar(cls, grammar): cidhash = IdentityDict() goto_cache = {} add_count = Counter() C = cls.lr0_items(grammar, add_count, cidhash, goto_cache) cls.add_lalr_lookaheads(grammar, C, add_count, cidhash, goto_cache) lr_action = [None] * len(C) lr_goto = [None] * len(C) sr_conflicts = [] rr_conflicts = [] for st, I in enumerate(C): st_action = {} st_actionp = {} st_goto = {} for p in I: if p.getlength() == p.lr_index + 1: if p.name == "S'": # Start symbol. Accept! st_action["$end"] = 0 st_actionp["$end"] = p else: laheads = p.lookaheads[st] for a in laheads: if a in st_action: r = st_action[a] if r > 0: sprec, slevel = grammar.productions[st_actionp[a].number].prec rprec, rlevel = grammar.precedence.get(a, ("right", 0)) if (slevel < rlevel) or (slevel == rlevel and rprec == "left"): st_action[a] = -p.number st_actionp[a] = p if not slevel and not rlevel: sr_conflicts.append((st, repr(a), "reduce")) grammar.productions[p.number].reduced += 1 elif not (slevel == rlevel and rprec == "nonassoc"): if not rlevel: sr_conflicts.append((st, repr(a), "shift")) elif r < 0: oldp = grammar.productions[-r] pp = grammar.productions[p.number] if oldp.number > pp.number: st_action[a] = -p.number st_actionp[a] = p chosenp, rejectp = pp, oldp grammar.productions[p.number].reduced += 1 grammar.productions[oldp.number].reduced -= 1 else: chosenp, rejectp = oldp, pp rr_conflicts.append((st, repr(chosenp), repr(rejectp))) else: raise ParserGeneratorError("Unknown conflict in state %d" % st) else: st_action[a] = -p.number st_actionp[a] = p grammar.productions[p.number].reduced += 1 else: i = p.lr_index a = p.prod[i + 1] if a in grammar.terminals: g = cls.lr0_goto(I, a, add_count, goto_cache) j = cidhash.get(g, -1) if j >= 0: if a in st_action: r = st_action[a] if r > 0: if r != j: raise ParserGeneratorError("Shift/shift conflict in state %d" % st) elif r < 0: rprec, rlevel = grammar.productions[st_actionp[a].number].prec sprec, slevel = grammar.precedence.get(a, ("right", 0)) if (slevel > rlevel) or (slevel == rlevel and rprec == "right"): grammar.productions[st_actionp[a].number].reduced -= 1 st_action[a] = j st_actionp[a] = p if not rlevel: sr_conflicts.append((st, repr(a), "shift")) elif not (slevel == rlevel and rprec == "nonassoc"): if not slevel and not rlevel: sr_conflicts.append((st, repr(a), "reduce")) else: raise ParserGeneratorError("Unknown conflict in state %d" % st) else: st_action[a] = j st_actionp[a] = p nkeys = set() for 
ii in I: for s in ii.unique_syms: if s in grammar.nonterminals: nkeys.add(s) for n in nkeys: g = cls.lr0_goto(I, n, add_count, goto_cache) j = cidhash.get(g, -1) if j >= 0: st_goto[n] = j lr_action[st] = st_action lr_goto[st] = st_goto default_reductions = [0] * len(lr_action) for state, actions in enumerate(lr_action): actions = set(itervalues(actions)) if len(actions) == 1 and next(iter(actions)) < 0: default_reductions[state] = next(iter(actions)) return LRTable(grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts) @classmethod def lr0_items(cls, grammar, add_count, cidhash, goto_cache): C = [cls.lr0_closure([grammar.productions[0].lr_next], add_count)] for i, I in enumerate(C): cidhash[I] = i i = 0 while i < len(C): I = C[i] i += 1 asyms = set() for ii in I: asyms.update(ii.unique_syms) for x in asyms: g = cls.lr0_goto(I, x, add_count, goto_cache) if not g: continue if g in cidhash: continue cidhash[g] = len(C) C.append(g) return C @classmethod def lr0_closure(cls, I, add_count): add_count.incr() J = I[:] added = True while added: added = False for j in J: for x in j.lr_after: if x.lr0_added == add_count.value: continue J.append(x.lr_next) x.lr0_added = add_count.value added = True return J @classmethod def lr0_goto(cls, I, x, add_count, goto_cache): s = goto_cache.setdefault(x, IdentityDict()) gs = [] for p in I: n = p.lr_next if n and n.lr_before == x: s1 = s.get(n) if not s1: s1 = {} s[n] = s1 gs.append(n) s = s1 g = s.get("$end") if not g: if gs: g = cls.lr0_closure(gs, add_count) s["$end"] = g else: s["$end"] = gs return g @classmethod def add_lalr_lookaheads(cls, grammar, C, add_count, cidhash, goto_cache): nullable = cls.compute_nullable_nonterminals(grammar) trans = cls.find_nonterminal_transitions(grammar, C) readsets = cls.compute_read_sets(grammar, C, trans, nullable, add_count, cidhash, goto_cache) lookd, included = cls.compute_lookback_includes(grammar, C, trans, nullable, add_count, cidhash, goto_cache) followsets = cls.compute_follow_sets(trans, readsets, included) cls.add_lookaheads(lookd, followsets) @classmethod def compute_nullable_nonterminals(cls, grammar): nullable = set() num_nullable = 0 while True: for p in grammar.productions[1:]: if p.getlength() == 0: nullable.add(p.name) continue for t in p.prod: if t not in nullable: break else: nullable.add(p.name) if len(nullable) == num_nullable: break num_nullable = len(nullable) return nullable @classmethod def find_nonterminal_transitions(cls, grammar, C): trans = [] for idx, state in enumerate(C): for p in state: if p.lr_index < p.getlength() - 1: t = (idx, p.prod[p.lr_index + 1]) if t[1] in grammar.nonterminals and t not in trans: trans.append(t) return trans @classmethod def compute_read_sets(cls, grammar, C, ntrans, nullable, add_count, cidhash, goto_cache): return digraph( ntrans, R=lambda x: cls.reads_relation(C, x, nullable, add_count, cidhash, goto_cache), FP=lambda x: cls.dr_relation(grammar, C, x, nullable, add_count, goto_cache) ) @classmethod def compute_follow_sets(cls, ntrans, readsets, includesets): return digraph( ntrans, R=lambda x: includesets.get(x, []), FP=lambda x: readsets[x], ) @classmethod def dr_relation(cls, grammar, C, trans, nullable, add_count, goto_cache): state, N = trans terms = [] g = cls.lr0_goto(C[state], N, add_count, goto_cache) for p in g: if p.lr_index < p.getlength() - 1: a = p.prod[p.lr_index + 1] if a in grammar.terminals and a not in terms: terms.append(a) if state == 0 and N == grammar.productions[0].prod[0]: terms.append("$end") return terms 
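    # reads: (state, N) is related to (goto(state, N), a) for every nullable
    # symbol a appearing immediately after the dot in the goto state, since
    # such an a can derive the empty string and lookaheads flow through it.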
@classmethod def reads_relation(cls, C, trans, empty, add_count, cidhash, goto_cache): rel = [] state, N = trans g = cls.lr0_goto(C[state], N, add_count, goto_cache) j = cidhash.get(g, -1) for p in g: if p.lr_index < p.getlength() - 1: a = p.prod[p.lr_index + 1] if a in empty: rel.append((j, a)) return rel @classmethod def compute_lookback_includes(cls, grammar, C, trans, nullable, add_count, cidhash, goto_cache): lookdict = {} includedict = {} dtrans = dict.fromkeys(trans, 1) for state, N in trans: lookb = [] includes = [] for p in C[state]: if p.name != N: continue lr_index = p.lr_index j = state while lr_index < p.getlength() - 1: lr_index += 1 t = p.prod[lr_index] if (j, t) in dtrans: li = lr_index + 1 while li < p.getlength(): if p.prod[li] in grammar.terminals: break if p.prod[li] not in nullable: break li += 1 else: includes.append((j, t)) g = cls.lr0_goto(C[j], t, add_count, goto_cache) j = cidhash.get(g, -1) for r in C[j]: if r.name != p.name: continue if r.getlength() != p.getlength(): continue i = 0 while i < r.lr_index: if r.prod[i] != p.prod[i + 1]: break i += 1 else: lookb.append((j, r)) for i in includes: includedict.setdefault(i, []).append((state, N)) lookdict[state, N] = lookb return lookdict, includedict @classmethod def add_lookaheads(cls, lookbacks, followset): for trans, lb in iteritems(lookbacks): for state, p in lb: f = followset.get(trans, []) laheads = p.lookaheads.setdefault(state, []) for a in f: if a not in laheads: laheads.append(a) rply-0.7.7/rply/lexergenerator.py0000644000076500000240000000621013302276472020541 0ustar alex_gaynorstaff00000000000000import re try: import rpython from rpython.rlib.objectmodel import we_are_translated from rpython.rlib.rsre import rsre_core from rpython.rlib.rsre.rpy import get_code except ImportError: rpython = None def we_are_translated(): return False from rply.lexer import Lexer class Rule(object): _attrs_ = ['name', 'flags', '_pattern'] def __init__(self, name, pattern, flags=0): self.name = name self.re = re.compile(pattern, flags=flags) if rpython: self.flags = flags self._pattern = get_code(pattern, flags) def _freeze_(self): return True def matches(self, s, pos): if not we_are_translated(): m = self.re.match(s, pos) return Match(*m.span(0)) if m is not None else None else: assert pos >= 0 ctx = rsre_core.StrMatchContext(s, pos, len(s), self.flags) matched = rsre_core.match_context(ctx, self._pattern) if matched: return Match(ctx.match_start, ctx.match_end) else: return None class Match(object): _attrs_ = ["start", "end"] def __init__(self, start, end): self.start = start self.end = end class LexerGenerator(object): r""" A LexerGenerator represents a set of rules that match pieces of text that should either be turned into tokens or ignored by the lexer. Rules are added using the :meth:`add` and :meth:`ignore` methods: >>> from rply import LexerGenerator >>> lg = LexerGenerator() >>> lg.add('NUMBER', r'\d+') >>> lg.add('ADD', r'\+') >>> lg.ignore(r'\s+') The rules are passed to :func:`re.compile`. If you need additional flags, e.g. 
:const:`re.DOTALL`, you can pass them to :meth:`add` and :meth:`ignore` as an additional optional parameter: >>> import re >>> lg.add('ALL', r'.*', flags=re.DOTALL) You can then build a lexer with which you can lex a string to produce an iterator yielding tokens: >>> lexer = lg.build() >>> iterator = lexer.lex('1 + 1') >>> iterator.next() Token('NUMBER', '1') >>> iterator.next() Token('ADD', '+') >>> iterator.next() Token('NUMBER', '1') >>> iterator.next() Traceback (most recent call last): ... StopIteration """ def __init__(self): self.rules = [] self.ignore_rules = [] def add(self, name, pattern, flags=0): """ Adds a rule with the given `name` and `pattern`. In case of ambiguity, the first rule added wins. """ self.rules.append(Rule(name, pattern, flags=flags)) def ignore(self, pattern, flags=0): """ Adds a rule whose matched value will be ignored. Ignored rules will be matched before regular ones. """ self.ignore_rules.append(Rule("", pattern, flags=flags)) def build(self): """ Returns a lexer instance, which provides a `lex` method that must be called with a string and returns an iterator yielding :class:`~rply.Token` instances. """ return Lexer(self.rules, self.ignore_rules) rply-0.7.7/MANIFEST.in0000644000076500000240000000004312571304150015677 0ustar alex_gaynorstaff00000000000000include README.rst include LICENSE rply-0.7.7/setup.py0000644000076500000240000000062513421456663015675 0ustar alex_gaynorstaff00000000000000from setuptools import setup with open("README.rst") as f: readme = f.read() setup( name="rply", description="A pure Python Lex/Yacc that works with RPython", long_description=readme, # duplicated in docs/conf.py and rply/__init__.py version="0.7.7", author="Alex Gaynor", author_email="alex.gaynor@gmail.com", packages=["rply"], install_requires=["appdirs"], ) rply-0.7.7/rply.egg-info/0000755000076500000240000000000013421457110016624 5ustar alex_gaynorstaff00000000000000rply-0.7.7/rply.egg-info/PKG-INFO0000644000076500000240000001204313421457107017727 0ustar alex_gaynorstaff00000000000000Metadata-Version: 1.0 Name: rply Version: 0.7.7 Summary: A pure Python Lex/Yacc that works with RPython Home-page: UNKNOWN Author: Alex Gaynor Author-email: alex.gaynor@gmail.com License: BSD 3-Clause License Description: RPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. 
pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? ------------------------- In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project. If you don't intend to use your parser from RPython, and just want a cool pure Python parser you can ignore all the box stuff and just return whatever you like from each production method. Error handling -------------- By default, when a parsing error is encountered, an ``rply.ParsingError`` is raised, it has a method ``getsourcepos()``, which returns an ``rply.token.SourcePosition`` object. You may also provide an error handler, which, at the moment, must raise an exception. It receives the ``Token`` object that the parser errored on. .. code:: python pg = ParserGenerator(...) @pg.error def error_handler(token): raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype()) Python compatibility -------------------- RPly is tested and known to work under Python 2.6, 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards. Links ----- * `Source code and issue tracker `_ * `PyPI releases `_ * `Talk at PyCon US 2013: So you want to write an interpreter? `_ .. 
_`online`: https://rply.readthedocs.io/ Platform: UNKNOWN rply-0.7.7/rply.egg-info/SOURCES.txt0000644000076500000240000000052413421457107020517 0ustar alex_gaynorstaff00000000000000LICENSE MANIFEST.in README.rst setup.cfg setup.py rply/__init__.py rply/errors.py rply/grammar.py rply/lexer.py rply/lexergenerator.py rply/parser.py rply/parsergenerator.py rply/token.py rply/utils.py rply.egg-info/PKG-INFO rply.egg-info/SOURCES.txt rply.egg-info/dependency_links.txt rply.egg-info/requires.txt rply.egg-info/top_level.txtrply-0.7.7/rply.egg-info/requires.txt0000644000076500000240000000001013421457107021221 0ustar alex_gaynorstaff00000000000000appdirs rply-0.7.7/rply.egg-info/top_level.txt0000644000076500000240000000000513421457107021357 0ustar alex_gaynorstaff00000000000000rply rply-0.7.7/rply.egg-info/dependency_links.txt0000644000076500000240000000000113421457107022700 0ustar alex_gaynorstaff00000000000000 rply-0.7.7/setup.cfg0000644000076500000240000000015013421457110015761 0ustar alex_gaynorstaff00000000000000[metadata] license = BSD 3-Clause License [wheel] universal = 1 [egg_info] tag_build = tag_date = 0 rply-0.7.7/README.rst0000644000076500000240000000753213302276472015652 0ustar alex_gaynorstaff00000000000000RPLY ==== .. image:: https://secure.travis-ci.org/alex/rply.png :target: https://travis-ci.org/alex/rply Welcome to RPLY! A pure Python parser generator, that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API, and RPython support. You can find the documentation `online`_. Basic API: .. code:: python from rply import ParserGenerator, LexerGenerator from rply.token import BaseBox lg = LexerGenerator() # Add takes a rule name, and a regular expression that defines the rule. lg.add("PLUS", r"\+") lg.add("MINUS", r"-") lg.add("NUMBER", r"\d+") lg.ignore(r"\s+") # This is a list of the token names. precedence is an optional list of # tuples which specifies order of operation for avoiding ambiguity. # precedence must be one of "left", "right", "nonassoc". # cache_id is an optional string which specifies an ID to use for # caching. It should *always* be safe to use caching, # RPly will automatically detect when your grammar is # changed and refresh the cache for you. pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"], precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser") @pg.production("main : expr") def main(p): # p is a list, of each of the pieces on the right hand side of the # grammar rule return p[0] @pg.production("expr : expr PLUS expr") @pg.production("expr : expr MINUS expr") def expr_op(p): lhs = p[0].getint() rhs = p[2].getint() if p[1].gettokentype() == "PLUS": return BoxInt(lhs + rhs) elif p[1].gettokentype() == "MINUS": return BoxInt(lhs - rhs) else: raise AssertionError("This is impossible, abort the time machine!") @pg.production("expr : NUMBER") def expr_num(p): return BoxInt(int(p[0].getstr())) lexer = lg.build() parser = pg.build() class BoxInt(BaseBox): def __init__(self, value): self.value = value def getint(self): return self.value Then you can do: .. code:: python parser.parse(lexer.lex("1 + 3 - 2+12-32")) You can also substitute your own lexer. A lexer is an object with a ``next()`` method that returns either the next token in sequence, or ``None`` if the token stream has been exhausted. Why do we have the boxes? 
-------------------------

In RPython, as in other statically typed languages, a variable must have a
specific type, so we take advantage of polymorphism and keep values in a
box so that everything is statically typed. You can write whatever boxes
you need for your project.

If you don't intend to use your parser from RPython, and just want a cool
pure Python parser, you can ignore all the box stuff and just return
whatever you like from each production method.

Error handling
--------------

By default, when a parsing error is encountered, an ``rply.ParsingError``
is raised. It has a method ``getsourcepos()``, which returns an
``rply.token.SourcePosition`` object.

You may also provide an error handler, which, at the moment, must raise an
exception. It receives the ``Token`` object that the parser errored on.

.. code:: python

    pg = ParserGenerator(...)

    @pg.error
    def error_handler(token):
        raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())

Python compatibility
--------------------

RPly is tested and known to work under Python 2.6, 2.7, 3.4+, and PyPy. It
is also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards.

Links
-----

* `Source code and issue tracker `_
* `PyPI releases `_
* `Talk at PyCon US 2013: So you want to write an interpreter? `_

.. _`online`: https://rply.readthedocs.io/
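As a sketch of the "substitute your own lexer" note above (the
``ListLexer`` name and the token list are illustrative, not part of RPly):
any object whose ``next()`` method returns ``Token`` instances and then
``None`` can stand in for the generated lexer.

.. code:: python

    from rply.token import Token

    class ListLexer(object):
        """Feeds a pre-built list of tokens to the parser."""

        def __init__(self, tokens):
            self._iter = iter(tokens)

        def __next__(self):
            # Next token in sequence, or None once the stream is exhausted.
            return next(self._iter, None)

        next = __next__  # Python 2 spelling of the same protocol

    tokens = [Token("NUMBER", "1"), Token("PLUS", "+"), Token("NUMBER", "2")]
    parser.parse(ListLexer(tokens))  # `parser` from the example above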