rply-0.7.1/

rply-0.7.1/LICENSE

Copyright (c) Alex Gaynor and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.

    3. Neither the name of rply nor the names of its contributors may be
       used to endorse or promote products derived from this software without
       specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

rply-0.7.1/MANIFEST.in

include README.rst
include LICENSE

rply-0.7.1/PKG-INFO

Metadata-Version: 1.0
Name: rply
Version: 0.7.1
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: UNKNOWN
Description: (same text as README.rst, below)
Platform: UNKNOWN
rply-0.7.1/README.rst

RPLY
====

.. image:: https://secure.travis-ci.org/alex/rply.png
    :target: http://travis-ci.org/alex/rply

Welcome to RPLY! A pure Python parser generator that also works with RPython.
It is a more-or-less direct port of David Beazley's awesome PLY, with a new
public API, and RPython support.

Basic API:

.. code:: python

    from rply import ParserGenerator, LexerGenerator
    from rply.token import BaseBox

    lg = LexerGenerator()
    # add() takes a rule name, and a regular expression that defines the rule.
    lg.add("PLUS", r"\+")
    lg.add("MINUS", r"-")
    lg.add("NUMBER", r"\d+")

    lg.ignore(r"\s+")

    # This is a list of the token names. precedence is an optional list of
    # tuples which specifies order of operation for avoiding ambiguity.
    # precedence must be one of "left", "right", "nonassoc".
    # cache_id is an optional string which specifies an ID to use for
    # caching. It should *always* be safe to use caching; RPly will
    # automatically detect when your grammar is changed and refresh the
    # cache for you.
    pg = ParserGenerator(
        ["NUMBER", "PLUS", "MINUS"],
        precedence=[("left", ["PLUS", "MINUS"])],
        cache_id="myparser"
    )

    @pg.production("main : expr")
    def main(p):
        # p is a list of the pieces on the right hand side of the
        # grammar rule
        return p[0]

    @pg.production("expr : expr PLUS expr")
    @pg.production("expr : expr MINUS expr")
    def expr_op(p):
        lhs = p[0].getint()
        rhs = p[2].getint()
        if p[1].gettokentype() == "PLUS":
            return BoxInt(lhs + rhs)
        elif p[1].gettokentype() == "MINUS":
            return BoxInt(lhs - rhs)
        else:
            raise AssertionError("This is impossible, abort the time machine!")

    @pg.production("expr : NUMBER")
    def expr_num(p):
        return BoxInt(int(p[0].getstr()))

    lexer = lg.build()
    parser = pg.build()

    class BoxInt(BaseBox):
        def __init__(self, value):
            self.value = value

        def getint(self):
            return self.value

Then you can do:

.. code:: python

    parser.parse(lexer.lex("1 + 3 - 2+12-32"))

For the input above this returns ``BoxInt(-18)``, so calling ``getint()`` on
the result gives ``-18``.

You can also substitute your own lexer. A lexer is an object with a ``next()``
method that returns either the next token in sequence, or ``None`` if the
token stream has been exhausted.
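For illustration, here is a minimal sketch of such a hand-rolled token stream
(``ListLexerStream`` and the canned token list are hypothetical names for this
example, not part of RPly):

.. code:: python

    from rply.token import Token

    class ListLexerStream(object):
        # A token stream over a fixed list; the parser only needs next().
        # RPly's own streams also define __next__ so Python 3's next()
        # builtin works on them, so we alias it here as well.
        def __init__(self, tokens):
            self.tokens = tokens
            self.idx = 0

        def next(self):
            # Return None once exhausted, as described above (raising
            # StopIteration also works with rply's parser).
            if self.idx >= len(self.tokens):
                return None
            token = self.tokens[self.idx]
            self.idx += 1
            return token

        __next__ = next

    stream = ListLexerStream([
        Token("NUMBER", "1"), Token("PLUS", "+"), Token("NUMBER", "2")
    ])
    parser.parse(stream)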
Why do we have the boxes?
-------------------------

In RPython, as in other statically typed languages, a variable must have a
specific type; we take advantage of polymorphism to keep values in a box so
that everything is statically typed. You can write whatever boxes you need
for your project.

If you don't intend to use your parser from RPython, and just want a cool
pure Python parser, you can ignore all the box stuff and just return whatever
you like from each production method.
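For example, here is a sketch of an additional box for float values
(``BoxFloat`` is a hypothetical name following the same pattern as ``BoxInt``
above; it is not part of RPly):

.. code:: python

    from rply.token import BaseBox

    class BoxFloat(BaseBox):
        # Boxes are plain subclasses of BaseBox; define whatever accessor
        # methods your own productions need.
        def __init__(self, value):
            self.value = value

        def getfloat(self):
            return self.value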
Error handling
--------------

By default, when a parsing error is encountered, an ``rply.ParsingError`` is
raised; it has a method ``getsourcepos()``, which returns an
``rply.token.SourcePosition`` object.

You may also provide an error handler, which, at the moment, must raise an
exception. It receives the ``Token`` object that the parser errored on.

.. code:: python

    pg = ParserGenerator(...)

    @pg.error
    def error_handler(token):
        raise ValueError(
            "Ran into a %s where it wasn't expected" % token.gettokentype()
        )

Python compatibility
--------------------

RPly is tested and known to work under Python 2.6, 2.7, 3.1, and 3.2. It is
also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards.

Links
-----

* `Source code and issue tracker `_
* `PyPI releases `_
* `Talk at PyCon US 2013: So you want to write an interpreter? `_

rply-0.7.1/rply/

rply-0.7.1/rply/__init__.py

from rply.errors import ParsingError
from rply.lexergenerator import LexerGenerator
from rply.parsergenerator import ParserGenerator
from rply.token import Token


__all__ = [
    "LexerGenerator", "ParserGenerator", "ParsingError", "Token"
]

rply-0.7.1/rply/errors.py

class ParserGeneratorError(Exception):
    pass


class LexingError(Exception):
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        return self.source_pos


class ParsingError(Exception):
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        return self.source_pos


class ParserGeneratorWarning(Warning):
    pass

rply-0.7.1/rply/grammar.py

from rply.errors import ParserGeneratorError
from rply.utils import iteritems


def rightmost_terminal(symbols, terminals):
    for sym in reversed(symbols):
        if sym in terminals:
            return sym
    return None


class Grammar(object):
    def __init__(self, terminals):
        # A list of all the productions
        self.productions = [None]
        # A dictionary mapping the names of non-terminals to a list of all
        # productions of that nonterminal
        self.prod_names = {}
        # A dictionary mapping the names of terminals to a list of the rules
        # where they are used
        self.terminals = dict((t, []) for t in terminals)
        self.terminals["error"] = []
        # A dictionary mapping names of nonterminals to a list of rule numbers
        # where they are used
        self.nonterminals = {}
        self.first = {}
        self.follow = {}
        self.precedence = {}
        self.start = None

    def add_production(self, prod_name, syms, func, precedence):
        if prod_name in self.terminals:
            raise ParserGeneratorError("Illegal rule name %r" % prod_name)

        if precedence is None:
            precname = rightmost_terminal(syms, self.terminals)
            prod_prec = self.precedence.get(precname, ("right", 0))
        else:
            try:
                prod_prec = self.precedence[precedence]
            except KeyError:
                raise ParserGeneratorError(
                    "Precedence %r doesn't exist" % precedence
                )

        pnumber = len(self.productions)
        self.nonterminals.setdefault(prod_name, [])

        for t in syms:
            if t in self.terminals:
                self.terminals[t].append(pnumber)
            else:
                self.nonterminals.setdefault(t, []).append(pnumber)

        p = Production(pnumber, prod_name, syms, prod_prec, func)
        self.productions.append(p)

        self.prod_names.setdefault(prod_name, []).append(p)

    def set_precedence(self, term, assoc, level):
        if term in self.precedence:
            raise ParserGeneratorError(
                "Precedence already specified for %s" % term
            )
        if assoc not in ["left", "right", "nonassoc"]:
            raise ParserGeneratorError(
                "Precedence must be one of left, right, nonassoc; not %s" % (
                    assoc
                )
            )
        self.precedence[term] = (assoc, level)

    def set_start(self):
        start = self.productions[1].name
        self.productions[0] = Production(0, "S'", [start], ("right", 0), None)
        self.nonterminals[start].append(0)
        self.start = start

    def unused_terminals(self):
        return [
            t
            for t, prods in iteritems(self.terminals)
            if not prods and t != "error"
        ]

    def unused_productions(self):
        return [p for p, prods in iteritems(self.nonterminals) if not prods]

    def build_lritems(self):
        """
        Walks the list of productions and builds a complete set of the LR
        items.
        """
        for p in self.productions:
            lastlri = p
            i = 0
            lr_items = []
            while True:
                if i > p.getlength():
                    lri = None
                else:
                    try:
                        before = p.prod[i - 1]
                    except IndexError:
                        before = None
                    try:
                        after = self.prod_names[p.prod[i]]
                    except (IndexError, KeyError):
                        after = []
                    lri = LRItem(p, i, before, after)
                lastlri.lr_next = lri
                if lri is None:
                    break
                lr_items.append(lri)
                lastlri = lri
                i += 1
            p.lr_items = lr_items

    def _first(self, beta):
        result = []
        for x in beta:
            x_produces_empty = False
            for f in self.first[x]:
                if f == "<empty>":
                    x_produces_empty = True
                else:
                    if f not in result:
                        result.append(f)
            if not x_produces_empty:
                break
        else:
            result.append("<empty>")
        return result

    def compute_first(self):
        for t in self.terminals:
            self.first[t] = [t]

        self.first["$end"] = ["$end"]

        for n in self.nonterminals:
            self.first[n] = []

        changed = True
        while changed:
            changed = False
            for n in self.nonterminals:
                for p in self.prod_names[n]:
                    for f in self._first(p.prod):
                        if f not in self.first[n]:
                            self.first[n].append(f)
                            changed = True

    def compute_follow(self):
        for k in self.nonterminals:
            self.follow[k] = []

        start = self.start
        self.follow[start] = ["$end"]

        added = True
        while added:
            added = False
            for p in self.productions[1:]:
                for i, B in enumerate(p.prod):
                    if B in self.nonterminals:
                        fst = self._first(p.prod[i + 1:])
                        has_empty = False
                        for f in fst:
                            if f != "<empty>" and f not in self.follow[B]:
                                self.follow[B].append(f)
                                added = True
                            if f == "<empty>":
                                has_empty = True
                        if has_empty or i == (len(p.prod) - 1):
                            for f in self.follow[p.name]:
                                if f not in self.follow[B]:
                                    self.follow[B].append(f)
                                    added = True


class Production(object):
    def __init__(self, num, name, prod, precedence, func):
        self.name = name
        self.prod = prod
        self.number = num
        self.func = func
        self.prec = precedence

        self.unique_syms = []
        for s in self.prod:
            if s not in self.unique_syms:
                self.unique_syms.append(s)

        self.lr_items = []
        self.lr_next = None
        self.lr0_added = 0
        self.reduced = 0

    def __repr__(self):
        return "Production(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)


class LRItem(object):
    def __init__(self, p, n, before, after):
        self.name = p.name
        self.prod = p.prod[:]
        self.prod.insert(n, ".")
        self.number = p.number
        self.lr_index = n
        self.lookaheads = {}
        self.unique_syms = p.unique_syms
        self.lr_before = before
        self.lr_after = after

    def __repr__(self):
        return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)
rply-0.7.1/rply/lexer.py

from rply.errors import LexingError
from rply.token import SourcePosition, Token


class Lexer(object):
    def __init__(self, rules, ignore_rules):
        self.rules = rules
        self.ignore_rules = ignore_rules

    def lex(self, s):
        return LexerStream(self, s)


class LexerStream(object):
    def __init__(self, lexer, s):
        self.lexer = lexer
        self.s = s
        self.idx = 0

        self._lineno = 1

    def __iter__(self):
        return self

    def _update_pos(self, match):
        # Advance the stream past the match and return the 1-based column
        # at which the match started.
        self.idx = match.end
        self._lineno += self.s.count("\n", match.start, match.end)
        last_nl = self.s.rfind("\n", 0, match.start)
        if last_nl < 0:
            return match.start + 1
        else:
            return match.start - last_nl

    def next(self):
        if self.idx >= len(self.s):
            raise StopIteration
        for rule in self.lexer.ignore_rules:
            match = rule.matches(self.s, self.idx)
            if match:
                self._update_pos(match)
                return self.next()
        for rule in self.lexer.rules:
            match = rule.matches(self.s, self.idx)
            if match:
                colno = self._update_pos(match)
                source_pos = SourcePosition(match.start, self._lineno, colno)
                token = Token(
                    rule.name, self.s[match.start:match.end], source_pos
                )
                return token
        else:
            raise LexingError(None, SourcePosition(self.idx, -1, -1))

    def __next__(self):
        return self.next()

rply-0.7.1/rply/lexergenerator.py

import re

# The RPython toolchain is optional: when it is importable, the classes
# further below teach RPython's annotator and rtyper how to handle Rule
# objects, so that generated lexers remain valid RPython.
try:
    import rpython
    from rpython.annotator import model
    from rpython.annotator.bookkeeper import getbookkeeper
    from rpython.rlib.objectmodel import instantiate, hlinvoke
    from rpython.rlib.rsre import rsre_core
    from rpython.rlib.rsre.rpy import get_code
    from rpython.rtyper.annlowlevel import llstr, hlstr
    from rpython.rtyper.extregistry import ExtRegistryEntry
    from rpython.rtyper.lltypesystem import lltype
    from rpython.rtyper.lltypesystem.rlist import FixedSizeListRepr
    from rpython.rtyper.lltypesystem.rstr import STR, string_repr
    from rpython.rtyper.rmodel import Repr
    from rpython.tool.pairtype import pairtype
except ImportError:
    rpython = None

from rply.lexer import Lexer


class Rule(object):
    def __init__(self, name, pattern):
        self.name = name
        self.re = re.compile(pattern)

    def _freeze_(self):
        return True

    def matches(self, s, pos):
        m = self.re.match(s, pos)
        return Match(*m.span(0)) if m is not None else None


class Match(object):
    _attrs_ = ["start", "end"]

    def __init__(self, start, end):
        self.start = start
        self.end = end


class LexerGenerator(object):
    def __init__(self):
        self.rules = []
        self.ignore_rules = []

    def add(self, name, pattern):
        self.rules.append(Rule(name, pattern))

    def ignore(self, pattern):
        self.ignore_rules.append(Rule("", pattern))

    def build(self):
        return Lexer(self.rules, self.ignore_rules)


if rpython:
    class RuleEntry(ExtRegistryEntry):
        _type_ = Rule

        def compute_annotation(self, *args):
            return SomeRule()

    class SomeRule(model.SomeObject):
        def rtyper_makekey(self):
            return (type(self),)

        def rtyper_makerepr(self, rtyper):
            return RuleRepr(rtyper)

        def method_matches(self, s_s, s_pos):
            assert model.SomeString().contains(s_s)
            assert model.SomeInteger(nonneg=True).contains(s_pos)

            bk = getbookkeeper()
            init_pbc = bk.immutablevalue(Match.__init__)
            bk.emulate_pbc_call((self, "match_init"), init_pbc, [
                model.SomeInstance(bk.getuniqueclassdef(Match)),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True)
            ])
            init_pbc = bk.immutablevalue(rsre_core.StrMatchContext.__init__)
            bk.emulate_pbc_call((self, "str_match_context_init"), init_pbc, [
                model.SomeInstance(
                    bk.getuniqueclassdef(rsre_core.StrMatchContext)
                ),
                bk.newlist(model.SomeInteger(nonneg=True)),
                model.SomeString(),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True),
            ])
            match_context_pbc = bk.immutablevalue(rsre_core.match_context)
            bk.emulate_pbc_call((self, "match_context"), match_context_pbc, [
                model.SomeInstance(
                    bk.getuniqueclassdef(rsre_core.StrMatchContext)
                ),
            ])

            return model.SomeInstance(
                getbookkeeper().getuniqueclassdef(Match), can_be_None=True
            )

        def getattr(self, s_attr):
            if s_attr.is_constant() and s_attr.const == "name":
                return model.SomeString()
            return super(SomeRule, self).getattr(s_attr)

    class __extend__(pairtype(SomeRule, SomeRule)):
        def union(self):
            return SomeRule()

    class RuleRepr(Repr):
        def __init__(self, rtyper):
            super(RuleRepr, self).__init__()

            self.ll_rule_cache = {}

            self.match_init_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(Match.__init__)
            )
            self.match_context_init_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(
                    rsre_core.StrMatchContext.__init__
                )
            )
            self.match_context_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(
                    rsre_core.match_context
                )
            )

            list_repr = FixedSizeListRepr(
                rtyper, rtyper.getrepr(model.SomeInteger(nonneg=True))
            )
            list_repr._setup_repr()
            self.lowleveltype = lltype.Ptr(lltype.GcStruct(
                "RULE",
                ("name", lltype.Ptr(STR)),
                ("code", list_repr.lowleveltype),
            ))

        def convert_const(self, rule):
            if rule not in self.ll_rule_cache:
                ll_rule = lltype.malloc(self.lowleveltype.TO)
                ll_rule.name = llstr(rule.name)
                code = get_code(rule.re.pattern)
                ll_rule.code = lltype.malloc(
                    self.lowleveltype.TO.code.TO, len(code)
                )
                for i, c in enumerate(code):
                    ll_rule.code[i] = c
                self.ll_rule_cache[rule] = ll_rule
            return self.ll_rule_cache[rule]

        def rtype_getattr(self, hop):
            s_attr = hop.args_s[1]
            if s_attr.is_constant() and s_attr.const == "name":
                v_rule = hop.inputarg(self, arg=0)
                return hop.gendirectcall(LLRule.ll_get_name, v_rule)
            return super(RuleRepr, self).rtype_getattr(hop)

        def rtype_method_matches(self, hop):
            [v_rule, v_s, v_pos] = hop.inputargs(
                self, string_repr, lltype.Signed
            )
            c_MATCHTYPE = hop.inputconst(lltype.Void, Match)
            c_MATCH_INIT = hop.inputconst(lltype.Void, self.match_init_repr)
            c_MATCH_CONTEXTTYPE = hop.inputconst(
                lltype.Void, rsre_core.StrMatchContext
            )
            c_MATCH_CONTEXT_INIT = hop.inputconst(
                lltype.Void, self.match_context_init_repr
            )
            c_MATCH_CONTEXT = hop.inputconst(
                lltype.Void, self.match_context_repr
            )

            return hop.gendirectcall(
                LLRule.ll_matches,
                c_MATCHTYPE, c_MATCH_INIT, c_MATCH_CONTEXTTYPE,
                c_MATCH_CONTEXT_INIT, c_MATCH_CONTEXT, v_rule, v_s, v_pos
            )

    class LLRule(object):
        @staticmethod
        def ll_get_name(ll_rule):
            return ll_rule.name

        @staticmethod
        def ll_matches(MATCHTYPE, MATCH_INIT, MATCH_CONTEXTTYPE,
                       MATCH_CONTEXT_INIT, MATCH_CONTEXT, ll_rule, s, pos):
            s = hlstr(s)
            assert pos >= 0
            ctx = instantiate(MATCH_CONTEXTTYPE)
            hlinvoke(
                MATCH_CONTEXT_INIT, rsre_core.StrMatchContext.__init__,
                ctx, ll_rule.code, hlstr(s), pos, len(s), 0
            )
            matched = hlinvoke(MATCH_CONTEXT, rsre_core.match_context, ctx)
            if matched:
                match = instantiate(MATCHTYPE)
                hlinvoke(
                    MATCH_INIT, Match.__init__,
                    match, ctx.match_start, ctx.match_end
                )
                return match
            else:
                return None
rply-0.7.1/rply/parser.py

from rply.errors import ParsingError


class LRParser(object):
    def __init__(self, lr_table, error_handler):
        self.lr_table = lr_table
        self.error_handler = error_handler

    def parse(self, tokenizer, state=None):
        from rply.token import Token

        lookahead = None
        lookaheadstack = []

        statestack = [0]
        symstack = [Token("$end", "$end")]

        current_state = 0
        while True:
            # Fast path: a state whose only possible action is a single
            # reduction is reduced without consulting the lookahead.
            if self.lr_table.default_reductions[current_state]:
                t = self.lr_table.default_reductions[current_state]
                current_state = self._reduce_production(
                    t, symstack, statestack, state
                )
                continue

            if lookahead is None:
                if lookaheadstack:
                    lookahead = lookaheadstack.pop()
                else:
                    try:
                        lookahead = next(tokenizer)
                    except StopIteration:
                        lookahead = None

                if lookahead is None:
                    lookahead = Token("$end", "$end")

            ltype = lookahead.gettokentype()
            if ltype in self.lr_table.lr_action[current_state]:
                t = self.lr_table.lr_action[current_state][ltype]
                if t > 0:
                    statestack.append(t)
                    current_state = t
                    symstack.append(lookahead)
                    lookahead = None
                    continue
                elif t < 0:
                    current_state = self._reduce_production(
                        t, symstack, statestack, state
                    )
                    continue
                else:
                    n = symstack[-1]
                    return n
            else:
                # TODO: actual error handling here
                if self.error_handler is not None:
                    if state is None:
                        self.error_handler(lookahead)
                    else:
                        self.error_handler(state, lookahead)
                    raise AssertionError("For now, error_handler must raise.")
                else:
                    raise ParsingError(None, lookahead.getsourcepos())

    def _reduce_production(self, t, symstack, statestack, state):
        # reduce a symbol on the stack and emit a production
        p = self.lr_table.grammar.productions[-t]
        pname = p.name
        plen = p.getlength()
        start = len(symstack) + (-plen - 1)
        assert start >= 0
        targ = symstack[start + 1:]
        start = len(symstack) + (-plen)
        assert start >= 0
        del symstack[start:]
        del statestack[start:]
        if state is None:
            value = p.func(targ)
        else:
            value = p.func(state, targ)
        symstack.append(value)
        current_state = self.lr_table.lr_goto[statestack[-1]][pname]
        statestack.append(current_state)
        return current_state

rply-0.7.1/rply/parsergenerator.py

import os
import hashlib
import json
import random
import stat
import string
import sys
import tempfile
import warnings

from rply.errors import ParserGeneratorError, ParserGeneratorWarning
from rply.grammar import Grammar
from rply.parser import LRParser
from rply.utils import IdentityDict, Counter, iteritems, itervalues


LARGE_VALUE = sys.maxsize


# Assumed definition (no definition appears in this module as shipped):
# error type for LR-table conflicts that should be unreachable, raised below.
class LALRError(ParserGeneratorError):
    pass


class ParserGenerator(object):
    VERSION = 1

    def __init__(self, tokens, precedence=[], cache_id=None):
        self.tokens = tokens
        self.productions = []
        self.precedence = precedence
        if cache_id is None:
            # This ensures that we always go through the caching code.
            cache_id = "".join(
                random.choice(string.ascii_letters) for _ in range(6)
            )
        self.cache_id = cache_id
        self.error_handler = None

    def production(self, rule, precedence=None):
        parts = rule.split()
        production_name = parts[0]
        if parts[1] != ":":
            raise ParserGeneratorError("Expecting :")
        syms = parts[2:]

        def inner(func):
            self.productions.append((production_name, syms, func, precedence))
            return func
        return inner

    def error(self, func):
        self.error_handler = func
        return func

    def compute_grammar_hash(self, g):
        hasher = hashlib.sha1()
        hasher.update(g.start.encode())
        hasher.update(json.dumps(sorted(g.terminals)).encode())
        for term, (assoc, level) in sorted(iteritems(g.precedence)):
            hasher.update(term.encode())
            hasher.update(assoc.encode())
            hasher.update(bytes(level))
        for p in g.productions:
            hasher.update(p.name.encode())
            hasher.update(json.dumps(p.prec).encode())
            hasher.update(json.dumps(p.prod).encode())
        return hasher.hexdigest()

    def serialize_table(self, table):
        return {
            "lr_action": table.lr_action,
            "lr_goto": table.lr_goto,
            "sr_conflicts": table.sr_conflicts,
            "rr_conflicts": table.rr_conflicts,
            "default_reductions": table.default_reductions,
            "start": table.grammar.start,
            "terminals": sorted(table.grammar.terminals),
            "precedence": table.grammar.precedence,
            "productions": [
                (p.name, p.prod, p.prec) for p in table.grammar.productions
            ],
        }

    def data_is_valid(self, g, data):
        if g.start != data["start"]:
            return False
        if sorted(g.terminals) != data["terminals"]:
            return False
        if sorted(g.precedence) != sorted(data["precedence"]):
            return False
        for key, (assoc, level) in iteritems(g.precedence):
            if data["precedence"][key] != [assoc, level]:
                return False
        if len(g.productions) != len(data["productions"]):
            return False
        for p, (name, prod, (assoc, level)) in zip(g.productions, data["productions"]):
            if p.name != name:
                return False
            if p.prod != prod:
                return False
            if p.prec != (assoc, level):
                return False
        return True

    def build(self):
        g = Grammar(self.tokens)

        for level, (assoc, terms) in enumerate(self.precedence, 1):
            for term in terms:
                g.set_precedence(term, assoc, level)

        for prod_name, syms, func, precedence in self.productions:
            g.add_production(prod_name, syms, func, precedence)

        g.set_start()

        for unused_term in g.unused_terminals():
            warnings.warn(
                "Token %r is unused" % unused_term,
                ParserGeneratorWarning,
                stacklevel=2
            )
        for unused_prod in g.unused_productions():
            warnings.warn(
                "Production %r is not reachable" % unused_prod,
                ParserGeneratorWarning,
                stacklevel=2
            )

        g.build_lritems()
        g.compute_first()
        g.compute_follow()

        cache_file = os.path.join(
            tempfile.gettempdir(),
            "rply-%s-%s-%s-%s.json" % (
                self.VERSION, os.getuid(), self.cache_id,
                self.compute_grammar_hash(g)
            )
        )
        table = None
        if os.path.exists(cache_file):
            with open(cache_file) as f:
                data = json.load(f)
                stat_result = os.fstat(f.fileno())
                if (
                    stat_result.st_uid == os.getuid() and
                    stat.S_IMODE(stat_result.st_mode) == 0o0600
                ):
                    if self.data_is_valid(g, data):
                        table = LRTable.from_cache(g, data)
        if table is None:
            table = LRTable.from_grammar(g)
            fd = os.open(cache_file, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o0600)
            with os.fdopen(fd, "w") as f:
                json.dump(self.serialize_table(table), f)

        if table.sr_conflicts:
            warnings.warn(
                "%d shift/reduce conflict%s" % (
                    len(table.sr_conflicts),
                    "s" if len(table.sr_conflicts) > 1 else ""
                ),
                ParserGeneratorWarning,
                stacklevel=2,
            )
        if table.rr_conflicts:
            warnings.warn(
                "%d reduce/reduce conflict%s" % (
                    len(table.rr_conflicts),
                    "s" if len(table.rr_conflicts) > 1 else ""
                ),
                ParserGeneratorWarning,
                stacklevel=2,
            )
        return LRParser(table, self.error_handler)


def digraph(X, R, FP):
    # Generic transitive-closure helper (the DeRemer & Pennello digraph
    # algorithm) used by the LALR lookahead computations below.
    N = dict.fromkeys(X, 0)
    stack = []
    F = {}
    for x in X:
        if N[x] == 0:
            traverse(x, N, stack, F, X, R, FP)
    return F


def traverse(x, N, stack, F, X, R, FP):
    stack.append(x)
    d = len(stack)
    N[x] = d
    F[x] = FP(x)

    rel = R(x)
    for y in rel:
        if N[y] == 0:
            traverse(y, N, stack, F, X, R, FP)
        N[x] = min(N[x], N[y])
        for a in F.get(y, []):
            if a not in F[x]:
                F[x].append(a)
    if N[x] == d:
        N[stack[-1]] = LARGE_VALUE
        F[stack[-1]] = F[x]
        element = stack.pop()
        while element != x:
            N[stack[-1]] = LARGE_VALUE
            F[stack[-1]] = F[x]
            element = stack.pop()


class LRTable(object):
    def __init__(self, grammar, lr_action, lr_goto, default_reductions,
                 sr_conflicts, rr_conflicts):
        self.grammar = grammar
        self.lr_action = lr_action
        self.lr_goto = lr_goto
        self.default_reductions = default_reductions
        self.sr_conflicts = sr_conflicts
        self.rr_conflicts = rr_conflicts

    @classmethod
    def from_cache(cls, grammar, data):
        lr_action = [
            dict([(str(k), v) for k, v in iteritems(action)])
            for action in data["lr_action"]
        ]
        lr_goto = [
            dict([(str(k), v) for k, v in iteritems(goto)])
            for goto in data["lr_goto"]
        ]
        return LRTable(
            grammar,
            lr_action,
            lr_goto,
            data["default_reductions"],
            data["sr_conflicts"],
            data["rr_conflicts"]
        )

    @classmethod
    def from_grammar(cls, grammar):
        cidhash = IdentityDict()
        goto_cache = {}
        add_count = Counter()
        C = cls.lr0_items(grammar, add_count, cidhash, goto_cache)

        cls.add_lalr_lookaheads(grammar, C, add_count, cidhash, goto_cache)

        lr_action = [None] * len(C)
        lr_goto = [None] * len(C)
        sr_conflicts = []
        rr_conflicts = []
        for st, I in enumerate(C):
            st_action = {}
            st_actionp = {}
            st_goto = {}
            for p in I:
                if p.getlength() == p.lr_index + 1:
                    if p.name == "S'":
                        # Start symbol. Accept!
                        st_action["$end"] = 0
                        st_actionp["$end"] = p
                    else:
                        laheads = p.lookaheads[st]
                        for a in laheads:
                            if a in st_action:
                                r = st_action[a]
                                if r > 0:
                                    sprec, slevel = grammar.productions[st_actionp[a].number].prec
                                    rprec, rlevel = grammar.precedence.get(a, ("right", 0))
                                    if (slevel < rlevel) or (slevel == rlevel and rprec == "left"):
                                        st_action[a] = -p.number
                                        st_actionp[a] = p
                                        if not slevel and not rlevel:
                                            sr_conflicts.append((st, repr(a), "reduce"))
                                        grammar.productions[p.number].reduced += 1
                                    elif not (slevel == rlevel and rprec == "nonassoc"):
                                        if not rlevel:
                                            sr_conflicts.append((st, repr(a), "shift"))
                                elif r < 0:
                                    oldp = grammar.productions[-r]
                                    pp = grammar.productions[p.number]
                                    if oldp.number > pp.number:
                                        st_action[a] = -p.number
                                        st_actionp[a] = p
                                        chosenp, rejectp = pp, oldp
                                        grammar.productions[p.number].reduced += 1
                                        grammar.productions[oldp.number].reduced -= 1
                                    else:
                                        chosenp, rejectp = oldp, pp
                                    rr_conflicts.append((st, repr(chosenp), repr(rejectp)))
                                else:
                                    raise LALRError("Unknown conflict in state %d" % st)
                            else:
                                st_action[a] = -p.number
                                st_actionp[a] = p
                                grammar.productions[p.number].reduced += 1
                else:
                    i = p.lr_index
                    a = p.prod[i + 1]
                    if a in grammar.terminals:
                        g = cls.lr0_goto(I, a, add_count, goto_cache)
                        j = cidhash.get(g, -1)
                        if j >= 0:
                            if a in st_action:
                                r = st_action[a]
                                if r > 0:
                                    if r != j:
                                        raise LALRError("Shift/shift conflict in state %d" % st)
                                elif r < 0:
                                    rprec, rlevel = grammar.productions[st_actionp[a].number].prec
                                    sprec, slevel = grammar.precedence.get(a, ("right", 0))
                                    if (slevel > rlevel) or (slevel == rlevel and rprec == "right"):
                                        grammar.productions[st_actionp[a].number].reduced -= 1
                                        st_action[a] = j
                                        st_actionp[a] = p
                                        if not rlevel:
                                            sr_conflicts.append((st, repr(a), "shift"))
                                    elif not (slevel == rlevel and rprec == "nonassoc"):
                                        if not slevel and not rlevel:
                                            sr_conflicts.append((st, repr(a), "reduce"))
                                else:
                                    raise LALRError("Unknown conflict in state %d" % st)
                            else:
                                st_action[a] = j
                                st_actionp[a] = p
            nkeys = set()
            for ii in I:
                for s in ii.unique_syms:
                    if s in grammar.nonterminals:
                        nkeys.add(s)
            for n in nkeys:
                g = cls.lr0_goto(I, n, add_count, goto_cache)
                j = cidhash.get(g, -1)
                if j >= 0:
                    st_goto[n] = j

            lr_action[st] = st_action
            lr_goto[st] = st_goto

        default_reductions = [0] * len(lr_action)
        for state, actions in enumerate(lr_action):
            actions = set(itervalues(actions))
            if len(actions) == 1 and next(iter(actions)) < 0:
                default_reductions[state] = next(iter(actions))
        return LRTable(
            grammar, lr_action, lr_goto, default_reductions,
            sr_conflicts, rr_conflicts
        )

    @classmethod
    def lr0_items(cls, grammar, add_count, cidhash, goto_cache):
        C = [cls.lr0_closure([grammar.productions[0].lr_next], add_count)]
        for i, I in enumerate(C):
            cidhash[I] = i

        i = 0
        while i < len(C):
            I = C[i]
            i += 1

            asyms = set()
            for ii in I:
                asyms.update(ii.unique_syms)
            for x in asyms:
                g = cls.lr0_goto(I, x, add_count, goto_cache)
                if not g:
                    continue
                if g in cidhash:
                    continue
                cidhash[g] = len(C)
                C.append(g)
        return C

    @classmethod
    def lr0_closure(cls, I, add_count):
        add_count.incr()

        J = I[:]

        added = True
        while added:
            added = False
            for j in J:
                for x in j.lr_after:
                    if x.lr0_added == add_count.value:
                        continue
                    J.append(x.lr_next)
                    x.lr0_added = add_count.value
                    added = True
        return J

    @classmethod
    def lr0_goto(cls, I, x, add_count, goto_cache):
        s = goto_cache.setdefault(x, IdentityDict())

        gs = []
        for p in I:
            n = p.lr_next
            if n and n.lr_before == x:
                s1 = s.get(n)
                if not s1:
                    s1 = {}
                    s[n] = s1
                gs.append(n)
                s = s1
        g = s.get("$end")
        if not g:
            if gs:
                g = cls.lr0_closure(gs, add_count)
                s["$end"] = g
            else:
                s["$end"] = gs
        return g

    @classmethod
    def add_lalr_lookaheads(cls, grammar, C, add_count, cidhash, goto_cache):
        nullable = cls.compute_nullable_nonterminals(grammar)
        trans = cls.find_nonterminal_transitions(grammar, C)
        readsets = cls.compute_read_sets(
            grammar, C, trans, nullable, add_count, cidhash, goto_cache
        )
        lookd, included = cls.compute_lookback_includes(
            grammar, C, trans, nullable, add_count, cidhash, goto_cache
        )
        followsets = cls.compute_follow_sets(trans, readsets, included)
        cls.add_lookaheads(lookd, followsets)

    @classmethod
    def compute_nullable_nonterminals(cls, grammar):
        nullable = set()
        num_nullable = 0
        while True:
            for p in grammar.productions[1:]:
                if p.getlength() == 0:
                    nullable.add(p.name)
                    continue
                for t in p.prod:
                    if t not in nullable:
                        break
                else:
                    nullable.add(p.name)
            if len(nullable) == num_nullable:
                break
            num_nullable = len(nullable)
        return nullable

    @classmethod
    def find_nonterminal_transitions(cls, grammar, C):
        trans = []
        for idx, state in enumerate(C):
            for p in state:
                if p.lr_index < p.getlength() - 1:
                    t = (idx, p.prod[p.lr_index + 1])
                    if t[1] in grammar.nonterminals and t not in trans:
                        trans.append(t)
        return trans

    @classmethod
    def compute_read_sets(cls, grammar, C, ntrans, nullable, add_count,
                          cidhash, goto_cache):
        FP = lambda x: cls.dr_relation(
            grammar, C, x, nullable, add_count, goto_cache
        )
        R = lambda x: cls.reads_relation(
            C, x, nullable, add_count, cidhash, goto_cache
        )
        return digraph(ntrans, R, FP)

    @classmethod
    def compute_follow_sets(cls, ntrans, readsets, includesets):
        FP = lambda x: readsets[x]
        R = lambda x: includesets.get(x, [])
        return digraph(ntrans, R, FP)

    @classmethod
    def dr_relation(cls, grammar, C, trans, nullable, add_count, goto_cache):
        state, N = trans
        terms = []

        g = cls.lr0_goto(C[state], N, add_count, goto_cache)
        for p in g:
            if p.lr_index < p.getlength() - 1:
                a = p.prod[p.lr_index + 1]
                if a in grammar.terminals and a not in terms:
                    terms.append(a)
        if state == 0 and N == grammar.productions[0].prod[0]:
            terms.append("$end")
        return terms

    @classmethod
    def reads_relation(cls, C, trans, empty, add_count, cidhash, goto_cache):
        rel = []
        state, N = trans

        g = cls.lr0_goto(C[state], N, add_count, goto_cache)
        j = cidhash.get(g, -1)
        for p in g:
            if p.lr_index < p.getlength() - 1:
                a = p.prod[p.lr_index + 1]
                if a in empty:
                    rel.append((j, a))
        return rel

    @classmethod
    def compute_lookback_includes(cls, grammar, C, trans, nullable, add_count,
                                  cidhash, goto_cache):
        lookdict = {}
        includedict = {}

        dtrans = dict.fromkeys(trans, 1)

        for state, N in trans:
            lookb = []
            includes = []
            for p in C[state]:
                if p.name != N:
                    continue

                lr_index = p.lr_index
                j = state
                while lr_index < p.getlength() - 1:
                    lr_index += 1
                    t = p.prod[lr_index]

                    if (j, t) in dtrans:
                        li = lr_index + 1
                        while li < p.getlength():
                            if p.prod[li] in grammar.terminals:
                                break
                            if p.prod[li] not in nullable:
                                break
                            li += 1
                        else:
                            includes.append((j, t))

                    g = cls.lr0_goto(C[j], t, add_count, goto_cache)
                    j = cidhash.get(g, -1)

                for r in C[j]:
                    if r.name != p.name:
                        continue
                    if r.getlength() != p.getlength():
                        continue
                    i = 0
                    while i < r.lr_index:
                        if r.prod[i] != p.prod[i + 1]:
                            break
                        i += 1
                    else:
                        lookb.append((j, r))

            for i in includes:
                includedict.setdefault(i, []).append((state, N))
            lookdict[state, N] = lookb
        return lookdict, includedict

    @classmethod
    def add_lookaheads(cls, lookbacks, followset):
        for trans, lb in iteritems(lookbacks):
            for state, p in lb:
                f = followset.get(trans, [])
                laheads = p.lookaheads.setdefault(state, [])
                for a in f:
                    if a not in laheads:
                        laheads.append(a)
rply-0.7.1/rply/token.py

class BaseBox(object):
    _attrs_ = []


class Token(BaseBox):
    def __init__(self, name, value, source_pos=None):
        self.name = name
        self.value = value
        self.source_pos = source_pos

    def __repr__(self):
        return "Token(%r, %r)" % (self.name, self.value)

    def __eq__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return self.name == other.name and self.value == other.value

    def gettokentype(self):
        return self.name

    def getsourcepos(self):
        return self.source_pos

    def getstr(self):
        return self.value


class SourcePosition(object):
    def __init__(self, idx, lineno, colno):
        self.idx = idx
        self.lineno = lineno
        self.colno = colno

rply-0.7.1/rply/utils.py

import sys
from collections import MutableMapping


class IdentityDict(MutableMapping):
    def __init__(self):
        self._contents = {}
        self._keepalive = []

    def __getitem__(self, key):
        return self._contents[id(key)][1]

    def __setitem__(self, key, value):
        idx = len(self._keepalive)
        self._keepalive.append(key)
        self._contents[id(key)] = key, value, idx

    def __delitem__(self, key):
        del self._contents[id(key)]
        for idx, obj in enumerate(self._keepalive):
            if obj is key:
                del self._keepalive[idx]
                break

    def __len__(self):
        return len(self._contents)

    def __iter__(self):
        for key, _, _ in itervalues(self._contents):
            yield key


class Counter(object):
    def __init__(self):
        self.value = 0

    def incr(self):
        self.value += 1


if sys.version_info >= (3,):
    def itervalues(d):
        return d.values()

    def iteritems(d):
        return d.items()
else:
    def itervalues(d):
        return d.itervalues()

    def iteritems(d):
        return d.iteritems()

rply-0.7.1/rply.egg-info/

rply-0.7.1/rply.egg-info/dependency_links.txt

rply-0.7.1/rply.egg-info/PKG-INFO

Metadata-Version: 1.0
Name: rply
Version: 0.7.1
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: UNKNOWN
Description: (same text as README.rst, above)
Platform: UNKNOWN
rply-0.7.1/rply.egg-info/SOURCES.txt

LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
rply/__init__.py
rply/errors.py
rply/grammar.py
rply/lexer.py
rply/lexergenerator.py
rply/parser.py
rply/parsergenerator.py
rply/token.py
rply/utils.py
rply.egg-info/PKG-INFO
rply.egg-info/SOURCES.txt
rply.egg-info/dependency_links.txt
rply.egg-info/top_level.txt

rply-0.7.1/rply.egg-info/top_level.txt

rply

rply-0.7.1/setup.cfg

[wheel]
universal = 1

[egg_info]
tag_build = 
tag_date = 0
tag_svn_revision = 0

rply-0.7.1/setup.py

from setuptools import setup

with open("README.rst") as f:
    readme = f.read()

setup(
    name="rply",
    description="A pure Python Lex/Yacc that works with RPython",
    long_description=readme,
    version="0.7.1",
    author="Alex Gaynor",
    author_email="alex.gaynor@gmail.com",
    packages=["rply"],
)