===== rply-0.7.1/LICENSE =====
Copyright (c) Alex Gaynor and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of rply nor the names of its contributors may be used
to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

===== rply-0.7.1/MANIFEST.in =====
include README.rst
include LICENSE

===== rply-0.7.1/PKG-INFO =====
Metadata-Version: 1.0
Name: rply
Version: 0.7.1
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: UNKNOWN
Platform: UNKNOWN

===== rply-0.7.1/README.rst =====
RPLY
====

.. image:: https://secure.travis-ci.org/alex/rply.png
    :target: http://travis-ci.org/alex/rply

Welcome to RPLY! A pure Python parser generator that also works with RPython.
It is a more-or-less direct port of David Beazley's awesome PLY, with a new
public API and RPython support.

Basic API:

.. code:: python

from rply import ParserGenerator, LexerGenerator
from rply.token import BaseBox
lg = LexerGenerator()
# Add takes a rule name, and a regular expression that defines the rule.
lg.add("PLUS", r"\+")
lg.add("MINUS", r"-")
lg.add("NUMBER", r"\d+")
lg.ignore(r"\s+")
    # The first argument is a list of all the token names. precedence is an
    # optional list of (associativity, token names) tuples used to resolve
    # ambiguity; the associativity must be one of "left", "right", or
    # "nonassoc". cache_id is an optional string which specifies an ID to
    # use for caching. It should *always* be safe to use caching: RPly will
    # automatically detect when your grammar is changed and refresh the
    # cache for you.
pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"],
precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser")
@pg.production("main : expr")
def main(p):
        # p is a list of the pieces matched by the right-hand side of the
        # grammar rule.
return p[0]
@pg.production("expr : expr PLUS expr")
@pg.production("expr : expr MINUS expr")
def expr_op(p):
lhs = p[0].getint()
rhs = p[2].getint()
if p[1].gettokentype() == "PLUS":
return BoxInt(lhs + rhs)
elif p[1].gettokentype() == "MINUS":
return BoxInt(lhs - rhs)
else:
raise AssertionError("This is impossible, abort the time machine!")
@pg.production("expr : NUMBER")
def expr_num(p):
return BoxInt(int(p[0].getstr()))
lexer = lg.build()
parser = pg.build()
class BoxInt(BaseBox):
def __init__(self, value):
self.value = value
def getint(self):
return self.value

Then you can do:

.. code:: python

    parser.parse(lexer.lex("1 + 3 - 2+12-32"))
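
``parse`` returns whatever the top-level ``main`` production returned, so with
the example above (a usage sketch reusing its ``BoxInt``) you can unwrap the
result:

.. code:: python

    result = parser.parse(lexer.lex("1 + 3 - 2+12-32"))
    print(result.getint())  # -18
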
You can also substitute your own lexer. A lexer is an object with a ``next()``
method (spelled ``__next__`` on Python 3) that returns either the next token
in sequence, or ``None`` if the token stream has been exhausted.
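
For example, here is a minimal hand-written lexer (a sketch;
``SimpleLexerStream`` is hypothetical, and the token names must match the ones
the parser was built with):

.. code:: python

    from rply.token import Token

    class SimpleLexerStream(object):
        def __init__(self, tokens):
            self._tokens = iter(tokens)

        def next(self):
            # Returning None signals that the token stream is exhausted.
            return next(self._tokens, None)

        # Python 3 spells the iteration method __next__.
        __next__ = next

    parser.parse(SimpleLexerStream([
        Token("NUMBER", "1"), Token("PLUS", "+"), Token("NUMBER", "2"),
    ]))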

Why do we have the boxes?
-------------------------

In RPython, as in other statically typed languages, a variable must have a
specific type; we take advantage of polymorphism to keep values in a box so
that everything is statically typed. You can write whatever boxes you need for
your project.

If you don't intend to use your parser from RPython and just want a cool pure
Python parser, you can ignore all the box stuff and return whatever you like
from each production method.
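
For example, if the parser only ever runs on plain Python, the productions
from the example above could skip the boxes and pass plain integers around (a
sketch of the non-RPython style):

.. code:: python

    @pg.production("expr : NUMBER")
    def expr_num(p):
        return int(p[0].getstr())

    @pg.production("expr : expr PLUS expr")
    def expr_plus(p):
        # p[0] and p[2] are already plain ints here.
        return p[0] + p[2]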

Error handling
--------------

By default, when a parsing error is encountered, an ``rply.ParsingError`` is
raised. It has a method ``getsourcepos()``, which returns an
``rply.token.SourcePosition`` object.
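
For example (a sketch, reusing the lexer and parser built above):

.. code:: python

    from rply import ParsingError

    try:
        parser.parse(lexer.lex("1 2"))
    except ParsingError as e:
        pos = e.getsourcepos()  # a SourcePosition with idx, lineno, colno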

You may also provide an error handler, which, at the moment, must raise an
exception. It receives the ``Token`` object that the parser errored on.

.. code:: python

    pg = ParserGenerator(...)

    @pg.error
    def error_handler(token):
        raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())

Python compatibility
--------------------

RPly is tested and known to work under Python 2.6, 2.7, 3.1, and 3.2. It is
also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards.

Links
-----

* `Source code and issue tracker `_
* `PyPI releases `_
* `Talk at PyCon US 2013: So you want to write an interpreter? `_

===== rply-0.7.1/rply/__init__.py =====
from rply.errors import ParsingError
from rply.lexergenerator import LexerGenerator
from rply.parsergenerator import ParserGenerator
from rply.token import Token
__all__ = [
"LexerGenerator", "ParserGenerator", "ParsingError", "Token"
]

===== rply-0.7.1/rply/errors.py =====
class ParserGeneratorError(Exception):
pass
class LexingError(Exception):
def __init__(self, message, source_pos):
self.message = message
self.source_pos = source_pos
def getsourcepos(self):
return self.source_pos
class ParsingError(Exception):
def __init__(self, message, source_pos):
self.message = message
self.source_pos = source_pos
def getsourcepos(self):
return self.source_pos
class ParserGeneratorWarning(Warning):
pass

===== rply-0.7.1/rply/grammar.py =====
from rply.errors import ParserGeneratorError
from rply.utils import iteritems
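# If a production has no explicit precedence, it inherits the precedence of
# the rightmost terminal on its right-hand side (see add_production below).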
def rightmost_terminal(symbols, terminals):
for sym in reversed(symbols):
if sym in terminals:
return sym
return None
class Grammar(object):
def __init__(self, terminals):
# A list of all the productions
self.productions = [None]
# A dictionary mapping the names of non-terminals to a list of all
# productions of that nonterminal
self.prod_names = {}
# A dictionary mapping the names of terminals to a list of the rules
# where they are used
self.terminals = dict((t, []) for t in terminals)
self.terminals["error"] = []
# A dictionary mapping names of nonterminals to a list of rule numbers
# where they are used
self.nonterminals = {}
self.first = {}
self.follow = {}
self.precedence = {}
self.start = None
def add_production(self, prod_name, syms, func, precedence):
if prod_name in self.terminals:
raise ParserGeneratorError("Illegal rule name %r" % prod_name)
if precedence is None:
precname = rightmost_terminal(syms, self.terminals)
prod_prec = self.precedence.get(precname, ("right", 0))
else:
try:
prod_prec = self.precedence[precedence]
except KeyError:
raise ParserGeneratorError(
"Precedence %r doesn't exist" % precedence
)
pnumber = len(self.productions)
self.nonterminals.setdefault(prod_name, [])
for t in syms:
if t in self.terminals:
self.terminals[t].append(pnumber)
else:
self.nonterminals.setdefault(t, []).append(pnumber)
p = Production(pnumber, prod_name, syms, prod_prec, func)
self.productions.append(p)
self.prod_names.setdefault(prod_name, []).append(p)
def set_precedence(self, term, assoc, level):
if term in self.precedence:
raise ParserGeneratorError(
"Precedence already specified for %s" % term
)
if assoc not in ["left", "right", "nonassoc"]:
raise ParserGeneratorError(
"Precedence must be one of left, right, nonassoc; not %s" % (
assoc
)
)
self.precedence[term] = (assoc, level)
def set_start(self):
start = self.productions[1].name
self.productions[0] = Production(0, "S'", [start], ("right", 0), None)
self.nonterminals[start].append(0)
self.start = start
def unused_terminals(self):
return [
t
for t, prods in iteritems(self.terminals)
if not prods and t != "error"
]
def unused_productions(self):
return [p for p, prods in iteritems(self.nonterminals) if not prods]
def build_lritems(self):
"""
Walks the list of productions and builds a complete set of the LR
items.
"""
for p in self.productions:
lastlri = p
i = 0
lr_items = []
while True:
if i > p.getlength():
lri = None
else:
try:
before = p.prod[i - 1]
except IndexError:
before = None
try:
after = self.prod_names[p.prod[i]]
except (IndexError, KeyError):
after = []
lri = LRItem(p, i, before, after)
lastlri.lr_next = lri
if lri is None:
break
lr_items.append(lri)
lastlri = lri
i += 1
p.lr_items = lr_items
def _first(self, beta):
result = []
for x in beta:
x_produces_empty = False
for f in self.first[x]:
if f == "":
x_produces_empty = True
else:
if f not in result:
result.append(f)
if not x_produces_empty:
break
else:
result.append("")
return result
def compute_first(self):
for t in self.terminals:
self.first[t] = [t]
self.first["$end"] = ["$end"]
for n in self.nonterminals:
self.first[n] = []
changed = True
while changed:
changed = False
for n in self.nonterminals:
for p in self.prod_names[n]:
for f in self._first(p.prod):
if f not in self.first[n]:
self.first[n].append(f)
changed = True
def compute_follow(self):
for k in self.nonterminals:
self.follow[k] = []
start = self.start
self.follow[start] = ["$end"]
added = True
while added:
added = False
for p in self.productions[1:]:
for i, B in enumerate(p.prod):
if B in self.nonterminals:
fst = self._first(p.prod[i + 1:])
has_empty = False
for f in fst:
if f != "" and f not in self.follow[B]:
self.follow[B].append(f)
added = True
if f == "":
has_empty = True
if has_empty or i == (len(p.prod) - 1):
for f in self.follow[p.name]:
if f not in self.follow[B]:
self.follow[B].append(f)
added = True
class Production(object):
def __init__(self, num, name, prod, precedence, func):
self.name = name
self.prod = prod
self.number = num
self.func = func
self.prec = precedence
self.unique_syms = []
for s in self.prod:
if s not in self.unique_syms:
self.unique_syms.append(s)
self.lr_items = []
self.lr_next = None
self.lr0_added = 0
self.reduced = 0
def __repr__(self):
return "Production(%s -> %s)" % (self.name, " ".join(self.prod))
def getlength(self):
return len(self.prod)
class LRItem(object):
def __init__(self, p, n, before, after):
self.name = p.name
self.prod = p.prod[:]
self.prod.insert(n, ".")
self.number = p.number
self.lr_index = n
self.lookaheads = {}
self.unique_syms = p.unique_syms
self.lr_before = before
self.lr_after = after
def __repr__(self):
return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod))
def getlength(self):
return len(self.prod)

===== rply-0.7.1/rply/lexer.py =====
from rply.errors import LexingError
from rply.token import SourcePosition, Token
class Lexer(object):
def __init__(self, rules, ignore_rules):
self.rules = rules
self.ignore_rules = ignore_rules
def lex(self, s):
return LexerStream(self, s)
class LexerStream(object):
def __init__(self, lexer, s):
self.lexer = lexer
self.s = s
self.idx = 0
self._lineno = 1
def __iter__(self):
return self
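    # Advance the stream past `match`; returns the 1-based column number at
    # which the match started.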
def _update_pos(self, match):
self.idx = match.end
self._lineno += self.s.count("\n", match.start, match.end)
last_nl = self.s.rfind("\n", 0, match.start)
if last_nl < 0:
return match.start + 1
else:
return match.start - last_nl
def next(self):
if self.idx >= len(self.s):
raise StopIteration
for rule in self.lexer.ignore_rules:
match = rule.matches(self.s, self.idx)
if match:
self._update_pos(match)
return self.next()
for rule in self.lexer.rules:
match = rule.matches(self.s, self.idx)
if match:
colno = self._update_pos(match)
source_pos = SourcePosition(match.start, self._lineno, colno)
token = Token(
rule.name, self.s[match.start:match.end], source_pos
)
return token
else:
raise LexingError(None, SourcePosition(self.idx, -1, -1))
def __next__(self):
return self.next()

===== rply-0.7.1/rply/lexergenerator.py =====
import re
try:
import rpython
from rpython.annotator import model
from rpython.annotator.bookkeeper import getbookkeeper
from rpython.rlib.objectmodel import instantiate, hlinvoke
from rpython.rlib.rsre import rsre_core
from rpython.rlib.rsre.rpy import get_code
from rpython.rtyper.annlowlevel import llstr, hlstr
from rpython.rtyper.extregistry import ExtRegistryEntry
from rpython.rtyper.lltypesystem import lltype
from rpython.rtyper.lltypesystem.rlist import FixedSizeListRepr
from rpython.rtyper.lltypesystem.rstr import STR, string_repr
from rpython.rtyper.rmodel import Repr
from rpython.tool.pairtype import pairtype
except ImportError:
rpython = None
from rply.lexer import Lexer
class Rule(object):
def __init__(self, name, pattern):
self.name = name
self.re = re.compile(pattern)
def _freeze_(self):
return True
def matches(self, s, pos):
m = self.re.match(s, pos)
return Match(*m.span(0)) if m is not None else None
class Match(object):
_attrs_ = ["start", "end"]
def __init__(self, start, end):
self.start = start
self.end = end
class LexerGenerator(object):
def __init__(self):
self.rules = []
self.ignore_rules = []
def add(self, name, pattern):
self.rules.append(Rule(name, pattern))
def ignore(self, pattern):
self.ignore_rules.append(Rule("", pattern))
def build(self):
return Lexer(self.rules, self.ignore_rules)
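# The block below teaches RPython's annotator and rtyper how to handle Rule
# objects, so that lexers built with this module remain valid RPython; it is
# skipped entirely when the rpython toolchain is not importable.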
if rpython:
class RuleEntry(ExtRegistryEntry):
_type_ = Rule
def compute_annotation(self, *args):
return SomeRule()
class SomeRule(model.SomeObject):
def rtyper_makekey(self):
return (type(self),)
def rtyper_makerepr(self, rtyper):
return RuleRepr(rtyper)
def method_matches(self, s_s, s_pos):
assert model.SomeString().contains(s_s)
assert model.SomeInteger(nonneg=True).contains(s_pos)
bk = getbookkeeper()
init_pbc = bk.immutablevalue(Match.__init__)
bk.emulate_pbc_call((self, "match_init"), init_pbc, [
model.SomeInstance(bk.getuniqueclassdef(Match)),
model.SomeInteger(nonneg=True),
model.SomeInteger(nonneg=True)
])
init_pbc = bk.immutablevalue(rsre_core.StrMatchContext.__init__)
bk.emulate_pbc_call((self, "str_match_context_init"), init_pbc, [
model.SomeInstance(bk.getuniqueclassdef(rsre_core.StrMatchContext)),
bk.newlist(model.SomeInteger(nonneg=True)),
model.SomeString(),
model.SomeInteger(nonneg=True),
model.SomeInteger(nonneg=True),
model.SomeInteger(nonneg=True),
])
match_context_pbc = bk.immutablevalue(rsre_core.match_context)
bk.emulate_pbc_call((self, "match_context"), match_context_pbc, [
model.SomeInstance(bk.getuniqueclassdef(rsre_core.StrMatchContext)),
])
return model.SomeInstance(getbookkeeper().getuniqueclassdef(Match), can_be_None=True)
def getattr(self, s_attr):
if s_attr.is_constant() and s_attr.const == "name":
return model.SomeString()
return super(SomeRule, self).getattr(s_attr)
class __extend__(pairtype(SomeRule, SomeRule)):
def union(self):
return SomeRule()
class RuleRepr(Repr):
def __init__(self, rtyper):
super(RuleRepr, self).__init__()
self.ll_rule_cache = {}
self.match_init_repr = rtyper.getrepr(
rtyper.annotator.bookkeeper.immutablevalue(Match.__init__)
)
self.match_context_init_repr = rtyper.getrepr(
rtyper.annotator.bookkeeper.immutablevalue(rsre_core.StrMatchContext.__init__)
)
self.match_context_repr = rtyper.getrepr(
rtyper.annotator.bookkeeper.immutablevalue(rsre_core.match_context)
)
list_repr = FixedSizeListRepr(rtyper, rtyper.getrepr(model.SomeInteger(nonneg=True)))
list_repr._setup_repr()
self.lowleveltype = lltype.Ptr(lltype.GcStruct(
"RULE",
("name", lltype.Ptr(STR)),
("code", list_repr.lowleveltype),
))
def convert_const(self, rule):
if rule not in self.ll_rule_cache:
ll_rule = lltype.malloc(self.lowleveltype.TO)
ll_rule.name = llstr(rule.name)
code = get_code(rule.re.pattern)
ll_rule.code = lltype.malloc(self.lowleveltype.TO.code.TO, len(code))
for i, c in enumerate(code):
ll_rule.code[i] = c
self.ll_rule_cache[rule] = ll_rule
return self.ll_rule_cache[rule]
def rtype_getattr(self, hop):
s_attr = hop.args_s[1]
if s_attr.is_constant() and s_attr.const == "name":
v_rule = hop.inputarg(self, arg=0)
return hop.gendirectcall(LLRule.ll_get_name, v_rule)
return super(RuleRepr, self).rtype_getattr(hop)
def rtype_method_matches(self, hop):
[v_rule, v_s, v_pos] = hop.inputargs(self, string_repr, lltype.Signed)
c_MATCHTYPE = hop.inputconst(lltype.Void, Match)
c_MATCH_INIT = hop.inputconst(lltype.Void, self.match_init_repr)
c_MATCH_CONTEXTTYPE = hop.inputconst(lltype.Void, rsre_core.StrMatchContext)
c_MATCH_CONTEXT_INIT = hop.inputconst(lltype.Void, self.match_context_init_repr)
c_MATCH_CONTEXT = hop.inputconst(lltype.Void, self.match_context_repr)
return hop.gendirectcall(
LLRule.ll_matches,
c_MATCHTYPE, c_MATCH_INIT, c_MATCH_CONTEXTTYPE,
c_MATCH_CONTEXT_INIT, c_MATCH_CONTEXT, v_rule, v_s, v_pos
)
class LLRule(object):
@staticmethod
def ll_get_name(ll_rule):
return ll_rule.name
@staticmethod
def ll_matches(MATCHTYPE, MATCH_INIT, MATCH_CONTEXTTYPE,
MATCH_CONTEXT_INIT, MATCH_CONTEXT, ll_rule, s, pos):
s = hlstr(s)
assert pos >= 0
ctx = instantiate(MATCH_CONTEXTTYPE)
hlinvoke(
MATCH_CONTEXT_INIT, rsre_core.StrMatchContext.__init__,
ctx, ll_rule.code, hlstr(s), pos, len(s), 0
)
matched = hlinvoke(MATCH_CONTEXT, rsre_core.match_context, ctx)
if matched:
match = instantiate(MATCHTYPE)
hlinvoke(
MATCH_INIT, Match.__init__,
match, ctx.match_start, ctx.match_end
)
return match
else:
return None

===== rply-0.7.1/rply/parser.py =====
from rply.errors import ParsingError
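# A table-driven shift/reduce parser. parse() walks the action and goto
# tables produced by ParserGenerator.build(), applying default reductions
# for states whose only possible action is a single reduce.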
class LRParser(object):
def __init__(self, lr_table, error_handler):
self.lr_table = lr_table
self.error_handler = error_handler
def parse(self, tokenizer, state=None):
from rply.token import Token
lookahead = None
lookaheadstack = []
statestack = [0]
symstack = [Token("$end", "$end")]
current_state = 0
while True:
if self.lr_table.default_reductions[current_state]:
t = self.lr_table.default_reductions[current_state]
current_state = self._reduce_production(t, symstack, statestack, state)
continue
if lookahead is None:
if lookaheadstack:
lookahead = lookaheadstack.pop()
else:
try:
lookahead = next(tokenizer)
except StopIteration:
lookahead = None
if lookahead is None:
lookahead = Token("$end", "$end")
ltype = lookahead.gettokentype()
if ltype in self.lr_table.lr_action[current_state]:
t = self.lr_table.lr_action[current_state][ltype]
if t > 0:
statestack.append(t)
current_state = t
symstack.append(lookahead)
lookahead = None
continue
elif t < 0:
current_state = self._reduce_production(t, symstack, statestack, state)
continue
else:
n = symstack[-1]
return n
else:
# TODO: actual error handling here
if self.error_handler is not None:
if state is None:
self.error_handler(lookahead)
else:
self.error_handler(state, lookahead)
raise AssertionError("For now, error_handler must raise.")
else:
raise ParsingError(None, lookahead.getsourcepos())
def _reduce_production(self, t, symstack, statestack, state):
# reduce a symbol on the stack and emit a production
p = self.lr_table.grammar.productions[-t]
pname = p.name
plen = p.getlength()
start = len(symstack) + (-plen - 1)
assert start >= 0
targ = symstack[start + 1:]
start = len(symstack) + (-plen)
assert start >= 0
del symstack[start:]
del statestack[start:]
if state is None:
value = p.func(targ)
else:
value = p.func(state, targ)
symstack.append(value)
current_state = self.lr_table.lr_goto[statestack[-1]][pname]
statestack.append(current_state)
return current_state

===== rply-0.7.1/rply/parsergenerator.py =====
import os
import hashlib
import json
import random
import stat
import string
import sys
import tempfile
import warnings
from rply.errors import ParserGeneratorError, ParserGeneratorWarning
from rply.grammar import Grammar
from rply.parser import LRParser
from rply.utils import IdentityDict, Counter, iteritems, itervalues
LARGE_VALUE = sys.maxsize
# Raised by LRTable.from_grammar below for conflicts that table construction
# cannot resolve (e.g. a shift/shift conflict).
class LALRError(ParserGeneratorError):
    pass
class ParserGenerator(object):
VERSION = 1
def __init__(self, tokens, precedence=[], cache_id=None):
self.tokens = tokens
self.productions = []
self.precedence = precedence
if cache_id is None:
# This ensures that we always go through the caching code.
cache_id = "".join(random.choice(string.ascii_letters) for _ in range(6))
self.cache_id = cache_id
self.error_handler = None
def production(self, rule, precedence=None):
parts = rule.split()
production_name = parts[0]
if parts[1] != ":":
raise ParserGeneratorError("Expecting :")
syms = parts[2:]
def inner(func):
self.productions.append((production_name, syms, func, precedence))
return func
return inner
def error(self, func):
self.error_handler = func
return func
def compute_grammar_hash(self, g):
hasher = hashlib.sha1()
hasher.update(g.start.encode())
hasher.update(json.dumps(sorted(g.terminals)).encode())
for term, (assoc, level) in sorted(iteritems(g.precedence)):
hasher.update(term.encode())
hasher.update(assoc.encode())
hasher.update(bytes(level))
for p in g.productions:
hasher.update(p.name.encode())
hasher.update(json.dumps(p.prec).encode())
hasher.update(json.dumps(p.prod).encode())
return hasher.hexdigest()
def serialize_table(self, table):
return {
"lr_action": table.lr_action,
"lr_goto": table.lr_goto,
"sr_conflicts": table.sr_conflicts,
"rr_conflicts": table.rr_conflicts,
"default_reductions": table.default_reductions,
"start": table.grammar.start,
"terminals": sorted(table.grammar.terminals),
"precedence": table.grammar.precedence,
"productions": [(p.name, p.prod, p.prec) for p in table.grammar.productions],
}
def data_is_valid(self, g, data):
if g.start != data["start"]:
return False
if sorted(g.terminals) != data["terminals"]:
return False
if sorted(g.precedence) != sorted(data["precedence"]):
return False
for key, (assoc, level) in iteritems(g.precedence):
if data["precedence"][key] != [assoc, level]:
return False
if len(g.productions) != len(data["productions"]):
return False
for p, (name, prod, (assoc, level)) in zip(g.productions, data["productions"]):
if p.name != name:
return False
if p.prod != prod:
return False
if p.prec != (assoc, level):
return False
return True
def build(self):
g = Grammar(self.tokens)
for level, (assoc, terms) in enumerate(self.precedence, 1):
for term in terms:
g.set_precedence(term, assoc, level)
for prod_name, syms, func, precedence in self.productions:
g.add_production(prod_name, syms, func, precedence)
g.set_start()
for unused_term in g.unused_terminals():
warnings.warn(
"Token %r is unused" % unused_term,
ParserGeneratorWarning,
stacklevel=2
)
for unused_prod in g.unused_productions():
warnings.warn(
"Production %r is not reachable" % unused_prod,
ParserGeneratorWarning,
stacklevel=2
)
g.build_lritems()
g.compute_first()
g.compute_follow()
cache_file = os.path.join(
tempfile.gettempdir(),
"rply-%s-%s-%s-%s.json" % (self.VERSION, os.getuid(), self.cache_id, self.compute_grammar_hash(g))
)
table = None
if os.path.exists(cache_file):
with open(cache_file) as f:
data = json.load(f)
stat_result = os.fstat(f.fileno())
if (
stat_result.st_uid == os.getuid() and
stat.S_IMODE(stat_result.st_mode) == 0o0600
):
if self.data_is_valid(g, data):
table = LRTable.from_cache(g, data)
if table is None:
table = LRTable.from_grammar(g)
fd = os.open(cache_file, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o0600)
with os.fdopen(fd, "w") as f:
json.dump(self.serialize_table(table), f)
if table.sr_conflicts:
warnings.warn(
"%d shift/reduce conflict%s" % (len(table.sr_conflicts), "s" if len(table.sr_conflicts) > 1 else ""),
ParserGeneratorWarning,
stacklevel=2,
)
if table.rr_conflicts:
warnings.warn(
"%d reduce/reduce conflict%s" % (len(table.rr_conflicts), "s" if len(table.rr_conflicts) > 1 else ""),
ParserGeneratorWarning,
stacklevel=2,
)
return LRParser(table, self.error_handler)
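# digraph/traverse compute, for each element x of X, the union of FP(y) over
# every y reachable from x via the relation R (the DeRemer-Pennello style
# closure used for LALR(1) lookahead computation, as in PLY).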
def digraph(X, R, FP):
N = dict.fromkeys(X, 0)
stack = []
F = {}
for x in X:
if N[x] == 0:
traverse(x, N, stack, F, X, R, FP)
return F
def traverse(x, N, stack, F, X, R, FP):
stack.append(x)
d = len(stack)
N[x] = d
F[x] = FP(x)
rel = R(x)
for y in rel:
if N[y] == 0:
traverse(y, N, stack, F, X, R, FP)
N[x] = min(N[x], N[y])
for a in F.get(y, []):
if a not in F[x]:
F[x].append(a)
if N[x] == d:
N[stack[-1]] = LARGE_VALUE
F[stack[-1]] = F[x]
element = stack.pop()
while element != x:
N[stack[-1]] = LARGE_VALUE
F[stack[-1]] = F[x]
element = stack.pop()
class LRTable(object):
def __init__(self, grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts):
self.grammar = grammar
self.lr_action = lr_action
self.lr_goto = lr_goto
self.default_reductions = default_reductions
self.sr_conflicts = sr_conflicts
self.rr_conflicts = rr_conflicts
@classmethod
def from_cache(cls, grammar, data):
lr_action = [
dict([(str(k), v) for k, v in iteritems(action)])
for action in data["lr_action"]
]
lr_goto = [
dict([(str(k), v) for k, v in iteritems(goto)])
for goto in data["lr_goto"]
]
return LRTable(
grammar,
lr_action,
lr_goto,
data["default_reductions"],
data["sr_conflicts"],
data["rr_conflicts"]
)
@classmethod
def from_grammar(cls, grammar):
cidhash = IdentityDict()
goto_cache = {}
add_count = Counter()
C = cls.lr0_items(grammar, add_count, cidhash, goto_cache)
cls.add_lalr_lookaheads(grammar, C, add_count, cidhash, goto_cache)
lr_action = [None] * len(C)
lr_goto = [None] * len(C)
sr_conflicts = []
rr_conflicts = []
for st, I in enumerate(C):
st_action = {}
st_actionp = {}
st_goto = {}
for p in I:
if p.getlength() == p.lr_index + 1:
if p.name == "S'":
# Start symbol. Accept!
st_action["$end"] = 0
st_actionp["$end"] = p
else:
laheads = p.lookaheads[st]
for a in laheads:
if a in st_action:
r = st_action[a]
if r > 0:
sprec, slevel = grammar.productions[st_actionp[a].number].prec
rprec, rlevel = grammar.precedence.get(a, ("right", 0))
if (slevel < rlevel) or (slevel == rlevel and rprec == "left"):
st_action[a] = -p.number
st_actionp[a] = p
if not slevel and not rlevel:
sr_conflicts.append((st, repr(a), "reduce"))
grammar.productions[p.number].reduced += 1
elif not (slevel == rlevel and rprec == "nonassoc"):
if not rlevel:
sr_conflicts.append((st, repr(a), "shift"))
elif r < 0:
oldp = grammar.productions[-r]
pp = grammar.productions[p.number]
if oldp.number > pp.number:
st_action[a] = -p.number
st_actionp[a] = p
chosenp, rejectp = pp, oldp
grammar.productions[p.number].reduced += 1
grammar.productions[oldp.number].reduced -= 1
else:
chosenp, rejectp = oldp, pp
rr_conflicts.append((st, repr(chosenp), repr(rejectp)))
else:
raise LALRError("Unknown conflict in state %d" % st)
else:
st_action[a] = -p.number
st_actionp[a] = p
grammar.productions[p.number].reduced += 1
else:
i = p.lr_index
a = p.prod[i + 1]
if a in grammar.terminals:
g = cls.lr0_goto(I, a, add_count, goto_cache)
j = cidhash.get(g, -1)
if j >= 0:
if a in st_action:
r = st_action[a]
if r > 0:
if r != j:
raise LALRError("Shift/shift conflict in state %d" % st)
elif r < 0:
rprec, rlevel = grammar.productions[st_actionp[a].number].prec
sprec, slevel = grammar.precedence.get(a, ("right", 0))
if (slevel > rlevel) or (slevel == rlevel and rprec == "right"):
grammar.productions[st_actionp[a].number].reduced -= 1
st_action[a] = j
st_actionp[a] = p
if not rlevel:
sr_conflicts.append((st, repr(a), "shift"))
elif not (slevel == rlevel and rprec == "nonassoc"):
if not slevel and not rlevel:
sr_conflicts.append((st, repr(a), "reduce"))
else:
raise LALRError("Unknown conflict in state %d" % st)
else:
st_action[a] = j
st_actionp[a] = p
nkeys = set()
for ii in I:
for s in ii.unique_syms:
if s in grammar.nonterminals:
nkeys.add(s)
for n in nkeys:
g = cls.lr0_goto(I, n, add_count, goto_cache)
j = cidhash.get(g, -1)
if j >= 0:
st_goto[n] = j
lr_action[st] = st_action
lr_goto[st] = st_goto
default_reductions = [0] * len(lr_action)
for state, actions in enumerate(lr_action):
actions = set(itervalues(actions))
if len(actions) == 1 and next(iter(actions)) < 0:
default_reductions[state] = next(iter(actions))
return LRTable(grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts)
@classmethod
def lr0_items(cls, grammar, add_count, cidhash, goto_cache):
C = [cls.lr0_closure([grammar.productions[0].lr_next], add_count)]
for i, I in enumerate(C):
cidhash[I] = i
i = 0
while i < len(C):
I = C[i]
i += 1
asyms = set()
for ii in I:
asyms.update(ii.unique_syms)
for x in asyms:
g = cls.lr0_goto(I, x, add_count, goto_cache)
if not g:
continue
if g in cidhash:
continue
cidhash[g] = len(C)
C.append(g)
return C
@classmethod
def lr0_closure(cls, I, add_count):
add_count.incr()
J = I[:]
added = True
while added:
added = False
for j in J:
for x in j.lr_after:
if x.lr0_added == add_count.value:
continue
J.append(x.lr_next)
x.lr0_added = add_count.value
added = True
return J
@classmethod
def lr0_goto(cls, I, x, add_count, goto_cache):
s = goto_cache.setdefault(x, IdentityDict())
gs = []
for p in I:
n = p.lr_next
if n and n.lr_before == x:
s1 = s.get(n)
if not s1:
s1 = {}
s[n] = s1
gs.append(n)
s = s1
g = s.get("$end")
if not g:
if gs:
g = cls.lr0_closure(gs, add_count)
s["$end"] = g
else:
s["$end"] = gs
return g
@classmethod
def add_lalr_lookaheads(cls, grammar, C, add_count, cidhash, goto_cache):
nullable = cls.compute_nullable_nonterminals(grammar)
trans = cls.find_nonterminal_transitions(grammar, C)
readsets = cls.compute_read_sets(grammar, C, trans, nullable, add_count, cidhash, goto_cache)
lookd, included = cls.compute_lookback_includes(grammar, C, trans, nullable, add_count, cidhash, goto_cache)
followsets = cls.compute_follow_sets(trans, readsets, included)
cls.add_lookaheads(lookd, followsets)
@classmethod
def compute_nullable_nonterminals(cls, grammar):
nullable = set()
num_nullable = 0
while True:
for p in grammar.productions[1:]:
if p.getlength() == 0:
nullable.add(p.name)
continue
for t in p.prod:
if t not in nullable:
break
else:
nullable.add(p.name)
if len(nullable) == num_nullable:
break
num_nullable = len(nullable)
return nullable
@classmethod
def find_nonterminal_transitions(cls, grammar, C):
trans = []
for idx, state in enumerate(C):
for p in state:
if p.lr_index < p.getlength() - 1:
t = (idx, p.prod[p.lr_index + 1])
if t[1] in grammar.nonterminals and t not in trans:
trans.append(t)
return trans
@classmethod
def compute_read_sets(cls, grammar, C, ntrans, nullable, add_count, cidhash, goto_cache):
FP = lambda x: cls.dr_relation(grammar, C, x, nullable, add_count, goto_cache)
R = lambda x: cls.reads_relation(C, x, nullable, add_count, cidhash, goto_cache)
return digraph(ntrans, R, FP)
@classmethod
def compute_follow_sets(cls, ntrans, readsets, includesets):
FP = lambda x: readsets[x]
R = lambda x: includesets.get(x, [])
return digraph(ntrans, R, FP)
@classmethod
def dr_relation(cls, grammar, C, trans, nullable, add_count, goto_cache):
state, N = trans
terms = []
g = cls.lr0_goto(C[state], N, add_count, goto_cache)
for p in g:
if p.lr_index < p.getlength() - 1:
a = p.prod[p.lr_index + 1]
if a in grammar.terminals and a not in terms:
terms.append(a)
if state == 0 and N == grammar.productions[0].prod[0]:
terms.append("$end")
return terms
@classmethod
def reads_relation(cls, C, trans, empty, add_count, cidhash, goto_cache):
rel = []
state, N = trans
g = cls.lr0_goto(C[state], N, add_count, goto_cache)
j = cidhash.get(g, -1)
for p in g:
if p.lr_index < p.getlength() - 1:
a = p.prod[p.lr_index + 1]
if a in empty:
rel.append((j, a))
return rel
@classmethod
def compute_lookback_includes(cls, grammar, C, trans, nullable, add_count, cidhash, goto_cache):
lookdict = {}
includedict = {}
dtrans = dict.fromkeys(trans, 1)
for state, N in trans:
lookb = []
includes = []
for p in C[state]:
if p.name != N:
continue
lr_index = p.lr_index
j = state
while lr_index < p.getlength() - 1:
lr_index += 1
t = p.prod[lr_index]
if (j, t) in dtrans:
li = lr_index + 1
while li < p.getlength():
if p.prod[li] in grammar.terminals:
break
if p.prod[li] not in nullable:
break
li += 1
else:
includes.append((j, t))
g = cls.lr0_goto(C[j], t, add_count, goto_cache)
j = cidhash.get(g, -1)
for r in C[j]:
if r.name != p.name:
continue
if r.getlength() != p.getlength():
continue
i = 0
while i < r.lr_index:
if r.prod[i] != p.prod[i + 1]:
break
i += 1
else:
lookb.append((j, r))
for i in includes:
includedict.setdefault(i, []).append((state, N))
lookdict[state, N] = lookb
return lookdict, includedict
@classmethod
def add_lookaheads(cls, lookbacks, followset):
for trans, lb in iteritems(lookbacks):
for state, p in lb:
f = followset.get(trans, [])
laheads = p.lookaheads.setdefault(state, [])
for a in f:
if a not in laheads:
laheads.append(a)

===== rply-0.7.1/rply/token.py =====
class BaseBox(object):
_attrs_ = []
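# Token is the default box type: it carries the token name, the matched
# string, and an optional SourcePosition.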
class Token(BaseBox):
def __init__(self, name, value, source_pos=None):
self.name = name
self.value = value
self.source_pos = source_pos
def __repr__(self):
return "Token(%r, %r)" % (self.name, self.value)
def __eq__(self, other):
if not isinstance(other, Token):
return NotImplemented
return self.name == other.name and self.value == other.value
def gettokentype(self):
return self.name
def getsourcepos(self):
return self.source_pos
def getstr(self):
return self.value
class SourcePosition(object):
def __init__(self, idx, lineno, colno):
self.idx = idx
self.lineno = lineno
self.colno = colno

===== rply-0.7.1/rply/utils.py =====
import sys
from collections import MutableMapping
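# A MutableMapping keyed on object identity (id()) rather than equality, with
# strong references kept alive so ids stay stable; the parser generator uses
# it to key otherwise-unhashable LR item sets.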
class IdentityDict(MutableMapping):
def __init__(self):
self._contents = {}
self._keepalive = []
def __getitem__(self, key):
return self._contents[id(key)][1]
def __setitem__(self, key, value):
idx = len(self._keepalive)
self._keepalive.append(key)
self._contents[id(key)] = key, value, idx
def __delitem__(self, key):
del self._contents[id(key)]
for idx, obj in enumerate(self._keepalive):
if obj is key:
del self._keepalive[idx]
break
def __len__(self):
return len(self._contents)
def __iter__(self):
for key, _, _ in itervalues(self._contents):
yield key
class Counter(object):
def __init__(self):
self.value = 0
def incr(self):
self.value += 1
if sys.version_info >= (3,):
def itervalues(d):
return d.values()
def iteritems(d):
return d.items()
else:
def itervalues(d):
return d.itervalues()
def iteritems(d):
return d.iteritems()

===== rply-0.7.1/rply.egg-info/dependency_links.txt =====

===== rply-0.7.1/rply.egg-info/PKG-INFO =====
Metadata-Version: 1.0
Name: rply
Version: 0.7.1
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: UNKNOWN
Platform: UNKNOWN

===== rply-0.7.1/rply.egg-info/SOURCES.txt =====
LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
rply/__init__.py
rply/errors.py
rply/grammar.py
rply/lexer.py
rply/lexergenerator.py
rply/parser.py
rply/parsergenerator.py
rply/token.py
rply/utils.py
rply.egg-info/PKG-INFO
rply.egg-info/SOURCES.txt
rply.egg-info/dependency_links.txt
rply.egg-info/top_level.txt

===== rply-0.7.1/rply.egg-info/top_level.txt =====
rply

===== rply-0.7.1/setup.cfg =====
[wheel]
universal = 1
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0

===== rply-0.7.1/setup.py =====
from setuptools import setup
with open("README.rst") as f:
readme = f.read()
setup(
name="rply",
description="A pure Python Lex/Yacc that works with RPython",
long_description=readme,
version="0.7.1",
author="Alex Gaynor",
author_email="alex.gaynor@gmail.com",
packages=["rply"],
)