rply-0.7.7/PKG-INFO

Metadata-Version: 1.0
Name: rply
Version: 0.7.7
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: BSD 3-Clause License
Description: RPLY
====
.. image:: https://secure.travis-ci.org/alex/rply.png
    :target: https://travis-ci.org/alex/rply

Welcome to RPLY! A pure Python parser generator that also works with RPython.
It is a more-or-less direct port of David Beazley's awesome PLY, with a new
public API and RPython support.

You can find the documentation `online`_.
Basic API:

.. code:: python

    from rply import ParserGenerator, LexerGenerator
    from rply.token import BaseBox

    lg = LexerGenerator()
    # Add takes a rule name and a regular expression that defines the rule.
    lg.add("PLUS", r"\+")
    lg.add("MINUS", r"-")
    lg.add("NUMBER", r"\d+")

    lg.ignore(r"\s+")

    # The first argument is a list of the token names. precedence is an
    # optional list of tuples which specifies the order of operations for
    # avoiding ambiguity; the associativity in each tuple must be one of
    # "left", "right", or "nonassoc". cache_id is an optional string which
    # specifies an ID to use for caching. It should *always* be safe to use
    # caching: RPly will automatically detect when your grammar is changed
    # and refresh the cache for you.
    pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"],
            precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser")

    @pg.production("main : expr")
    def main(p):
        # p is a list of the pieces matched by the right hand side of the
        # grammar rule
        return p[0]

    @pg.production("expr : expr PLUS expr")
    @pg.production("expr : expr MINUS expr")
    def expr_op(p):
        lhs = p[0].getint()
        rhs = p[2].getint()
        if p[1].gettokentype() == "PLUS":
            return BoxInt(lhs + rhs)
        elif p[1].gettokentype() == "MINUS":
            return BoxInt(lhs - rhs)
        else:
            raise AssertionError("This is impossible, abort the time machine!")

    @pg.production("expr : NUMBER")
    def expr_num(p):
        return BoxInt(int(p[0].getstr()))

    lexer = lg.build()
    parser = pg.build()

    class BoxInt(BaseBox):
        def __init__(self, value):
            self.value = value

        def getint(self):
            return self.value

Then you can do:

.. code:: python

    parser.parse(lexer.lex("1 + 3 - 2+12-32"))

You can also substitute your own lexer. A lexer is an object with a ``next()``
method that returns either the next token in sequence, or ``None`` if the token
stream has been exhausted.
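
For example, a minimal custom lexer that replays a pre-built list of tokens
might look like this (``ListLexer`` is an illustrative sketch, not part of
RPly):

.. code:: python

    from rply.token import Token

    class ListLexer(object):
        def __init__(self, tokens):
            self._tokens = iter(tokens)

        def next(self):
            # Returning None signals that the token stream is exhausted.
            return next(self._tokens, None)

        __next__ = next  # so the parser's use of the next() builtin works on Python 3

    parser.parse(ListLexer([Token("NUMBER", "1"), Token("PLUS", "+"),
                            Token("NUMBER", "2")]))
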
Why do we have the boxes?
-------------------------
In RPython, like other statically typed languages, a variable must have a
specific type; we take advantage of polymorphism to keep values in a box so
that everything is statically typed. You can write whatever boxes you need for
your project.

If you don't intend to use your parser from RPython, and just want a cool pure
Python parser, you can ignore all the box stuff and just return whatever you
like from each production method.
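
A custom box is just a ``BaseBox`` subclass with whatever accessors your
productions need; for example, a hypothetical box for float values:

.. code:: python

    from rply.token import BaseBox

    class BoxFloat(BaseBox):
        def __init__(self, value):
            self.value = value

        def getfloat(self):
            return self.value
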
Error handling
--------------
By default, when a parsing error is encountered, an ``rply.ParsingError`` is
raised; it has a method ``getsourcepos()``, which returns an
``rply.token.SourcePosition`` object.

You may also provide an error handler, which, at the moment, must raise an
exception. It receives the ``Token`` object that the parser errored on.

.. code:: python

    pg = ParserGenerator(...)

    @pg.error
    def error_handler(token):
        raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())
Python compatibility
--------------------
RPly is tested and known to work under Python 2.6, 2.7, 3.4+, and PyPy. It is
also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards.

Links
-----

* `Source code and issue tracker <https://github.com/alex/rply>`_
* `PyPI releases <https://pypi.org/project/rply/>`_
* Talk at PyCon US 2013: "So you want to write an interpreter?"

.. _`online`: https://rply.readthedocs.io/

Platform: UNKNOWN

rply-0.7.7/LICENSE

Copyright (c) Alex Gaynor and individual contributors.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of rply nor the names of its contributors may be used
to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

rply-0.7.7/rply/token.py

class BaseBox(object):
"""
A base class for polymorphic boxes that wrap parser results. Simply use
this as a base class for anything you return in a production function of a
parser. This is necessary because RPython unlike Python expects functions
to always return objects of the same type.
"""
_attrs_ = []
class Token(BaseBox):
"""
Represents a syntactically relevant piece of text.
:param name: A string describing the kind of text represented.
:param value: The actual text represented.
:param source_pos: A :class:`SourcePosition` object representing the
position of the first character in the source from which
this token was generated.
"""
def __init__(self, name, value, source_pos=None):
self.name = name
self.value = value
self.source_pos = source_pos
def __repr__(self):
return "Token(%r, %r)" % (self.name, self.value)
def __eq__(self, other):
if not isinstance(other, Token):
return NotImplemented
return self.name == other.name and self.value == other.value
def gettokentype(self):
"""
Returns the type or name of the token.
"""
return self.name
def getsourcepos(self):
"""
Returns a :class:`SourcePosition` instance, describing the position of
this token's first character in the source.
"""
return self.source_pos
def getstr(self):
"""
Returns the string represented by this token.
"""
return self.value
class SourcePosition(object):
"""
Represents the position of a character in some source string.
:param idx: The index of the character in the source.
:param lineno: The number of the line in which the character occurs.
:param colno: The number of the column in which the character occurs.
The values passed to this object can be retrieved using the identically
named attributes.
"""
def __init__(self, idx, lineno, colno):
self.idx = idx
self.lineno = lineno
self.colno = colno
def __repr__(self):
return "SourcePosition(idx={0}, lineno={1}, colno={2})".format(
self.idx, self.lineno, self.colno
)
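
# Note (illustrative, not part of the module): tokens compare by name and
# value only, so source positions are ignored in equality checks, e.g.
#   Token("NUMBER", "42") == Token("NUMBER", "42", SourcePosition(0, 1, 1))
# evaluates to True.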

rply-0.7.7/rply/__init__.py

from rply.errors import LexingError, ParsingError
from rply.lexergenerator import LexerGenerator
from rply.parsergenerator import ParserGenerator
from rply.token import Token

__version__ = '0.7.7'

__all__ = [
    "LexerGenerator", "LexingError", "ParserGenerator", "ParsingError",
    "Token",
]

rply-0.7.7/rply/grammar.py

from rply.errors import ParserGeneratorError
from rply.utils import iteritems


def rightmost_terminal(symbols, terminals):
    for sym in reversed(symbols):
        if sym in terminals:
            return sym
    return None


class Grammar(object):
    def __init__(self, terminals):
        # A list of all the productions
        self.productions = [None]
        # A dictionary mapping the names of non-terminals to a list of all
        # productions of that nonterminal
        self.prod_names = {}
        # A dictionary mapping the names of terminals to a list of the rules
        # where they are used
        self.terminals = dict((t, []) for t in terminals)
        self.terminals["error"] = []
        # A dictionary mapping names of nonterminals to a list of rule numbers
        # where they are used
        self.nonterminals = {}
        self.first = {}
        self.follow = {}
        self.precedence = {}
        self.start = None

    def add_production(self, prod_name, syms, func, precedence):
        if prod_name in self.terminals:
            raise ParserGeneratorError("Illegal rule name %r" % prod_name)

        if precedence is None:
            precname = rightmost_terminal(syms, self.terminals)
            prod_prec = self.precedence.get(precname, ("right", 0))
        else:
            try:
                prod_prec = self.precedence[precedence]
            except KeyError:
                raise ParserGeneratorError(
                    "Precedence %r doesn't exist" % precedence
                )

        pnumber = len(self.productions)
        self.nonterminals.setdefault(prod_name, [])

        for t in syms:
            if t in self.terminals:
                self.terminals[t].append(pnumber)
            else:
                self.nonterminals.setdefault(t, []).append(pnumber)

        p = Production(pnumber, prod_name, syms, prod_prec, func)
        self.productions.append(p)

        self.prod_names.setdefault(prod_name, []).append(p)

    def set_precedence(self, term, assoc, level):
        if term in self.precedence:
            raise ParserGeneratorError(
                "Precedence already specified for %s" % term
            )
        if assoc not in ["left", "right", "nonassoc"]:
            raise ParserGeneratorError(
                "Precedence must be one of left, right, nonassoc; not %s" % (
                    assoc
                )
            )
        self.precedence[term] = (assoc, level)

    def set_start(self):
        start = self.productions[1].name
        self.productions[0] = Production(0, "S'", [start], ("right", 0), None)
        self.nonterminals[start].append(0)
        self.start = start

    def unused_terminals(self):
        return [
            t
            for t, prods in iteritems(self.terminals)
            if not prods and t != "error"
        ]

    def unused_productions(self):
        return [p for p, prods in iteritems(self.nonterminals) if not prods]

    def build_lritems(self):
        """
        Walks the list of productions and builds a complete set of the LR
        items.
        """
        for p in self.productions:
            lastlri = p
            i = 0
            lr_items = []
            while True:
                if i > p.getlength():
                    lri = None
                else:
                    try:
                        before = p.prod[i - 1]
                    except IndexError:
                        before = None
                    try:
                        after = self.prod_names[p.prod[i]]
                    except (IndexError, KeyError):
                        after = []
                    lri = LRItem(p, i, before, after)
                lastlri.lr_next = lri
                if lri is None:
                    break
                lr_items.append(lri)
                lastlri = lri
                i += 1
            p.lr_items = lr_items
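
    # FIRST(beta) is the set of terminals that can begin a string derived
    # from the symbol sequence beta; "" is included only when every symbol
    # in beta can derive the empty string.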
    def _first(self, beta):
        result = []
        for x in beta:
            x_produces_empty = False
            for f in self.first[x]:
                if f == "":
                    x_produces_empty = True
                else:
                    if f not in result:
                        result.append(f)
            if not x_produces_empty:
                break
        else:
            result.append("")
        return result
    def compute_first(self):
        for t in self.terminals:
            self.first[t] = [t]

        self.first["$end"] = ["$end"]

        for n in self.nonterminals:
            self.first[n] = []

        changed = True
        while changed:
            changed = False
            for n in self.nonterminals:
                for p in self.prod_names[n]:
                    for f in self._first(p.prod):
                        if f not in self.first[n]:
                            self.first[n].append(f)
                            changed = True
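
    # FOLLOW(B) is the set of terminals that can appear immediately after
    # the nonterminal B in some sentential form; the start symbol's FOLLOW
    # set is seeded with "$end".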
    def compute_follow(self):
        for k in self.nonterminals:
            self.follow[k] = []

        start = self.start
        self.follow[start] = ["$end"]

        added = True
        while added:
            added = False
            for p in self.productions[1:]:
                for i, B in enumerate(p.prod):
                    if B in self.nonterminals:
                        fst = self._first(p.prod[i + 1:])
                        has_empty = False
                        for f in fst:
                            if f != "" and f not in self.follow[B]:
                                self.follow[B].append(f)
                                added = True
                            if f == "":
                                has_empty = True
                        if has_empty or i == (len(p.prod) - 1):
                            for f in self.follow[p.name]:
                                if f not in self.follow[B]:
                                    self.follow[B].append(f)
                                    added = True


class Production(object):
    def __init__(self, num, name, prod, precedence, func):
        self.name = name
        self.prod = prod
        self.number = num
        self.func = func
        self.prec = precedence

        self.unique_syms = []
        for s in self.prod:
            if s not in self.unique_syms:
                self.unique_syms.append(s)

        self.lr_items = []
        self.lr_next = None
        self.lr0_added = 0
        self.reduced = 0

    def __repr__(self):
        return "Production(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)


class LRItem(object):
    def __init__(self, p, n, before, after):
        self.name = p.name
        self.prod = p.prod[:]
        self.prod.insert(n, ".")
        self.number = p.number
        self.lr_index = n
        self.lookaheads = {}
        self.unique_syms = p.unique_syms
        self.lr_before = before
        self.lr_after = after

    def __repr__(self):
        return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)

rply-0.7.7/rply/parser.py

from rply.errors import ParsingError


class LRParser(object):
    def __init__(self, lr_table, error_handler):
        self.lr_table = lr_table
        self.error_handler = error_handler

    def parse(self, tokenizer, state=None):
        from rply.token import Token

        lookahead = None
        lookaheadstack = []

        statestack = [0]
        symstack = [Token("$end", "$end")]

        current_state = 0
        while True:
            if self.lr_table.default_reductions[current_state]:
                t = self.lr_table.default_reductions[current_state]
                current_state = self._reduce_production(
                    t, symstack, statestack, state
                )
                continue

            if lookahead is None:
                if lookaheadstack:
                    lookahead = lookaheadstack.pop()
                else:
                    try:
                        lookahead = next(tokenizer)
                    except StopIteration:
                        lookahead = None

                if lookahead is None:
                    lookahead = Token("$end", "$end")

            ltype = lookahead.gettokentype()
            if ltype in self.lr_table.lr_action[current_state]:
                t = self.lr_table.lr_action[current_state][ltype]
                if t > 0:
                    # Positive table entries are shifts: push the token and
                    # the new state.
                    statestack.append(t)
                    current_state = t
                    symstack.append(lookahead)
                    lookahead = None
                    continue
                elif t < 0:
                    # Negative entries are reductions by production number -t.
                    current_state = self._reduce_production(
                        t, symstack, statestack, state
                    )
                    continue
                else:
                    # 0 is the accept action: the result is on top of the
                    # symbol stack.
                    n = symstack[-1]
                    return n
            else:
                # TODO: actual error handling here
                if self.error_handler is not None:
                    if state is None:
                        self.error_handler(lookahead)
                    else:
                        self.error_handler(state, lookahead)
                    raise AssertionError("For now, error_handler must raise.")
                else:
                    raise ParsingError(None, lookahead.getsourcepos())
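
    # A reduction pops the production's right-hand side off both stacks,
    # invokes the production function on those symbols (plus the state, if
    # one was given), pushes the result, and looks up the next state in the
    # goto table.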
    def _reduce_production(self, t, symstack, statestack, state):
        # reduce a symbol on the stack and emit a production
        p = self.lr_table.grammar.productions[-t]
        pname = p.name
        plen = p.getlength()
        start = len(symstack) + (-plen - 1)
        assert start >= 0
        targ = symstack[start + 1:]
        start = len(symstack) + (-plen)
        assert start >= 0
        del symstack[start:]
        del statestack[start:]
        if state is None:
            value = p.func(targ)
        else:
            value = p.func(state, targ)
        symstack.append(value)
        current_state = self.lr_table.lr_goto[statestack[-1]][pname]
        statestack.append(current_state)
        return current_state

rply-0.7.7/rply/utils.py

import sys

if sys.version_info >= (3, 3):
    from collections.abc import MutableMapping
else:
    from collections import MutableMapping
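

# IdentityDict maps keys by object identity (id()) rather than by equality
# and hash, which lets unhashable objects such as lists of LR items serve as
# dictionary keys; the _keepalive list pins each key object so its id()
# cannot be recycled while the entry is alive.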
class IdentityDict(MutableMapping):
    def __init__(self):
        self._contents = {}
        self._keepalive = []

    def __getitem__(self, key):
        return self._contents[id(key)][1]

    def __setitem__(self, key, value):
        idx = len(self._keepalive)
        self._keepalive.append(key)
        self._contents[id(key)] = key, value, idx

    def __delitem__(self, key):
        del self._contents[id(key)]
        for idx, obj in enumerate(self._keepalive):
            if obj is key:
                del self._keepalive[idx]
                break

    def __len__(self):
        return len(self._contents)

    def __iter__(self):
        for key, _, _ in itervalues(self._contents):
            yield key


class Counter(object):
    def __init__(self):
        self.value = 0

    def incr(self):
        self.value += 1


if sys.version_info >= (3,):
    def itervalues(d):
        return d.values()

    def iteritems(d):
        return d.items()
else:
    def itervalues(d):
        return d.itervalues()

    def iteritems(d):
        return d.iteritems()

rply-0.7.7/rply/lexer.py

from rply.errors import LexingError
from rply.token import SourcePosition, Token


class Lexer(object):
    def __init__(self, rules, ignore_rules):
        self.rules = rules
        self.ignore_rules = ignore_rules

    def lex(self, s):
        return LexerStream(self, s)


class LexerStream(object):
    def __init__(self, lexer, s):
        self.lexer = lexer
        self.s = s
        self.idx = 0
        self._lineno = 1

    def __iter__(self):
        return self
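
    # _update_pos advances the stream past a match, updates the running line
    # count, and returns the 1-based column of the match's first character.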
    def _update_pos(self, match):
        self.idx = match.end
        self._lineno += self.s.count("\n", match.start, match.end)
        last_nl = self.s.rfind("\n", 0, match.start)
        if last_nl < 0:
            return match.start + 1
        else:
            return match.start - last_nl
    def next(self):
        # First skip over any text matched by an ignore rule (e.g.
        # whitespace); repeat until no ignore rule matches.
        while True:
            if self.idx >= len(self.s):
                raise StopIteration
            for rule in self.lexer.ignore_rules:
                match = rule.matches(self.s, self.idx)
                if match:
                    self._update_pos(match)
                    break
            else:
                break

        # Then try the token rules in the order they were added; the first
        # match wins.
        for rule in self.lexer.rules:
            match = rule.matches(self.s, self.idx)
            if match:
                lineno = self._lineno
                colno = self._update_pos(match)
                source_pos = SourcePosition(match.start, lineno, colno)
                token = Token(
                    rule.name, self.s[match.start:match.end], source_pos
                )
                return token
        else:
            raise LexingError(None, SourcePosition(self.idx, -1, -1))

    def __next__(self):
        return self.next()

rply-0.7.7/rply/errors.py

class ParserGeneratorError(Exception):
    pass


class LexingError(Exception):
    """
    Raised by a Lexer if no rule matches.
    """
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        """
        Returns the position in the source at which this error occurred.
        """
        return self.source_pos

    def __repr__(self):
        return 'LexingError(%r, %r)' % (self.message, self.source_pos)


class ParsingError(Exception):
    """
    Raised by a Parser if no production rule can be applied.
    """
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        """
        Returns the position in the source at which this error occurred.
        """
        return self.source_pos

    def __repr__(self):
        return 'ParsingError(%r, %r)' % (self.message, self.source_pos)


class ParserGeneratorWarning(Warning):
    pass

rply-0.7.7/rply/parsergenerator.py

import errno
import hashlib
import json
import os
import sys
import tempfile
import warnings
from appdirs import AppDirs
from rply.errors import ParserGeneratorError, ParserGeneratorWarning
from rply.grammar import Grammar
from rply.parser import LRParser
from rply.utils import Counter, IdentityDict, iteritems, itervalues
LARGE_VALUE = sys.maxsize


class ParserGenerator(object):
    """
    A ParserGenerator represents a set of production rules that define a
    sequence of terminals and non-terminals to be replaced with a
    non-terminal, which can be turned into a parser.

    :param tokens: A list of token (terminal) names.
    :param precedence: A list of tuples defining the order of operation for
                       avoiding ambiguity, consisting of a string defining
                       associativity (left, right or nonassoc) and a list of
                       token names with the same associativity and level of
                       precedence.
    :param cache_id: A string specifying an ID for caching.
    """
    VERSION = 1

    def __init__(self, tokens, precedence=[], cache_id=None):
        self.tokens = tokens
        self.productions = []
        self.precedence = precedence
        self.cache_id = cache_id
        self.error_handler = None
    def production(self, rule, precedence=None):
        """
        A decorator that defines a production rule and registers the decorated
        function to be called with the terminals and non-terminals matched by
        that rule.

        A `rule` should consist of a name defining the non-terminal returned
        by the decorated function and a sequence of non-terminals and
        terminals that are supposed to be replaced::

            replacing_non_terminal : ATERMINAL non_terminal

        The name of the non-terminal replacing the sequence is on the left,
        separated from the sequence by a colon. The whitespace around the
        colon is required.

        Knowing this we can define productions::

            pg = ParserGenerator(['NUMBER', 'ADD'])

            @pg.production('number : NUMBER')
            def expr_number(p):
                return BoxInt(int(p[0].getstr()))

            @pg.production('expr : number ADD number')
            def expr_add(p):
                return BoxInt(p[0].getint() + p[2].getint())

        If a state was passed to the parser, the decorated function is
        additionally called with that state as first argument.
        """
        parts = rule.split()
        production_name = parts[0]
        if parts[1] != ":":
            raise ParserGeneratorError("Expecting :")
        syms = parts[2:]

        def inner(func):
            self.productions.append((production_name, syms, func, precedence))
            return func
        return inner
    def error(self, func):
        """
        Sets the error handler that is called with the state (if passed to
        the parser) and the token the parser errored on.

        Currently error handlers must raise an exception. If an error handler
        is not defined, a :exc:`rply.ParsingError` will be raised.
        """
        self.error_handler = func
        return func
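
    # The grammar hash folds the start symbol, terminals, precedence, and
    # every production into a SHA-1 digest; it is embedded in the cache file
    # name, so any change to the grammar invalidates the cached table.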
    def compute_grammar_hash(self, g):
        hasher = hashlib.sha1()
        hasher.update(g.start.encode())
        hasher.update(json.dumps(sorted(g.terminals)).encode())
        for term, (assoc, level) in sorted(iteritems(g.precedence)):
            hasher.update(term.encode())
            hasher.update(assoc.encode())
            hasher.update(bytes(level))
        for p in g.productions:
            hasher.update(p.name.encode())
            hasher.update(json.dumps(p.prec).encode())
            hasher.update(json.dumps(p.prod).encode())
        return hasher.hexdigest()
    def serialize_table(self, table):
        return {
            "lr_action": table.lr_action,
            "lr_goto": table.lr_goto,
            "sr_conflicts": table.sr_conflicts,
            "rr_conflicts": table.rr_conflicts,
            "default_reductions": table.default_reductions,
            "start": table.grammar.start,
            "terminals": sorted(table.grammar.terminals),
            "precedence": table.grammar.precedence,
            "productions": [
                (p.name, p.prod, p.prec) for p in table.grammar.productions
            ],
        }

    def data_is_valid(self, g, data):
        if g.start != data["start"]:
            return False
        if sorted(g.terminals) != data["terminals"]:
            return False
        if sorted(g.precedence) != sorted(data["precedence"]):
            return False
        for key, (assoc, level) in iteritems(g.precedence):
            if data["precedence"][key] != [assoc, level]:
                return False
        if len(g.productions) != len(data["productions"]):
            return False
        for p, (name, prod, (assoc, level)) in zip(g.productions, data["productions"]):
            if p.name != name:
                return False
            if p.prod != prod:
                return False
            if p.prec != (assoc, level):
                return False
        return True
    def build(self):
        g = Grammar(self.tokens)

        for level, (assoc, terms) in enumerate(self.precedence, 1):
            for term in terms:
                g.set_precedence(term, assoc, level)

        for prod_name, syms, func, precedence in self.productions:
            g.add_production(prod_name, syms, func, precedence)

        g.set_start()

        for unused_term in g.unused_terminals():
            warnings.warn(
                "Token %r is unused" % unused_term,
                ParserGeneratorWarning,
                stacklevel=2
            )
        for unused_prod in g.unused_productions():
            warnings.warn(
                "Production %r is not reachable" % unused_prod,
                ParserGeneratorWarning,
                stacklevel=2
            )

        g.build_lritems()
        g.compute_first()
        g.compute_follow()

        table = None
        if self.cache_id is not None:
            cache_dir = AppDirs("rply").user_cache_dir
            cache_file = os.path.join(
                cache_dir,
                "%s-%s-%s.json" % (
                    self.cache_id, self.VERSION, self.compute_grammar_hash(g)
                )
            )

            if os.path.exists(cache_file):
                with open(cache_file) as f:
                    data = json.load(f)
                if self.data_is_valid(g, data):
                    table = LRTable.from_cache(g, data)
        if table is None:
            table = LRTable.from_grammar(g)
            if self.cache_id is not None:
                self._write_cache(cache_dir, cache_file, table)

        if table.sr_conflicts:
            warnings.warn(
                "%d shift/reduce conflict%s" % (
                    len(table.sr_conflicts),
                    "s" if len(table.sr_conflicts) > 1 else ""
                ),
                ParserGeneratorWarning,
                stacklevel=2,
            )
        if table.rr_conflicts:
            warnings.warn(
                "%d reduce/reduce conflict%s" % (
                    len(table.rr_conflicts),
                    "s" if len(table.rr_conflicts) > 1 else ""
                ),
                ParserGeneratorWarning,
                stacklevel=2,
            )
        return LRParser(table, self.error_handler)
    def _write_cache(self, cache_dir, cache_file, table):
        if not os.path.exists(cache_dir):
            try:
                os.makedirs(cache_dir, mode=0o0700)
            except OSError as e:
                if e.errno == errno.EROFS:
                    return
                raise

        with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False, mode="w") as f:
            json.dump(self.serialize_table(table), f)
        os.rename(f.name, cache_file)
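

# digraph/traverse implement the set-valued digraph algorithm from DeRemer
# and Pennello's LALR lookahead computation: F(x) starts as FP(x) and is
# unioned along the relation R using a Tarjan-style depth-first traversal,
# so every member of a strongly connected component ends up with the same
# set.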
def digraph(X, R, FP):
    N = dict.fromkeys(X, 0)
    stack = []
    F = {}
    for x in X:
        if N[x] == 0:
            traverse(x, N, stack, F, X, R, FP)
    return F


def traverse(x, N, stack, F, X, R, FP):
    stack.append(x)
    d = len(stack)
    N[x] = d
    F[x] = FP(x)

    rel = R(x)
    for y in rel:
        if N[y] == 0:
            traverse(y, N, stack, F, X, R, FP)
        N[x] = min(N[x], N[y])
        for a in F.get(y, []):
            if a not in F[x]:
                F[x].append(a)
    if N[x] == d:
        N[stack[-1]] = LARGE_VALUE
        F[stack[-1]] = F[x]
        element = stack.pop()
        while element != x:
            N[stack[-1]] = LARGE_VALUE
            F[stack[-1]] = F[x]
            element = stack.pop()


class LRTable(object):
    def __init__(self, grammar, lr_action, lr_goto, default_reductions,
                 sr_conflicts, rr_conflicts):
        self.grammar = grammar
        self.lr_action = lr_action
        self.lr_goto = lr_goto
        self.default_reductions = default_reductions
        self.sr_conflicts = sr_conflicts
        self.rr_conflicts = rr_conflicts

    @classmethod
    def from_cache(cls, grammar, data):
        lr_action = [
            dict([(str(k), v) for k, v in iteritems(action)])
            for action in data["lr_action"]
        ]
        lr_goto = [
            dict([(str(k), v) for k, v in iteritems(goto)])
            for goto in data["lr_goto"]
        ]
        return LRTable(
            grammar,
            lr_action,
            lr_goto,
            data["default_reductions"],
            data["sr_conflicts"],
            data["rr_conflicts"]
        )
    @classmethod
    def from_grammar(cls, grammar):
        cidhash = IdentityDict()
        goto_cache = {}
        add_count = Counter()
        C = cls.lr0_items(grammar, add_count, cidhash, goto_cache)
        cls.add_lalr_lookaheads(grammar, C, add_count, cidhash, goto_cache)

        lr_action = [None] * len(C)
        lr_goto = [None] * len(C)
        sr_conflicts = []
        rr_conflicts = []
        for st, I in enumerate(C):
            st_action = {}
            st_actionp = {}
            st_goto = {}
            for p in I:
                if p.getlength() == p.lr_index + 1:
                    if p.name == "S'":
                        # Start symbol. Accept!
                        st_action["$end"] = 0
                        st_actionp["$end"] = p
                    else:
                        laheads = p.lookaheads[st]
                        for a in laheads:
                            if a in st_action:
                                r = st_action[a]
                                if r > 0:
                                    sprec, slevel = grammar.productions[st_actionp[a].number].prec
                                    rprec, rlevel = grammar.precedence.get(a, ("right", 0))
                                    if (slevel < rlevel) or (slevel == rlevel and rprec == "left"):
                                        st_action[a] = -p.number
                                        st_actionp[a] = p
                                        if not slevel and not rlevel:
                                            sr_conflicts.append((st, repr(a), "reduce"))
                                        grammar.productions[p.number].reduced += 1
                                    elif not (slevel == rlevel and rprec == "nonassoc"):
                                        if not rlevel:
                                            sr_conflicts.append((st, repr(a), "shift"))
                                elif r < 0:
                                    oldp = grammar.productions[-r]
                                    pp = grammar.productions[p.number]
                                    if oldp.number > pp.number:
                                        st_action[a] = -p.number
                                        st_actionp[a] = p
                                        chosenp, rejectp = pp, oldp
                                        grammar.productions[p.number].reduced += 1
                                        grammar.productions[oldp.number].reduced -= 1
                                    else:
                                        chosenp, rejectp = oldp, pp
                                    rr_conflicts.append((st, repr(chosenp), repr(rejectp)))
                                else:
                                    raise ParserGeneratorError("Unknown conflict in state %d" % st)
                            else:
                                st_action[a] = -p.number
                                st_actionp[a] = p
                                grammar.productions[p.number].reduced += 1
                else:
                    i = p.lr_index
                    a = p.prod[i + 1]
                    if a in grammar.terminals:
                        g = cls.lr0_goto(I, a, add_count, goto_cache)
                        j = cidhash.get(g, -1)
                        if j >= 0:
                            if a in st_action:
                                r = st_action[a]
                                if r > 0:
                                    if r != j:
                                        raise ParserGeneratorError("Shift/shift conflict in state %d" % st)
                                elif r < 0:
                                    rprec, rlevel = grammar.productions[st_actionp[a].number].prec
                                    sprec, slevel = grammar.precedence.get(a, ("right", 0))
                                    if (slevel > rlevel) or (slevel == rlevel and rprec == "right"):
                                        grammar.productions[st_actionp[a].number].reduced -= 1
                                        st_action[a] = j
                                        st_actionp[a] = p
                                        if not rlevel:
                                            sr_conflicts.append((st, repr(a), "shift"))
                                    elif not (slevel == rlevel and rprec == "nonassoc"):
                                        if not slevel and not rlevel:
                                            sr_conflicts.append((st, repr(a), "reduce"))
                                else:
                                    raise ParserGeneratorError("Unknown conflict in state %d" % st)
                            else:
                                st_action[a] = j
                                st_actionp[a] = p

            nkeys = set()
            for ii in I:
                for s in ii.unique_syms:
                    if s in grammar.nonterminals:
                        nkeys.add(s)
            for n in nkeys:
                g = cls.lr0_goto(I, n, add_count, goto_cache)
                j = cidhash.get(g, -1)
                if j >= 0:
                    st_goto[n] = j

            lr_action[st] = st_action
            lr_goto[st] = st_goto

        default_reductions = [0] * len(lr_action)
        for state, actions in enumerate(lr_action):
            actions = set(itervalues(actions))
            if len(actions) == 1 and next(iter(actions)) < 0:
                default_reductions[state] = next(iter(actions))
        return LRTable(grammar, lr_action, lr_goto, default_reductions, sr_conflicts, rr_conflicts)
    @classmethod
    def lr0_items(cls, grammar, add_count, cidhash, goto_cache):
        C = [cls.lr0_closure([grammar.productions[0].lr_next], add_count)]
        for i, I in enumerate(C):
            cidhash[I] = i

        i = 0
        while i < len(C):
            I = C[i]
            i += 1

            asyms = set()
            for ii in I:
                asyms.update(ii.unique_syms)
            for x in asyms:
                g = cls.lr0_goto(I, x, add_count, goto_cache)
                if not g:
                    continue
                if g in cidhash:
                    continue
                cidhash[g] = len(C)
                C.append(g)
        return C
    @classmethod
    def lr0_closure(cls, I, add_count):
        add_count.incr()

        J = I[:]
        added = True
        while added:
            added = False
            for j in J:
                for x in j.lr_after:
                    if x.lr0_added == add_count.value:
                        continue
                    J.append(x.lr_next)
                    x.lr0_added = add_count.value
                    added = True
        return J
    @classmethod
    def lr0_goto(cls, I, x, add_count, goto_cache):
        s = goto_cache.setdefault(x, IdentityDict())

        gs = []
        for p in I:
            n = p.lr_next
            if n and n.lr_before == x:
                s1 = s.get(n)
                if not s1:
                    s1 = {}
                    s[n] = s1
                gs.append(n)
                s = s1
        g = s.get("$end")
        if not g:
            if gs:
                g = cls.lr0_closure(gs, add_count)
                s["$end"] = g
            else:
                s["$end"] = gs
        return g
    @classmethod
    def add_lalr_lookaheads(cls, grammar, C, add_count, cidhash, goto_cache):
        nullable = cls.compute_nullable_nonterminals(grammar)
        trans = cls.find_nonterminal_transitions(grammar, C)
        readsets = cls.compute_read_sets(grammar, C, trans, nullable, add_count, cidhash, goto_cache)
        lookd, included = cls.compute_lookback_includes(grammar, C, trans, nullable, add_count, cidhash, goto_cache)
        followsets = cls.compute_follow_sets(trans, readsets, included)
        cls.add_lookaheads(lookd, followsets)
    @classmethod
    def compute_nullable_nonterminals(cls, grammar):
        nullable = set()
        num_nullable = 0
        while True:
            for p in grammar.productions[1:]:
                if p.getlength() == 0:
                    nullable.add(p.name)
                    continue
                for t in p.prod:
                    if t not in nullable:
                        break
                else:
                    nullable.add(p.name)
            if len(nullable) == num_nullable:
                break
            num_nullable = len(nullable)
        return nullable
    @classmethod
    def find_nonterminal_transitions(cls, grammar, C):
        trans = []
        for idx, state in enumerate(C):
            for p in state:
                if p.lr_index < p.getlength() - 1:
                    t = (idx, p.prod[p.lr_index + 1])
                    if t[1] in grammar.nonterminals and t not in trans:
                        trans.append(t)
        return trans
    @classmethod
    def compute_read_sets(cls, grammar, C, ntrans, nullable, add_count, cidhash, goto_cache):
        return digraph(
            ntrans,
            R=lambda x: cls.reads_relation(C, x, nullable, add_count, cidhash, goto_cache),
            FP=lambda x: cls.dr_relation(grammar, C, x, nullable, add_count, goto_cache)
        )

    @classmethod
    def compute_follow_sets(cls, ntrans, readsets, includesets):
        return digraph(
            ntrans,
            R=lambda x: includesets.get(x, []),
            FP=lambda x: readsets[x],
        )
    @classmethod
    def dr_relation(cls, grammar, C, trans, nullable, add_count, goto_cache):
        state, N = trans
        terms = []

        g = cls.lr0_goto(C[state], N, add_count, goto_cache)
        for p in g:
            if p.lr_index < p.getlength() - 1:
                a = p.prod[p.lr_index + 1]
                if a in grammar.terminals and a not in terms:
                    terms.append(a)
        if state == 0 and N == grammar.productions[0].prod[0]:
            terms.append("$end")
        return terms

    @classmethod
    def reads_relation(cls, C, trans, empty, add_count, cidhash, goto_cache):
        rel = []
        state, N = trans

        g = cls.lr0_goto(C[state], N, add_count, goto_cache)
        j = cidhash.get(g, -1)
        for p in g:
            if p.lr_index < p.getlength() - 1:
                a = p.prod[p.lr_index + 1]
                if a in empty:
                    rel.append((j, a))
        return rel
    @classmethod
    def compute_lookback_includes(cls, grammar, C, trans, nullable, add_count, cidhash, goto_cache):
        lookdict = {}
        includedict = {}

        dtrans = dict.fromkeys(trans, 1)

        for state, N in trans:
            lookb = []
            includes = []
            for p in C[state]:
                if p.name != N:
                    continue

                lr_index = p.lr_index
                j = state
                while lr_index < p.getlength() - 1:
                    lr_index += 1
                    t = p.prod[lr_index]

                    if (j, t) in dtrans:
                        li = lr_index + 1
                        while li < p.getlength():
                            if p.prod[li] in grammar.terminals:
                                break
                            if p.prod[li] not in nullable:
                                break
                            li += 1
                        else:
                            includes.append((j, t))

                    g = cls.lr0_goto(C[j], t, add_count, goto_cache)
                    j = cidhash.get(g, -1)

                for r in C[j]:
                    if r.name != p.name:
                        continue
                    if r.getlength() != p.getlength():
                        continue
                    i = 0
                    while i < r.lr_index:
                        if r.prod[i] != p.prod[i + 1]:
                            break
                        i += 1
                    else:
                        lookb.append((j, r))
            for i in includes:
                includedict.setdefault(i, []).append((state, N))
            lookdict[state, N] = lookb
        return lookdict, includedict
    @classmethod
    def add_lookaheads(cls, lookbacks, followset):
        for trans, lb in iteritems(lookbacks):
            for state, p in lb:
                f = followset.get(trans, [])
                laheads = p.lookaheads.setdefault(state, [])
                for a in f:
                    if a not in laheads:
                        laheads.append(a)

rply-0.7.7/rply/lexergenerator.py

import re

try:
    import rpython
    from rpython.rlib.objectmodel import we_are_translated
    from rpython.rlib.rsre import rsre_core
    from rpython.rlib.rsre.rpy import get_code
except ImportError:
    rpython = None

    def we_are_translated():
        return False

from rply.lexer import Lexer


class Rule(object):
    _attrs_ = ['name', 'flags', '_pattern']

    def __init__(self, name, pattern, flags=0):
        self.name = name
        self.re = re.compile(pattern, flags=flags)
        if rpython:
            self.flags = flags
            self._pattern = get_code(pattern, flags)

    def _freeze_(self):
        return True
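
    # Matching uses the stdlib re module on CPython, but switches to
    # RPython's rsre engine once the program has been translated; both paths
    # return a plain Match with start/end offsets.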
    def matches(self, s, pos):
        if not we_are_translated():
            m = self.re.match(s, pos)
            return Match(*m.span(0)) if m is not None else None
        else:
            assert pos >= 0
            ctx = rsre_core.StrMatchContext(s, pos, len(s), self.flags)
            matched = rsre_core.match_context(ctx, self._pattern)
            if matched:
                return Match(ctx.match_start, ctx.match_end)
            else:
                return None


class Match(object):
    _attrs_ = ["start", "end"]

    def __init__(self, start, end):
        self.start = start
        self.end = end


class LexerGenerator(object):
    r"""
    A LexerGenerator represents a set of rules that match pieces of text that
    should either be turned into tokens or ignored by the lexer.

    Rules are added using the :meth:`add` and :meth:`ignore` methods:

    >>> from rply import LexerGenerator
    >>> lg = LexerGenerator()
    >>> lg.add('NUMBER', r'\d+')
    >>> lg.add('ADD', r'\+')
    >>> lg.ignore(r'\s+')

    The rules are passed to :func:`re.compile`. If you need additional flags,
    e.g. :const:`re.DOTALL`, you can pass them to :meth:`add` and
    :meth:`ignore` as an additional optional parameter:

    >>> import re
    >>> lg.add('ALL', r'.*', flags=re.DOTALL)

    You can then build a lexer with which you can lex a string to produce an
    iterator yielding tokens:

    >>> lexer = lg.build()
    >>> iterator = lexer.lex('1 + 1')
    >>> iterator.next()
    Token('NUMBER', '1')
    >>> iterator.next()
    Token('ADD', '+')
    >>> iterator.next()
    Token('NUMBER', '1')
    >>> iterator.next()
    Traceback (most recent call last):
    ...
    StopIteration
    """
    def __init__(self):
        self.rules = []
        self.ignore_rules = []

    def add(self, name, pattern, flags=0):
        """
        Adds a rule with the given `name` and `pattern`. In case of ambiguity,
        the first rule added wins.
        """
        self.rules.append(Rule(name, pattern, flags=flags))

    def ignore(self, pattern, flags=0):
        """
        Adds a rule whose matched value will be ignored. Ignored rules will be
        matched before regular ones.
        """
        self.ignore_rules.append(Rule("", pattern, flags=flags))

    def build(self):
        """
        Returns a lexer instance, which provides a `lex` method that must be
        called with a string and returns an iterator yielding
        :class:`~rply.Token` instances.
        """
        return Lexer(self.rules, self.ignore_rules)

rply-0.7.7/MANIFEST.in

include README.rst
include LICENSE

rply-0.7.7/setup.py

from setuptools import setup
with open("README.rst") as f:
readme = f.read()
setup(
name="rply",
description="A pure Python Lex/Yacc that works with RPython",
long_description=readme,
# duplicated in docs/conf.py and rply/__init__.py
version="0.7.7",
author="Alex Gaynor",
author_email="alex.gaynor@gmail.com",
packages=["rply"],
install_requires=["appdirs"],
)

rply-0.7.7/rply.egg-info/PKG-INFO

(identical to the top-level PKG-INFO above)

rply-0.7.7/rply.egg-info/SOURCES.txt

LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
rply/__init__.py
rply/errors.py
rply/grammar.py
rply/lexer.py
rply/lexergenerator.py
rply/parser.py
rply/parsergenerator.py
rply/token.py
rply/utils.py
rply.egg-info/PKG-INFO
rply.egg-info/SOURCES.txt
rply.egg-info/dependency_links.txt
rply.egg-info/requires.txt
rply.egg-info/top_level.txt rply-0.7.7/rply.egg-info/requires.txt 0000644 0000765 0000024 00000000010 13421457107 021221 0 ustar alex_gaynor staff 0000000 0000000 appdirs

rply-0.7.7/rply.egg-info/top_level.txt

rply

rply-0.7.7/rply.egg-info/dependency_links.txt

rply-0.7.7/setup.cfg

[metadata]
license = BSD 3-Clause License
[wheel]
universal = 1
[egg_info]
tag_build =
tag_date = 0

rply-0.7.7/README.rst

(contents identical to the Description field of PKG-INFO above)