LEPL-5.1.3/setup.py

from setuptools import setup

setup(name='LEPL',
      version='5.1.3',
      description='A Parser Library for Python 2.6+/3+: Recursive Descent; Full Backtracking',
      long_description='''
THIS PROJECT IS NO LONGER DEVELOPED.  PLEASE SEE THE `SITE ` FOR MORE
INFORMATION.

LEPL is a recursive descent parser, written in Python, which has a
friendly, easy-to-use syntax.  The underlying implementation includes
several features that make it more powerful than might be expected.

For example, it is not limited by the Python stack, because it uses
trampolining and co-routines.  Multiple parses can be found for ambiguous
grammars and it can also handle left-recursive grammars.

The aim is a powerful, extensible parser that will also give solid,
reliable results to first-time users.

`Release 5 `_ has simpler stream (input) handling.  Memoisation,
line-aware lexing and memory use have also been revised.  These changes
make future extension easier, fix several bugs, and improve performance.

Features
--------

* **Parsers are Python code**, defined in Python itself.  No separate
  grammar is necessary.

* **Friendly syntax** using Python's operators allows grammars to be
  defined in a declarative style close to BNF.

* Integrated, optional **lexer** simplifies handling whitespace.

* Built-in **AST support** with support for iteration, traversal and
  re-writing.

* Generic, pure-Python approach supports parsing a wide variety of
  data including **bytes** (Python 3+ only).

* **Well documented** and easy to extend.

* **Unlimited recursion depth**.  The underlying algorithm is
  recursive descent, which can exhaust the stack for complex grammars
  and large data sets.  LEPL avoids this problem by using Python
  generators as coroutines (aka "trampolining").

* **Parser rewriting**.  The parser can itself be manipulated by
  Python code.  This gives unlimited opportunities for future
  expansion and optimisation.

* Support for ambiguous grammars (**complete backtracking**).  A
  parser can return more than one result (aka **"parse forests"**).

* Parsers can be made more **efficient** with automatic memoisation
  ("packrat parsing").

* Memoisation can detect and control **left-recursive grammars**.
  Together with LEPL's support for ambiguity this means that "any"
  grammar can be supported.

* Trace and resource management, including **"deepest match"
  diagnostics** and the ability to limit backtracking.
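For example, the following complete program (taken from the package's own
test suite) uses a token-based grammar to read a list of real numbers::

    from lepl import *

    reals = (Token(Real()) >> float)[:]
    reals.config.lexer()
    parser = reals.get_parse()

    print(parser('1 2.3'))   # prints [1.0, 2.3]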
''',
      author='Andrew Cooke',
      author_email='andrew@acooke.org',
      url='http://www.acooke.org/lepl/',
      packages=['lepl',
                'lepl._test',
                'lepl._example',
                'lepl.apps',
                'lepl.apps._test',
                'lepl.bin',
                'lepl.bin._test',
                'lepl.bin._example',
                'lepl.contrib',
                'lepl.core',
                'lepl.core._test',
                'lepl.lexer',
                'lepl.lexer._test',
                'lepl.lexer._example',
                'lepl.lexer.lines',
                'lepl.lexer.lines._test',
                'lepl.lexer.lines._example',
                'lepl.matchers',
                'lepl.matchers._test',
                'lepl.regexp',
                'lepl.regexp._test',
                'lepl.stream',
                'lepl.stream._test',
                'lepl.support',
                'lepl.support._test',
                ],
      package_dir={'': 'src'},
      keywords="parser",
      classifiers=['Development Status :: 5 - Production/Stable',
                   'Intended Audience :: Developers',
                   'License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)',
                   'License :: OSI Approved :: Mozilla Public License 1.1 (MPL 1.1)',
                   'Natural Language :: English',
                   'Operating System :: OS Independent',
                   'Programming Language :: Python :: 3',
                   'Programming Language :: Python :: 3.0',
                   'Programming Language :: Python :: 3.1',
                   'Programming Language :: Python :: 3.2',
                   'Programming Language :: Python :: 2.6',
                   'Programming Language :: Python :: 2.7',
                   'Topic :: Software Development',
                   'Topic :: Software Development :: Libraries',
                   'Topic :: Software Development :: Libraries :: Python Modules',
                   'Topic :: Text Processing',
                   'Topic :: Text Processing :: Filters',
                   'Topic :: Text Processing :: General',
                   'Topic :: Utilities'
                   ]
      )
LEPL-5.1.3/setup.cfg

[egg_info]
tag_build = 
tag_date = 0
tag_svn_revision = 0

LEPL-5.1.3/src/lepl/_test/__init__.py

# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.
''' Tests for the lepl package. ''' from logging import getLogger, basicConfig, DEBUG, WARN, ERROR from sys import version from types import ModuleType from unittest import TestSuite, TestLoader, TextTestRunner import lepl # we need to import all files used in the automated self-test # pylint: disable-msg=E0611, W0401 #@PydevCodeAnalysisIgnore import lepl._test.bug_stalled_parser import lepl._test.magus import lepl._test.wrong_cache_bug import lepl._test.wrong_depth_bug import lepl._test.wrong_regexp_bug # Number of tests if running in IDE with Python 3 TOTAL = 429 NOT_DISTRIBUTED = 12 NOT_3 = 22 MODULES = [('apps', []), ('bin', []), ('cairo', []), ('contrib', []), ('core', []), ('lexer', [('lines', [])]), ('matchers', []), ('regexp', []), ('stream', []), ('support', [])] def all(): ''' This runs all tests and examples. It is something of a compromise - seems to be the best solution that's independent of other libraries, doesn't use the file system (since code may be in a zip file), and keeps the number of required imports to a minimum. ''' basicConfig(level=ERROR) log = getLogger('lepl._test.all.all') suite = TestSuite() loader = TestLoader() runner = TextTestRunner(verbosity=4) for module in ls_modules(lepl, MODULES): log.debug(module.__name__) suite.addTest(loader.loadTestsFromModule(module)) result = runner.run(suite) print('\n\n\n----------------------------------------------------------' '------------\n') if version[0] == '2': print('Expect 2-5 failures + 2 errors in Python 2: {0:d}, {1:d} ' .format(len(result.failures), len(result.errors))) assert 2 <= len(result.failures) <= 5, len(result.failures) assert 1 <= len(result.errors) <= 2, len(result.errors) target = TOTAL - NOT_DISTRIBUTED - NOT_3 else: print('Expect at most 1 failure + 0 errors in Python 3: {0:d}, {1:d} ' .format(len(result.failures), len(result.errors))) assert 0 <= len(result.failures) <= 1, len(result.failures) assert 0 <= len(result.errors) <= 0, len(result.errors) target = TOTAL - NOT_DISTRIBUTED print('Expect {0:d} tests total: {1:d}'.format(target, result.testsRun)) assert result.testsRun == target, result.testsRun print('\nLooks OK to me!\n\n') def ls_modules(parent, children): known = set() children += [('_test', []), ('_example', [])] children += map(lambda module: (module, []), dir(parent)) for (child, unborn) in children: name = parent.__name__ + '.' + child try: __import__(name) module = getattr(parent, child) if isinstance(module, ModuleType) and module not in known: yield module known.add(module) for module in ls_modules(module, unborn): yield module except ImportError as e: if not str(e).startswith('No module named'): raise if __name__ == '__main__': all() LEPL-5.1.3/src/lepl/_test/wrong_regexp_bug.py0000644000175000001440000000427111731117151021473 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' The regexp example from the tutorial is returning an error indicating that Dfa or NfaRegexp is being called. ''' from unittest import TestCase from lepl import * class RegexpTest(TestCase): def test_groups(self): matcher = Regexp('a*(b*)c*(d*)e*') matcher.config.clear() p = matcher.get_parse() t = p.matcher.tree() assert t == """FunctionWrapper> `- 'a*(b*)c*(d*)e*'""", t matcher.config.default() p = matcher.get_parse() t = p.matcher.tree() assert t == """TrampolineWrapper +- _RMemo | `- FunctionWrapper> | `- 'a*(b*)c*(d*)e*' `- True""", t result = p('abbcccddddeeeeee') assert result == ['bb', 'dddd'], result LEPL-5.1.3/src/lepl/_test/base.py0000644000175000001440000000604011731117151017036 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support for matcher tests. 
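The BaseTest class below provides assert_direct, assert_list and related
helpers that run a matcher over a stream and compare the complete list of
results against an expected target.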
'''

#from logging import basicConfig, DEBUG
from re import sub
from unittest import TestCase

from lepl.support.lib import str
from lepl.stream.maxdepth import FullFirstMatchException


class BaseTest(TestCase):
    
    def assert_direct(self, stream, match, target):
        match.config.no_full_first_match()
        parser = match.get_parse_all()
        #print(parser.matcher.tree())
        result = list(parser(stream))
        assert target == result, result
        
    def assert_fail(self, stream, match):
        try:
            match.match_string(stream)
            assert False, 'Expected error'
        except FullFirstMatchException:
            pass
    
    def assert_list(self, stream, match, target, **kargs):
        match.config.no_full_first_match()
        matcher = match.get_parse_list_all()
        #print(matcher.matcher)
        result = list(matcher(stream, **kargs))
        assert target == result, result
        
    def assert_literal(self, stream, matcher):
        self.assert_direct(stream, matcher, [[stream]])


def assert_str(a, b):
    '''
    Assert two strings are approximately equal, allowing tests to run in
    Python 3 and 2.
    '''
    def clean(x):
        x = str(x)
        x = x.replace("u'", "'")
        x = x.replace("lepl.matchers.error.Error", "Error")
        x = x.replace("lepl.stream.maxdepth.FullFirstMatchException",
                      "FullFirstMatchException")
        x = sub('<(.+) 0x[0-9a-fA-F]*>', '<\\1 0x...>', x)
        x = sub('(\\d+)L', '\\1', x)
        return x
    a = clean(a)
    b = clean(b)
    assert a == b, '"' + a + '" != "' + b + '"'

LEPL-5.1.3/src/lepl/_test/magus.py

# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.
''' Tests for a bug reported for 3.2, 3.2.1 ''' # pylint: disable-msg=W0614, W0401, C0103, R0201, R0914, R0915 # test #@PydevCodeAnalysisIgnore #from logging import basicConfig, DEBUG from unittest import TestCase from difflib import Differ from lepl import * from lepl.support.graph import ConstructorWalker from lepl.matchers.matcher import Matcher, canonical_matcher_type,\ MatcherTypeException, is_child from lepl.matchers.memo import _LMemo, _RMemo, LMemo, RMemo from lepl.matchers.transform import Transform, TransformationWrapper from lepl.core.rewriters import NodeStats, Flatten, \ ComposeTransforms, AutoMemoize, clone_matcher, RightMemoize, LeftMemoize class MagusTest(TestCase): ''' Based on the original bug report. ''' def test_magus(self): ''' This was failing. ''' #basicConfig(level=DEBUG) name = Word(Letter()) > 'name' expression = Delayed() variable = Delayed() function = (expression / '()') > 'function' expression += (variable | function) > 'expression' variable += (name | expression / '.' / name) dotted_name = function & Eos() parser = dotted_name.get_parse_string() try: parser("1func()") assert False, 'expected left recursion' except MemoException: pass dotted_name.config.auto_memoize().no_full_first_match() parser = dotted_name.get_parse_string() parser("1func()") #class DelayedCloneTest(TestCase): # ''' # The original problem for 3.2 was related to clones losing children. # ''' # # def test_clone(self): # ''' # Clone and check children. # ''' # a = Delayed() # b = (a | 'c') # a += b # # def simple_clone(node): # ''' # Clone the node. # ''' # walker = ConstructorWalker(node, Matcher) # return walker(DelayedClone()) # # self.assert_children(b) # bb = simple_clone(b) # self.assert_children(bb) # # # def assert_children(self, b): # ''' # Check children are non-None. # ''' ## print('>>>{0!s}<<<'.format(b)) # assert is_child(b, Or) # for child in b.matchers: # assert child class CloneTest(TestCase): ''' Test various clone functions. ''' def test_describe(self): ''' Use a description of the graph to check against changes. ''' #basicConfig(level=DEBUG) name = Word(Letter()) > 'name' expression = Delayed() variable = Delayed() function = (expression / '()') > 'function' expression += (variable | function) > 'expression' variable += (name | expression / '.' 
/ name) dotted_name = function & Eos() base = dotted_name.tree() # print(base) desc0 = NodeStats(dotted_name) print(desc0) assert desc0.total == 18, desc0 self.assert_count(desc0, And, 5) self.assert_count(desc0, Or, 2) self.assert_count(desc0, Delayed, 2) clone0 = clone_matcher(dotted_name) # print(clone0.tree()) diff = Differ() diff_text = '\n'.join(diff.compare(base.split('\n'), clone0.tree().split('\n'))) #print(diff_text) descx = NodeStats(clone0) print(descx) assert descx == desc0 clone1 = Flatten()(dotted_name) print(clone1.tree()) desc1 = NodeStats(clone1) print(desc1) # flattened And (Or no longer flattened as Delayed intervenes) assert desc1.total == 17, desc1 self.assert_count(desc1, And, 4) self.assert_count(desc1, Or, 2) self.assert_count(desc1, Delayed, 2) self.assert_count(desc1, Transform, 7) self.assert_count(desc1, TransformationWrapper, 7) clone2 = ComposeTransforms()(clone1) desc2 = NodeStats(clone2) #print(desc2) # compressed a transform assert desc2.total == 17, desc2 self.assert_count(desc2, And, 4) self.assert_count(desc2, Or, 2) self.assert_count(desc2, Delayed, 2) self.assert_count(desc2, Transform, 6) self.assert_count(desc2, TransformationWrapper, 6) clone3 = RightMemoize()(clone2) desc3 = NodeStats(clone3) #print(desc3) assert desc3.total == 17, desc3 self.assert_count(desc3, _RMemo, 17) self.assert_count(desc3, Delayed, 2) clone4 = LeftMemoize()(clone2) desc4 = NodeStats(clone4) #print(desc4) assert desc4.total == 17, desc4 self.assert_count(desc4, _LMemo, 20) # left memo duplicates delayed self.assert_count(desc4, Delayed, 3) clone5 = AutoMemoize(left=LMemo, right=RMemo)(clone2) desc5 = NodeStats(clone5) #print(desc5) assert desc5.total == 17, desc5 self.assert_count(desc5, _RMemo, 5) self.assert_count(desc5, _LMemo, 15) # left memo duplicates delayed self.assert_count(desc5, Delayed, 3) try: clone3.config.clear() clone3.parse_string('1join()') assert False, 'Expected error' except MemoException as error: assert 'Left recursion was detected' in str(error), str(error) clone4.config.clear() clone4.parse_string('1join()') clone5.config.clear() clone5.parse_string('1join()') def assert_count(self, desc, type_, count): ''' Check the count for a given type. ''' try: type_ = canonical_matcher_type(type_) except MatcherTypeException: pass assert type_ in desc.types and len(desc.types[type_]) == count, \ len(desc.types[type_]) if type_ in desc.types else type_ LEPL-5.1.3/src/lepl/_test/bug_stalled_parser.py0000644000175000001440000000603011731117151021764 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Tests for a regexp bug.
'''

# pylint: disable-msg=W0614, W0401, C0111, R0201
#@PydevCodeAnalysisIgnore
#from logging import basicConfig, DEBUG
from unittest import TestCase

from lepl import *


class LeftRecursiveTest(TestCase):
    
#    def test_limited_lookahead(self):
#        '''
#        This stalls because Lookahead consumes nothing.  Can we detect this
#        case?
#        '''
#        #basicConfig(level=DEBUG)
#        
#        item = Delayed()
#        item += item[1:3] | ~Lookahead('x')
#        
#        expr = item[:2] & Drop(Eos())
#        expr.config.left_memoize()
#        parser = expr.get_parse_string()
#        print(parser.matcher.tree())
#        
#        parser('abc')
        
#    def test_plain_lookahead(self):
#        '''
#        This stalls because Lookahead consumes nothing.  Can we detect this
#        case?
#        '''
#        #basicConfig(level=DEBUG)
#        
#        item = Delayed()
#        item += item[1:] | ~Lookahead('\\')
#        
#        expr = item & Drop(Eos())
#        expr.config.left_memoize()
#        parser = expr.get_parse_string()
#        print(parser.matcher.tree())
#        
#        parser('abc')

    def test_problem_from_regexp(self):
        item = Delayed()
        item += item[1:]
        
        expr = item & Drop(Eos())
        expr.config.no_full_first_match()
        parser = expr.get_parse_string()
        try:
            parser('abc')
            assert False, 'expected left recursion error'
        except MemoException:
            pass
        
        expr.config.left_memoize()
        parser = expr.get_parse_string()
        parser('abc')

LEPL-5.1.3/src/lepl/_test/wrong_depth_bug.py

# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Depth warning gets to end of string.
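The test below checks that the FullFirstMatchException raised by the
"deepest match" logic reports the unparsed tail of the input ('+30',
starting at character 3) rather than the start of the string.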
'''

from logging import basicConfig, DEBUG
from unittest import TestCase

from lepl import *


class DepthTest(TestCase):
    
    def test_depth(self):
        #basicConfig(level=DEBUG)
        value = Token(Real())
        symbol = Token('[^0-9a-zA-Z \t\r\n]')
        number = value >> float
        add = number & ~symbol('+') & number > sum
        try:
            add.parse('12+30')
            assert False, 'error expected'
        except FullFirstMatchException as e:
            msg = str(e)
            assert '+30' in msg, msg
            assert 'character 3' in msg, msg

LEPL-5.1.3/src/lepl/_test/wrong_cache_bug.py

from lepl.support._test.node import NodeTest
from lepl.core.rewriters import NodeStats, NodeStats2

# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Example returning fewer results than before.
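The tests below parse ambiguous input and count the complete set of
parses (12 for the expression grammar, 104 for the left-recursive
grammar) to check that memoisation and variable tracing do not drop
results.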
''' from logging import basicConfig, DEBUG from unittest import TestCase from lepl import * class CacheTest(TestCase): def test_cache(self): #basicConfig(level=DEBUG) with TraceVariables(): value = Token(UnsignedReal()) symbol = Token('[^0-9a-zA-Z \t\r\n]') number = (Optional(symbol('-')) + value) >> float group2, group3c = Delayed(), Delayed() parens = symbol('(') & group3c & symbol(')') group1 = parens | number mul = (group2 & symbol('*') & group2) > List # changed div = (group2 & symbol('/') & group2) > List # changed group2 += (mul | div | group1) add = (group3c & symbol('+') & group3c) > List # changed sub = (group3c & symbol('-') & group3c) > List # changed group3c += (add | sub | group2) group3c.config.clear().lexer().auto_memoize().trace_variables() p = group3c.get_parse_all() #print(p.matcher.tree()) results = list(p('1+2*(3-4)+5/6+7')) for result in results: #print(result[0]) pass assert len(results) == 12, results def test_left(self): #basicConfig(level=DEBUG) a = Delayed() a += Optional(a) & (a | 'b' | 'c') for (conservative, full, d, n) in [(None, True, 0, 104), (None, True, 1, 38), (False, False, 1, 38)]: a.config.clear().no_full_first_match().auto_memoize( conservative=conservative, full=full, d=d) p = a.get_parse_all() #print(p.matcher.tree()) r = list(p('bcb')) assert len(r) == n, (n, len(r), r) def test_trace_variables(self): # for comparison with TraceVariables(): a = Delayed() a += Optional(a) & (a | 'b' | 'c') #print('\n*** clear') a.config.clear().no_full_first_match() p = a.get_parse_all() #print(p.matcher.tree()) #print(NodeStats2(p.matcher)) #print('*** trace_variables') a.config.clear().no_full_first_match().trace_variables() p = a.get_parse_all() #print(p.matcher.tree()) #print(NodeStats2(p.matcher)) #print('*** auto_memoize') a.config.clear().no_full_first_match().auto_memoize() p = a.get_parse_all() #print(p.matcher.tree()) #print(NodeStats2(p.matcher)) r = list(p('bcb')) assert len(r) == 104, (len(r), r) #basicConfig(level=DEBUG) a.config.clear().no_full_first_match().auto_memoize().trace_variables() p = a.get_parse_all() #print('*** trace_variables and memoize') #print(p.matcher.tree()) #print(NodeStats2(p.matcher)) r = list(p('bcb')) assert len(r) == 104, (len(r), r) LEPL-5.1.3/src/lepl/lexer/0000755000175000001440000000000011764776700015574 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/_test/0000755000175000001440000000000011764776700016712 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/_test/__init__.py0000644000175000001440000000316011731117151021002 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.lexer package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.lexer._test.matchers LEPL-5.1.3/src/lepl/lexer/_test/matchers.py0000644000175000001440000003007211731117151021053 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Wide range of tests for lexer. ''' # pylint: disable-msg=R0201, R0904, R0903, R0914 # tests from logging import basicConfig, DEBUG from math import sin, cos from operator import add, sub, truediv, mul from unittest import TestCase from lepl.lexer.matchers import Token from lepl.lexer.support import LexerError, RuntimeLexerError from lepl.matchers.core import Literal, Delayed from lepl.matchers.derived import Real, Any, Eos, UnsignedReal, Word from lepl.matchers.combine import Or from lepl.support.lib import str from lepl.support.node import Node #basicConfig(level=DEBUG) def str26(value): ''' Convert to string with crude hack for 2.6 Unicode ''' string = str(value) return string.replace("u'", "'") class RegexpCompilationTest(TestCase): ''' Test whether embedded matchers are converted to regular expressions. ''' def test_literal(self): ''' Simple literal should compile directly. ''' token = Token(Literal('abc')) token.compile() assert token.regexp == 'abc', repr(token.regexp) def test_words(self): ''' This used to be impossible. 
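Word() is defined in terms of other matchers rather than as a regexp,
so the token rewriter must first compile it to an equivalent regular
expression.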
''' results = Token(Word())[:].parse('foo bar') assert results == ['foo', 'bar'], results def test_real(self): ''' A real is more complex, but still compiles. ''' token = Token(Real(exponent='Ee')) token.compile() assert token.regexp == \ '(?:[\\+\\-])?(?:(?:[0-9](?:[0-9])*)?\\.[0-9](?:[0-9])*|[0-9](?:[0-9])*(?:\\.)?)(?:[Ee](?:[\\+\\-])?[0-9](?:[0-9])*)?', \ repr(token.regexp) def test_impossible(self): ''' Cannot compile arbitrary functions. ''' try: token = Token(Real() > (lambda x: x)) token.compile() assert False, 'Expected error' except LexerError: pass class TokenRewriteTest(TestCase): ''' Test token support. ''' def test_defaults(self): ''' Basic configuration. ''' #basicConfig(level=DEBUG) reals = (Token(Real()) >> float)[:] reals.config.lexer() parser = reals.get_parse() results = parser('1 2.3') assert results == [1.0, 2.3], results def test_string_arg(self): ''' Skip anything(not just spaces) ''' words = Token('[a-z]+')[:] words.config.lexer(discard='.') parser = words.get_parse() results = parser('abc defXghi') assert results == ['abc', 'def', 'ghi'], results def test_bad_error_msg(self): ''' An ugly error message. ''' #basicConfig(level=DEBUG) words = Token('[a-z]+')[:] words.config.lexer() parser = words.get_parse_sequence() try: parser('abc defXghi') assert False, 'expected error' except RuntimeLexerError as err: assert str(err) == "No token for 'Xghi' at offset 7, value 'X' of 'abc defXghi'.", str(err) def test_good_error_msg(self): ''' Better error message with streams. ''' #basicConfig(level=DEBUG) words = Token('[a-z]+')[:] words.config.lexer() parser = words.get_parse_string() try: parser('abc defXghi') assert False, 'expected error' except RuntimeLexerError as err: assert str(err) == "No token for 'Xghi' at line 1, character 8 of 'abc defXghi'.", str(err) def test_expr_with_functions(self): ''' Expression with function calls and appropriate binding. ''' #basicConfig(level=DEBUG) # pylint: disable-msg=C0111, C0321 class Call(Node): pass class Term(Node): pass class Factor(Node): pass class Expression(Node): pass value = Token(Real()) > 'value' name = Token('[a-z]+') symbol = Token('[^a-zA-Z0-9\\. ]') expr = Delayed() open_ = ~symbol('(') close = ~symbol(')') funcn = name > 'name' call = funcn & open_ & expr & close > Call term = call | value | open_ & expr & close > Term muldiv = symbol(Any('*/')) > 'operator' factor = term & (muldiv & term)[:] > Factor addsub = symbol(Any('+-')) > 'operator' expr += factor & (addsub & factor)[:] > Expression line = expr & Eos() line.config.trace_stack(True).lexer() parser = line.get_parse_string() results = str26(parser('1 + 2*sin(3+ 4) - 5')[0]) assert results == """Expression +- Factor | `- Term | `- value '1' +- operator '+' +- Factor | +- Term | | `- value '2' | +- operator '*' | `- Term | `- Call | +- name 'sin' | `- Expression | +- Factor | | `- Term | | `- value '3' | +- operator '+' | `- Factor | `- Term | `- value '4' +- operator '-' `- Factor `- Term `- value '5'""", '[' + results + ']' def test_expression2(self): ''' As before, but with evaluation. ''' #basicConfig(level=DEBUG) # we could do evaluation directly in the parser actions. 
but by
        # using the nodes instead we allow future expansion into a full
        # interpreter
        
        # pylint: disable-msg=C0111, C0321
        class BinaryExpression(Node):
            op = lambda x, y: None
            def __float__(self):
                return self.op(float(self[0]), float(self[1]))
        
        class Sum(BinaryExpression): op = add
        class Difference(BinaryExpression): op = sub
        class Product(BinaryExpression): op = mul
        class Ratio(BinaryExpression): op = truediv
        
        class Call(Node):
            funs = {'sin': sin,
                    'cos': cos}
            def __float__(self):
                return self.funs[self[0]](self[1])
        
        # we use unsigned float then handle negative values explicitly;
        # this lets us handle the ambiguity between subtraction and
        # negation which requires context (not available to the lexer)
        # to resolve correctly.
        number = Token(UnsignedReal())
        name = Token('[a-z]+')
        symbol = Token('[^a-zA-Z0-9\\. ]')
        
        expr = Delayed()
        factor = Delayed()
        
        real_ = Or(number >> float,
                   ~symbol('-') & number >> (lambda x: -float(x)))
        
        open_ = ~symbol('(')
        close = ~symbol(')')
        trig = name(Or('sin', 'cos'))
        call = trig & open_ & expr & close > Call
        parens = open_ & expr & close
        value = parens | call | real_
        
        ratio = value & ~symbol('/') & factor > Ratio
        prod = value & ~symbol('*') & factor > Product
        factor += prod | ratio | value
        
        diff = factor & ~symbol('-') & expr > Difference
        sum_ = factor & ~symbol('+') & expr > Sum
        expr += sum_ | diff | factor | value
        
        line = expr & Eos()
        parser = line.get_parse()
        
        def myeval(text):
            result = parser(text)
            return float(result[0])
        
        self.assertAlmostEqual(myeval('1'), 1)
        self.assertAlmostEqual(myeval('1 + 2*3'), 7)
        self.assertAlmostEqual(myeval('1 - 4 / (3 - 1)'), -1)
        self.assertAlmostEqual(myeval('1 -4 / (3 -1)'), -1)
        self.assertAlmostEqual(myeval('1 + 2*sin(3+ 4) - 5'), -2.68602680256)


class ErrorTest(TestCase):
    '''
    Test various error messages.
    '''
    
    def test_mixed(self):
        '''
        Cannot mix tokens and non-tokens at same level.
        '''
        bad = Token(Any()) & Any()
        try:
            bad.get_parse()
            assert False, 'expected failure'
        except LexerError as err:
            assert str(err) == 'The grammar contains a mix of Tokens and ' \
                               'non-Token matchers at the top level. If ' \
                               'Tokens are used then non-token matchers ' \
                               'that consume input must only appear "inside" ' \
                               'Tokens. The non-Token matchers include: ' \
                               'Any(None).', str(err)
        else:
            assert False, 'wrong exception'

    def test_bad_space(self):
        '''
        An unexpected character fails to match.
        '''
        token = Token('a')
        token.config.clear().lexer(discard='b')
        parser = token.get_parse()
        assert parser('a') == ['a'], parser('a')
        assert parser('b') == None, parser('b')
        try:
            parser('c')
            assert False, 'expected failure'
        except RuntimeLexerError as err:
            assert str(err) == "No token for 'c' at line 1, character 1 of 'c'.", str(err)

    def test_incomplete(self):
        '''
        A token is not completely consumed (this doesn't raise error
        messages, it just fails to match).
        '''
        token = Token('[a-z]+')(Any())
        token.config.no_full_first_match()
        parser = token.get_parse_string()
        assert parser('a') == ['a'], parser('a')
        # even though this matches the token, the Any() sub-matcher doesn't
        # consume all the contents
        assert parser('ab') == None, parser('ab')
        token = Token('[a-z]+')(Any(), complete=False)
        token.config.no_full_first_match()
        parser = token.get_parse_string()
        assert parser('a') == ['a'], parser('a')
        # whereas this is fine, since complete=False
        assert parser('ab') == ['a'], parser('ab')

    def test_none_discard(self):
        '''
        If discard is '', discard nothing.
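Whitespace is then significant, so the leading space in ' a' produces
a RuntimeLexerError.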
''' token = Token('a') token.config.lexer(discard='').no_full_first_match() parser = token[1:].get_parse() result = parser('aa') assert result == ['a', 'a'], result try: parser(' a') except RuntimeLexerError as error: assert str26(error) == "No discard for ' a'.", str26(error) def test_paren(self): try: Token('(').match('foo') assert False, 'expected error' except Exception as e: assert "Cannot parse regexp '('" in str(e), e LEPL-5.1.3/src/lepl/lexer/__init__.py0000644000175000001440000000272311731117151017670 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A lexer (tokenizer) for Lepl. '''LEPL-5.1.3/src/lepl/lexer/rewriters.py0000644000175000001440000001156111731117151020157 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. 
If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Rewrite a matcher graph to include lexing. ''' from collections import deque from lepl.core.rewriters import Rewriter from lepl.lexer.lexer import Lexer from lepl.lexer.support import LexerError from lepl.lexer.matchers import BaseToken, NonToken from lepl.matchers.matcher import Matcher, is_child from lepl.regexp.unicode import UnicodeAlphabet from lepl.support.lib import fmt def find_tokens(matcher): ''' Returns a set of Tokens. Also asserts that children of tokens are not themselves Tokens. Should we also check that a Token occurs somewhere on every path to a leaf node? ''' (tokens, visited, non_tokens) = (set(), set(), set()) stack = deque([matcher]) while stack: matcher = stack.popleft() if matcher not in visited: if is_child(matcher, NonToken): non_tokens.add(matcher) if isinstance(matcher, BaseToken): tokens.add(matcher) if matcher.content: assert_not_token(matcher.content, visited) else: for child in matcher: if isinstance(child, Matcher): stack.append(child) visited.add(matcher) if tokens and non_tokens: raise LexerError( fmt('The grammar contains a mix of Tokens and non-Token ' 'matchers at the top level. If Tokens are used then ' 'non-token matchers that consume input must only ' 'appear "inside" Tokens. The non-Token matchers ' 'include: {0}.', '; '.join(str(n) for n in non_tokens))) return tokens def assert_not_token(node, visited): ''' Assert that neither this nor any child node is a Token. ''' if isinstance(node, Matcher) and node not in visited: visited.add(node) if isinstance(node, BaseToken): raise LexerError(fmt('Nested token: {0}', node)) else: for child in node: assert_not_token(child, visited) class AddLexer(Rewriter): ''' This is required when using Tokens. It does the following: - Find all tokens in the matcher graph - Construct a lexer from the tokens - Connect the lexer to the matcher - Check that all children have a token parent (and optionally add a default token) Although possibly not in that order. alphabet is the alphabet for which the regular expressions are defined. discard is a regular expression that is used to match space (typically) if no token can be matched (and which is then discarded) ''' def __init__(self, alphabet=None, discard=None, lexer=None): if alphabet is None: alphabet = UnicodeAlphabet.instance() # use '' to have no discard at all if discard is None: discard = '[ \t\r\n]+' super(AddLexer, self).__init__(Rewriter.LEXER, name=fmt('Lexer({0}, {1}, {2})', alphabet, discard, lexer)) self.alphabet = alphabet self.discard = discard self.lexer = lexer if lexer else Lexer def __call__(self, graph): tokens = find_tokens(graph) if tokens: self._debug(fmt('Found {0}', [token.id_ for token in tokens])) return self.lexer(graph, tokens, self.alphabet, self.discard) else: self._info('Lexer rewriter used, but no tokens found.') return graph LEPL-5.1.3/src/lepl/lexer/matchers.py0000644000175000001440000003010211731117151017727 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. 
See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Generate and match a stream of tokens that are identified by regular
expressions.
'''

# pylint currently cannot parse this file

from abc import ABCMeta

from lepl.stream.core import s_empty, s_line, s_next, s_len
from lepl.lexer.support import LexerError
from lepl.lexer.operators import TOKENS, TokenNamespace
from lepl.lexer.stream import FilteredTokenHelper
from lepl.matchers.core import OperatorMatcher, Any, Literal, Lookahead, Regexp
from lepl.matchers.matcher import Matcher, add_children
from lepl.matchers.memo import NoMemo
from lepl.matchers.support import coerce_, trampoline_matcher_factory
from lepl.core.parser import tagged
from lepl.regexp.matchers import BaseRegexp
from lepl.regexp.rewriters import CompileRegexp
from lepl.regexp.unicode import UnicodeAlphabet
from lepl.support.lib import fmt, str


# pylint: disable-msg=W0105
# epydoc convention
# pylint: disable-msg=C0103
# it's a class

NonToken = ABCMeta('NonToken', (object, ), {})
'''
ABC used to identify matchers that actually consume from the stream.  These
are the "leaf" matchers that "do the real work" and they cannot be used at
the same level as Tokens, but must be embedded inside them.

This is a purely informative interface used, for example, to generate
warnings for the user.  Not implementing this interface will not block any
functionality.
'''

add_children(NonToken, Lookahead, Any, Literal, Regexp)
# don't register Empty() here because it's useful as a token(!)


# pylint: disable-msg=R0901, R0904, R0913, W0201, W0142, E1101
# lepl standards

class BaseToken(OperatorMatcher, NoMemo):
    '''
    Introduce a token that will be recognised by the lexer.  A Token instance
    can be specialised to match particular contents by calling as a function.
    
    This is a base class that provides all the functionality, but doesn't
    set the regexp attribute.  This allows subclasses to provide a fixed
    value, while `Token` uses the constructor.
    '''
    
    __count = 0
    
    def __init__(self, content=None, id_=None, alphabet=None,
                 complete=True, compiled=False):
        '''
        Define a token that will be generated by the lexer.
        
        content is the optional matcher that will be invoked on the value
        of the token.  It is usually set via (), which clones this instance
        so that the same token can be used more than once.
        
        id_ is an optional unique identifier that will be given an integer
        value if left empty.
        
        alphabet is the alphabet associated with the regexp.
It should be set by the lexer rewriter, so that all instances share the
        same value (it appears in the constructor so that Tokens can be
        cloned).
        
        complete indicates whether any sub-matcher must completely exhaust
        the contents when matching.  It can be over-ridden for a particular
        sub-matcher via __call__().
        
        compiled should only be used internally.  It is a flag indicating
        that the Token has been processed by the rewriter (see below).
        
        A Token must be "compiled" --- this completes the configuration
        using a given alphabet and is done by the lexer_rewriter.  Care is
        taken to allow a Token to be cloned before or after compilation.
        '''
        super(BaseToken, self).__init__(name=TOKENS,
                                        namespace=TokenNamespace)
        self._karg(content=content)
        if id_ is None:
            id_ = 'Tk' + str(BaseToken.__count)
            BaseToken.__count += 1
        self._karg(id_=id_)
        self._karg(alphabet=alphabet)
        self._karg(complete=complete)
        self._karg(compiled=compiled)
        
    def compile(self, alphabet=None):
        '''
        Convert the regexp if necessary.
        '''
        if alphabet is None:
            alphabet = UnicodeAlphabet.instance()
        # pylint: disable-msg=E0203
        # set in constructor via _kargs
        if self.alphabet is None:
            self.alphabet = alphabet
        self.regexp = self.__to_regexp(self.regexp, self.alphabet)
        self.compiled = True
    
    @staticmethod
    def __to_regexp(regexp, alphabet):
        '''
        The regexp may be a matcher; if so we try to convert it to a regular
        expression and extract the equivalent text.
        '''
        if isinstance(regexp, Matcher):
            rewriter = CompileRegexp(alphabet)
            rewrite = rewriter(regexp)
            # one transformation is empty_adapter
            if isinstance(rewrite, BaseRegexp) and \
                    len(rewrite.wrapper.functions) <= 1:
                regexp = str(rewrite.regexp)
            else:
                raise LexerError(
                    fmt('A Token was specified with a matcher, '
                        'but the matcher could not be converted to '
                        'a regular expression: {0}', rewrite))
        return regexp
        
    def __call__(self, content, complete=None):
        '''
        If complete is specified as True or False it overrides the value
        set in the constructor.  If True the content matcher must completely
        match the Token contents.
        '''
        args, kargs = self._constructor_args()
        kargs['complete'] = self.complete if complete is None else complete
        kargs['content'] = coerce_(content)
        return type(self)(*args, **kargs)
    
    @tagged
    def _match(self, stream):
        '''
        On matching we first assert that the token type is correct and then
        delegate to the content.
        '''
        if not self.compiled:
            raise LexerError(
                fmt('A {0} token has not been compiled. '
                    'You must use the lexer rewriter with Tokens. '
                    'This can be done by using matcher.config.lexer().',
                    self.__class__.__name__))
        ((tokens, line_stream), next_stream) = s_next(stream)
        if self.id_ in tokens:
            if self.content is None:
                # result contains all data (use s_next not s_line to set max)
                (line, _) = s_line(line_stream, True)
                (line, _) = s_next(line_stream, count=len(line))
                yield ([line], next_stream)
            else:
                generator = self.content._match(line_stream)
                while True:
                    (result, next_line_stream) = yield generator
                    if s_empty(next_line_stream) or not self.complete:
                        yield (result, next_stream)
        
    def __str__(self):
        return fmt('{0}: {1!s}', self.id_, self.regexp)
    
    def __repr__(self):
        return fmt('', self)
    
    @classmethod
    def reset_ids(cls):
        '''
        Reset the ID counter.  This should not be needed in normal use.
        '''
        cls.__count = 0
        
        
class Token(BaseToken):
    '''
    A token with a user-specified regexp.
    '''
    
    def __init__(self, regexp, content=None, id_=None, alphabet=None,
                 complete=True, compiled=False):
        '''
        Define a token that will be generated by the lexer.
        
        regexp is the regular expression that the lexer will use to generate
        appropriate tokens.
content is the optional matcher that will be invoked on the value of the token. It is usually set via (), which clones this instance so that the same token can be used more than once. id_ is an optional unique identifier that will be given an integer value if left empty. alphabet is the alphabet associated with the regexp. It should be set by the lexer rewriter, so that all instances share the same value (it appears in the constructor so that Tokens can be cloned). complete indicates whether any sub-matcher must completely exhaust the contents when matching. It can be overridden for a particular sub-matcher via __call__(). compiled should only be used internally. It is a flag indicating that the Token has been processed by the rewriter (see below). A Token must be "compiled" --- this completes the configuration using a given alphabet and is done by the lexer_rewriter. Care is taken to allow a Token to be cloned before or after compilation. ''' super(Token, self).__init__(content=content, id_=id_, alphabet=alphabet, complete=complete, compiled=compiled) self._karg(regexp=regexp) class EmptyToken(Token): ''' A token that cannot be specialised, and that returns nothing. ''' def __call__(self, *args, **kargs): raise TypeError('Empty token') @tagged def _match(self, stream): ''' On matching we first assert that the token type is correct and then delegate to the content. ''' if not self.compiled: raise LexerError( fmt('A {0} token has not been compiled. ' 'You must use the lexer rewriter with Tokens. ' 'This can be done by using matcher.config.lexer().', self.__class__.__name__)) ((tokens, _), next_stream) = s_next(stream) if self.id_ in tokens: yield ([], next_stream) def RestrictTokensBy(*tokens): ''' A matcher factory that generates a new matcher that will transform the stream passed to its arguments so that they do not see the given tokens. So, for example: MyFactory = RestrictTokensBy(A(), B()) RestrictedC = MyFactory(C()) will create a matcher, RestrictedC, that is like C, but which will not see the tokens matched by A and B. In other words, this filters tokens from the input. ''' @trampoline_matcher_factory() def factory(matcher, *tokens): ''' The factory that will be returned, with the tokens supplied above. ''' def match(support, in_stream): ''' The final matcher - delegates to `matcher` with a restricted stream of tokens. ''' ids = [token.id_ for token in tokens] (state, helper) = in_stream filtered = (state, FilteredTokenHelper(helper, *ids)) generator = matcher._match(filtered) while True: (result, (state, _)) = yield generator support._debug(fmt('Result {0}', result)) yield (result, (state, helper)) return match def pass_args(matcher): ''' Dirty trick to pass tokens into the factory. ''' return factory(matcher, *tokens) return pass_args LEPL-5.1.3/src/lepl/lexer/stream.py0000644000175000001440000001346711731117151017433 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. 
# Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Stream support for lexers. ''' from lepl.stream.iter import base_iterable_factory from lepl.stream.core import OFFSET, s_delta, s_line, HashKey, s_key, s_next from lepl.stream.facade import HelperFacade from lepl.support.lib import fmt, LogMixin class TokenHelper(base_iterable_factory(lambda cons: cons.head[1], '<token>')): ''' This wraps a sequence of values generated by the lexer. The sequence is a source of (tokens, stream) instances, where the stream was generated from the source. It follows that the `value` returned by s_next is also (tokens, stream). This is interpreted by `Token` which forwards `stream` to sub-matchers. Implementation is vaguely similar to `IterableHelper`, in that we use a `Cons` based linked list to allow memory handling. However, instead of a "line" of data, each node contains, again, (tokens, stream) and there is no need to store the line_stream explicitly in the state. ''' def __init__(self, id=None, factory=None, max=None, global_kargs=None, cache_level=None, delta=None, len=None): super(TokenHelper, self).__init__(id=id, factory=factory, max=max, global_kargs=global_kargs, cache_level=cache_level, delta=delta) self._len = len def key(self, cons, other): try: (tokens, line_stream) = cons.head key = s_key(line_stream, other) except StopIteration: self._debug('Default hash (EOS)') tokens = '' key = HashKey(self.id, other) #self._debug(fmt('Hash at {0!r} {1}', tokens, hash(key))) return key def next(self, cons, count=1): assert count == 1 s_next(cons.head[1], count=0) # ping max return (cons.head, (cons.tail, self)) def line(self, cons, empty_ok): ''' This doesn't have much meaning in terms of tokens, but might be used for some debug output, so return something vaguely useful. ''' try: # implement in terms of next so that filtering works as expected ((_, line_stream), _) = self.next(cons) return s_line(line_stream, empty_ok) except StopIteration: if empty_ok: raise TypeError('Token stream cannot return an empty line') else: raise def len(self, cons): if self._len is None: self._error('len(tokens)') raise TypeError else: try: (_, line_stream) = cons.head return self._len - s_delta(line_stream)[OFFSET] except StopIteration: return 0 def stream(self, state, value, id_=None): raise TypeError class FilteredTokenHelper(LogMixin, HelperFacade): ''' Used by `RestrictTokensBy` to filter tokens from the delegate. This filters a list of token IDs in order. If the entire list does not match then the next token is returned (even if it appears in the list). 
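For example (an illustrative sketch, not taken from the library documentation): with ids ``(A, B)``, a stream whose next tokens are ``A B C ...`` returns ``C`` first, the leading ``A`` and ``B`` having been discarded; but a stream starting ``A C ...`` returns ``A`` unchanged, because the complete list failed to match. 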
''' def __init__(self, delegate, *ids): super(FilteredTokenHelper, self).__init__(delegate) self._ids = ids self._debug(fmt('Filtering tokens {0}', ids)) def next(self, state, count=1): def add_self(response): ''' Replace the previous helper with this one, which will then delegate to the previous when needed. ''' ((tokens, token), (state, _)) = response self._debug(fmt('Return {0}', tokens)) return ((tokens, token), (state, self)) if count != 1: raise TypeError('Filtered tokens must be read singly') discard = list(reversed(self._ids)) start = state while discard: ((tokens, _), (state, _)) = \ super(FilteredTokenHelper, self).next(state) if discard[-1] in tokens: self._debug(fmt('Discarding token {0}', discard[-1])) discard.pop() else: self._debug(fmt('Failed to discard token {0}: {1}', discard[-1], tokens)) return add_self(super(FilteredTokenHelper, self).next(start)) return add_self(super(FilteredTokenHelper, self).next(state)) LEPL-5.1.3/src/lepl/lexer/support.py0000644000175000001440000000321511731117151017642 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support classes for the lexer. ''' class LexerError(Exception): ''' Errors associated with the lexer ''' class RuntimeLexerError(LexerError): ''' Error raised for problems with lexing. ''' LEPL-5.1.3/src/lepl/lexer/_example/0000755000175000001440000000000011764776700017366 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/_example/__init__.py0000644000175000001440000000323711731117151021463 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. 
# Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Examples for the lepl.lexer package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.lexer._example.calculator import lepl.lexer._example.limitations LEPL-5.1.3/src/lepl/lexer/_example/calculator.py0000644000175000001440000001061711731117151022055 0ustar andrewusers00000000000000from lepl.matchers.derived import UnsignedReal # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' An example from the manual based on a test in this package (currently not used in docs because something similar is developed in the tutorial). ''' from math import sin, cos from operator import add, sub, truediv, mul from lepl import Node, Token, UnsignedReal, Delayed, Or, Eos from lepl._example.support import Example class Calculator(Example): ''' Show how tokens can help simplify parsing of an expression; also give a simple interpreter. ''' def test_calculation(self): ''' We could do evaluation directly in the parser actions, but by using the nodes instead we allow future expansion into a full interpreter. 
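For instance (an illustrative sketch of the evaluation path): parsing ``'1 + 2*3'`` yields a ``Sum`` node whose second child is a ``Product`` node; calling ``float()`` on the root recurses through the ``__float__`` methods, applying ``add`` and ``mul``, to give ``7.0``. 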
''' # pylint: disable-msg=C0111, C0321 class BinaryExpression(Node): op = lambda x, y: None def __float__(self): return self.op(float(self[0]), float(self[1])) class Sum(BinaryExpression): op = add class Difference(BinaryExpression): op = sub class Product(BinaryExpression): op = mul class Ratio(BinaryExpression): op = truediv class Call(Node): funs = {'sin': sin, 'cos': cos} def __float__(self): return self.funs[self[0]](self[1]) # we use unsigned float then handle negative values explicitly; # this lets us handle the ambiguity between subtraction and # negation which requires context (not available to the lexer) # to resolve correctly. number = Token(UnsignedReal()) name = Token('[a-z]+') symbol = Token('[^a-zA-Z0-9\\. ]') expr = Delayed() factor = Delayed() real_ = Or(number >> float, ~symbol('-') & number >> (lambda x: -float(x))) open_ = ~symbol('(') close = ~symbol(')') trig = name(Or('sin', 'cos')) call = trig & open_ & expr & close > Call parens = open_ & expr & close value = parens | call | real_ ratio = value & ~symbol('/') & factor > Ratio prod = value & ~symbol('*') & factor > Product factor += prod | ratio | value diff = factor & ~symbol('-') & expr > Difference sum_ = factor & ~symbol('+') & expr > Sum expr += sum_ | diff | factor | value line = expr & Eos() parser = line.get_parse() def calculate(text): return float(parser(text)[0]) self.examples([(lambda: calculate('1'), '1.0'), (lambda: calculate('1 + 2*3'), '7.0'), (lambda: calculate('-1 - 4 / (3 - 1)'), '-3.0'), (lambda: calculate('1 -4 / (3 -1)'), '-1.0'), (lambda: str(calculate('1 + 2*sin(3+ 4) - 5'))[:5], '-2.68')]) LEPL-5.1.3/src/lepl/lexer/_example/limitations.py0000644000175000001440000000527511731117151022264 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Illustrate some of the finer points of lexing. ''' #from logging import basicConfig, DEBUG from lepl import Token, Integer, Eos, Literal from lepl._example.support import Example class Limitations(Example): ''' These are used in the lexer part of the manual. ''' def test_ambiguity(self): ''' A (signed) integer will consume a - sign. 
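So, with tokens, ``'1-2'`` lexes only as ``['1', '-2']``; the token-free grammar below also finds ``['1', '-', '2']``, which is why the calculator example handles negation explicitly rather than in the lexer. 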
''' tokens = (Token(Integer()) | Token(r'\-'))[:] & Eos() self.examples([(lambda: list(tokens.parse_all('1-2')), "[['1', '-2']]")]) matchers = (Integer() | Literal('-'))[:] & Eos() self.examples([(lambda: list(matchers.parse_all('1-2')), "[['1', '-2'], ['1', '-', '2']]")]) def test_complete(self): ''' The complete flag indicates whether the entire token must be consumed. ''' #basicConfig(level=DEBUG) abc = Token('abc') incomplete = abc(Literal('ab')) incomplete.config.no_full_first_match() self.examples([(lambda: incomplete.parse('abc'), "None")]) abc = Token('abc') incomplete = abc(Literal('ab'), complete=False) incomplete.config.no_full_first_match() self.examples([(lambda: incomplete.parse('abc'), "['ab']")]) LEPL-5.1.3/src/lepl/lexer/operators.py0000644000175000001440000000605711731117151020153 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Operators for tokens. ''' from lepl.support.context import Namespace from lepl.matchers.operators import ADD, AND, OR, APPLY, APPLY_RAW, NOT, \ KARGS, RAISE, REPEAT, FIRST, MAP, RepeatWrapper, REDUCE from lepl.matchers.derived import Add, Apply, Drop, KApply, Map from lepl.matchers.combine import And, Or, First from lepl.matchers.error import raise_error TOKENS = 'tokens' ''' This names a per-thread storage area that contains the definitions of operators (so tokens can have different operators to non-tokens). All token matchers should reference this (either directly via `NamespaceMixin` or indirectly via `BaseToken`). ''' class TokenNamespace(Namespace): ''' A modified version of the usual ``OperatorNamespace`` without handling of spaces (since that is handled by the lexer), allowing Tokens and other matchers to be configured separately (because they process different types). At one point this also defined alphabet and discard, used by the rewriter, but because those are global values it makes more sense to supply them directly to the rewriter. 
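As a sketch of the effect (under the mapping below): for two tokens ``a`` and ``b``, ``a & b`` resolves AND through this namespace to ``And(a, b)``, and ``a + b`` resolves ADD to ``Add(And(a, b))``, with no whitespace handling inserted between the two. 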
''' def __init__(self): super(TokenNamespace, self).__init__({ ADD: lambda a, b: Add(And(a, b)), AND: And, OR: Or, APPLY: Apply, APPLY_RAW: lambda a, b: Apply(a, b, raw=True), NOT: Drop, KARGS: KApply, RAISE: lambda a, b: KApply(a, raise_error(b)), REPEAT: RepeatWrapper, FIRST: First, MAP: Map, REDUCE: None, }) LEPL-5.1.3/src/lepl/lexer/lines/0000755000175000001440000000000011764776700016706 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/lines/_test/0000755000175000001440000000000011764776700020024 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/lines/_test/__init__.py0000644000175000001440000000365611731117151022126 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.lexer.lines package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.lexer.lines._test.closed_bug import lepl.lexer.lines._test.errors import lepl.lexer.lines._test.indentation import lepl.lexer.lines._test.left_bug import lepl.lexer.lines._test.offside import lepl.lexer.lines._test.pithon import lepl.lexer.lines._test.stream import lepl.lexer.lines._test.text import lepl.lexer.lines._test.word_bug LEPL-5.1.3/src/lepl/lexer/lines/_test/indentation.py0000644000175000001440000000636311731117151022701 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for indentation aware parsing. ''' #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.lexer.matchers import Token from lepl.matchers.derived import Word, Letter from lepl.lexer.lines.matchers import NO_BLOCKS, LineStart from lepl.lexer.lines.matchers import LineEnd # pylint: disable-msg=R0201 # unittest convention class IndentTest(TestCase): ''' Test the `Indent` token. ''' def test_indent(self): ''' Test simple matches against leading spaces. ''' #basicConfig(level=DEBUG) text = ''' left four''' word = Token(Word(Letter())) indent = LineStart() line1 = indent('') + LineEnd() line2 = indent('') & word('left') + LineEnd() line3 = indent(' ') & word('four') + LineEnd() expr = (line1 & line2 & line3) expr.config.lines(block_start=NO_BLOCKS) parser = expr.get_parse_string() result = parser(text) assert result == ['', '', 'left', ' ', 'four'], result class TabTest(TestCase): ''' Check that tabs are expanded. ''' def test_indent(self): ''' Test simple matches against leading spaces. ''' #basicConfig(level=DEBUG) text = ''' onespace \tspaceandtab''' word = Token(Word(Letter())) indent = LineStart() line1 = indent('') & ~LineEnd() line2 = indent(' ') & word('onespace') & ~LineEnd() line3 = indent(' ') & word('spaceandtab') & ~LineEnd() expr = line1 & line2 & line3 expr.config.lines(tabsize=4, block_start=NO_BLOCKS).trace_stack(True) parser = expr.get_parse_string() result = parser(text) #print(result) assert result == ['', ' ', 'onespace', ' ', 'spaceandtab'], result LEPL-5.1.3/src/lepl/lexer/lines/_test/left_bug.py0000644000175000001440000000666011731117151022154 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' See http://groups.google.com/group/lepl/browse_thread/thread/79e39e03a03718cc?hl=en_US The different tree structures seen here seem to be related to how the left-recursive memoisation fails. In the case without the lexer the string is shorter, which causes failure earlier. I am not at all sure about this... ''' from unittest import TestCase from logging import basicConfig, DEBUG from lepl import * from lepl._test.base import assert_str class LeftBugTest(TestCase): def test_right_no_lexer(self): #basicConfig(level=DEBUG) word = Any() expr1 = Delayed() call = (expr1 & word) > List expr1 += (call | Empty() | word) program = expr1 & Eos() program.config.trace_stack().auto_memoize() parser = program.get_parse() print(parser.matcher.tree()) parsed = parser("abc") assert_str(parsed[0], """List +- List | +- List | | `- 'a' | `- 'b' `- 'c'""") def test_right(self): #basicConfig(level=DEBUG) #CLine = ContinuedBLineFactory(Token(r'\\')) word = Token("[A-Za-z_][A-Za-z0-9_]*") expr1 = Delayed() call = (expr1 & word) > List expr1 += (call | Empty() | word) program = Trace(expr1 & Eos()) program.config.trace_stack().auto_memoize() parser = program.get_parse() #print(parser.matcher.tree()) parsed = parser("a b c") assert_str(parsed[0], """List +- List | +- List | | `- 'a' | `- 'b' `- 'c'""") def test_left(self): CLine = ContinuedLineFactory(r'\\') expr0 = Token("[A-Za-z_][A-Za-z0-9_]*") expr1 = Delayed() call = (expr1 & expr0) > List # Deliberately not expr0 & expr1 expr1 += (call | Empty () | expr0) program = (CLine(expr1) & Eos()) program.config.lines(block_policy=explicit).auto_memoize() parsed = program.parse("a b c") assert_str(parsed[0], """List +- List | +- List | | `- 'a' | `- 'b' `- 'c'""") LEPL-5.1.3/src/lepl/lexer/lines/_test/errors.py0000644000175000001440000000651011731117151021673 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A test related to error handling, based on a bug report. ''' from logging import basicConfig, DEBUG from sys import exc_info from unittest import TestCase from lepl import * class ErrorTest(TestCase): def make_parser(self): with TraceVariables(False): introduce = ~Token(':') hash = Token('#.*') InvalidName = Token('[0-9_][a-zA-Z0-9_]*') name = Or(Token('[a-zA-Z][a-zA-Z0-9_]*'), InvalidName ** make_error( 'InvalidName' )) typename = Or(Token(Literal('int')), Token(Literal('bool'))) memberdef = Line(typename & name) > tuple block = Delayed() line = (Line(name) | block) BlockStart = Or(Line(name & introduce), Line(name) ** make_error('BlockStart-OnlyName'), Line(introduce) ** make_error('BlockStart-OnlyColon')) block += BlockStart & (Block(memberdef) > list) program = (line[:] & Eos()) >> sexpr_throw program.config.lines(block_policy=explicit) return program.get_parse_string() def test_valid(self): ''' There was a bug here with sexpr_throw, which didn't iterate correctly over generators. ''' #basicConfig(level=DEBUG) p = self.make_parser() #print(p.matcher.tree()) result = p('foo:\n int i' ) assert result == ['foo', [('int', 'i')]], result def test_error(self): p = self.make_parser() try: result = p('foo\n int i') assert False, 'no error' except Error: error = exc_info()[1] assert str(error) == "BlockStart-OnlyName (<string>, line 1)", str(error) LEPL-5.1.3/src/lepl/lexer/lines/_test/pithon.py0000644000175000001440000001560111731117151021661 0ustar andrewusers00000000000000from lepl.matchers.variables import TraceVariables # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Test a Python-like grammar. 
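As a rough summary of the grammar below: statements are sequences of words, a line ending in ``:`` (optionally a ``def``-style function header) introduces an indented ``Block``, ``\\`` continues a line, and blank or comment lines are accepted at any indentation. 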
''' # pylint: disable-msg=W0614, W0401, W0621, C0103, C0111, R0201, R0904 #@PydevCodeAnalysisIgnore from logging import basicConfig, DEBUG from unittest import TestCase from lepl import * class PithonTest(TestCase): @property def parser(self): word = Token(Word(Lower())) continuation = Token(r'\\') symbol = Token(Any('()')) introduce = ~Token(':') comma = ~Token(',') CLine = ContinuedLineFactory(continuation) statement = word[1:] args = Extend(word[:, comma]) > tuple function = word[1:] & ~symbol('(') & args & ~symbol(')') block = Delayed() blank = ~Line(Empty(), indent=False) comment = ~Line(Token('#.*'), indent=False) line = (CLine(statement) | block | blank | comment) > list block += CLine((function | statement) & introduce) & Block(line[1:]) program = (line[:] & Eos()) program.config.lines(block_policy=explicit).trace_stack(True) # program.config.clear().blocks(block_policy=explicit) return program.get_parse_string() def test_blocks(self): #basicConfig(level=DEBUG) program1 = ''' kopo fjire ifejfr ogptkr jgitr gtr ffjireofr(kfkorp, freopk): jifr fireo frefre jifoji jio frefre: jiforejifre fiorej jfore fire jioj jioj jiojio ''' result = self.parser(program1) assert result == [[], ['kopo', 'fjire', 'ifejfr'], ['ogptkr', 'jgitr', 'gtr'], ['ffjireofr', ('kfkorp', 'freopk'), ['jifr', 'fireo'], ['frefre', 'jifoji'], ['jio', 'frefre', ['jiforejifre', 'fiorej'], ['jfore', 'fire'], ['jioj']], ['jioj']], ['jiojio']], result def test_no_lexer(self): #basicConfig(level=DEBUG) try: self.parser('123') assert False, 'expected exception' except LexerError as error: assert str(error) == 'No token for \'123\' at ' \ 'line 1, character 1 of \'123\'.', str(error) def test_extend(self): #basicConfig(level=DEBUG) result = self.parser(''' def foo(abc, def, ghi): jgiog ''') assert result == [[], ['def', 'foo', ('abc', 'def', 'ghi'), ['jgiog']]], result def test_cline(self): #basicConfig(level=DEBUG) result = self.parser(''' this is a single \ line spread over \ many actual \ lines and this is another ''') assert result == [[], ['this', 'is', 'a', 'single', 'line', 'spread', 'over', 'many', 'actual', 'lines'], ['and', 'this', 'is', 'another']], result def test_blanks(self): #basicConfig(level=DEBUG) result = self.parser(''' def foo(): a blank line can be inside a block or can separate blocks ''') assert result == [[], ['def', 'foo', (), ['a', 'blank', 'line', 'can', 'be'], [], ['inside', 'a', 'block'], [] ], ['or', 'can', 'separate', 'blocks'] ], result def test_comments(self): #basicConfig(level=DEBUG) result = self.parser(''' # a comment here def foo(): # one here contents # wrong indentation! 
more content''') assert result == [[], [], ['def', 'foo', (), [], ['contents'], [], ['more', 'content']]], result def test_all(self): #basicConfig(level=DEBUG) result = self.parser(''' this is a grammar with a similar line structure to python # it supports comments if something: then we indent else: something else def function(a, b, c): we can nest blocks: like this and we can also \ have explicit continuations \ with \ any \ indentation same for (argument, lists): which do not need the continuation marker # and we can have blank lines inside a block: like this ''') assert result == \ [ [], ['this', 'is', 'a', 'grammar', 'with', 'a', 'similar'], ['line', 'structure', 'to', 'python'], [], [], ['if', 'something', ['then', 'we', 'indent']], ['else', ['something', 'else'], []], ['def', 'function', ('a', 'b', 'c'), ['we', 'can', 'nest', 'blocks', ['like', 'this']], ['and', 'we', 'can', 'also', 'have', 'explicit', 'continuations', 'with', 'any', 'indentation'], []], ['same', 'for', ('argument', 'lists'), ['which', 'do', 'not', 'need', 'the'], ['continuation', 'marker'], [], [], ['like', 'this']]], result LEPL-5.1.3/src/lepl/lexer/lines/_test/word_bug.py0000644000175000001440000000633711731117151022176 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests related to a bug when Word() was specified inside Token() with line-aware parsing. 
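The tests below write essentially the same grammar several ways --- plain matchers, tokens, and the ``Line()`` / ``LineStart()``+``LineEnd()`` forms --- all of which should now parse ``'abc de f\n pqr\n'`` into the same two lists. 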
''' from unittest import TestCase from lepl import * from lepl.regexp.str import make_str_parser class WordBugTest(TestCase): def test_simple(self): with DroppedSpace(): line = (Word()[:] & Drop('\n')) > list lines = line[:] result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result def test_tokens(self): word = Token(Word()) newline = ~Token('\n') line = (word[:] & newline) > list lines = line[:] result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result def test_line_any(self): word = Token('[a-z]+') line = Line(word[:]) > list lines = line[:] lines.config.lines() result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result def test_line_word(self): word = Token(Word()) line = Line(word[:]) > list lines = line[:] lines.config.lines() result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result def test_line_notnewline(self): word = Token('[^\n ]+') line = Line(word[:]) > list lines = line[:] lines.config.lines() result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result def test_line_word_explicit(self): word = Token(Word()) line = (LineStart() & word[:] & LineEnd()) > list lines = line[:] lines.config.lines() result = lines.parse('abc de f\n pqr\n') assert result == [['abc', 'de', 'f'], ['pqr']], result LEPL-5.1.3/src/lepl/lexer/lines/_test/offside.py0000644000175000001440000001261511731117151022001 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for offside parsing. ''' from logging import basicConfig, DEBUG from unittest import TestCase from lepl.lexer.matchers import Token from lepl.matchers.combine import Or from lepl.matchers.core import Delayed from lepl.matchers.derived import Letter, Digit from lepl.matchers.monitor import Trace from lepl.lexer.lines.matchers import Block, Line, explicit, \ ContinuedLineFactory # pylint: disable-msg=R0201 # unittest convention class OffsideTest(TestCase): ''' Test lines and blocks. 
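The pattern throughout (an informal summary): ``Line(...)`` matches a single logical line, while ``Block(...)`` sets a deeper indent that its enclosed lines must match, so alternating the two builds the usual offside-rule tree of nested blocks. 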
''' def simple_grammar(self): ''' Test a simple example: letters introduce numbers in an indented block. ''' #basicConfig(level=DEBUG) number = Token(Digit()) letter = Token(Letter()) # the simplest whitespace grammar I can think of - lines are either # numbers (which are single, simple statements) or letters (which # mark the start of a new, indented block). block = Delayed() line = Or(Line(number), Line(letter) & block) > list # and a block is simply a collection of lines, as above block += Block(line[1:]) program = Trace(line[1:]) program.config.lines(block_policy=1) return program def test_single_line(self): program = self.simple_grammar() text = '''1''' parser = program.get_parse_string() result = parser(text) assert result == [['1']], result def test_two_lines(self): program = self.simple_grammar() text = '''1 2 ''' parser = program.get_parse_string() result = parser(text) assert result == [['1'], ['2']], result def test_single_block(self): #basicConfig(level=DEBUG) program = self.simple_grammar() text = '''a 3 ''' parser = program.get_parse_string() result = parser(text) assert result == [['a', ['3']]], result def test_bline(self): program = self.simple_grammar() text = '''1 2 a 3 b 4 5 6 ''' parser = program.get_parse_string() result = parser(text) assert result == [['1'], ['2'], ['a', ['3'], ['b', ['4'], ['5']], ['6']]], result def test_explicit(self): #basicConfig(level=DEBUG) number = Token(Digit()) letter = Token(Letter()) block = Delayed() line = Or(Line(number), Line(letter) & block) > list block += Block(line[1:]) program = Trace(line[1:]) text = '''1 2 a 3 b 4 5 6 ''' program.config.lines(block_policy=explicit) parser = program.get_parse_string() result = parser(text) assert result == [['1'], ['2'], ['a', ['3'], ['b', ['4'], ['5']], ['6']]], result def test_continued_explicit(self): number = Token(Digit()) letter = Token(Letter()) block = Delayed() bline = ContinuedLineFactory(r'x') line = Or(bline(number), bline(letter) & block) > list block += Block(line[1:]) program = Trace(line[1:]) text = '''1 2 a 3 b 4 5 6 ''' program.config.lines(block_policy=explicit) parser = program.get_parse_string() result = parser(text) assert result == [['1'], ['2'], ['a', ['3'], ['b', ['4'], ['5']], ['6']]], result LEPL-5.1.3/src/lepl/lexer/lines/_test/closed_bug.py0000644000175000001440000000602611731117151022467 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests related to a now-forgotten bug. ''' from logging import basicConfig, DEBUG from unittest import TestCase from lepl import * class ClosedBugTest(TestCase): def test_as_given(self): empty = ~Line(Empty(), indent=False) word = Token(Word()) comment = ~Line(Token('#.*'), indent=False) CLine = ContinuedLineFactory(Token(r'\\')) token = word[1:] block = Delayed() line = ((CLine(token) | block) > list) | empty | comment block += CLine((token)) & Block(line[:]) program = (line[:] & Eos()) program.config.lines(block_policy=explicit) self.run_test(program.get_parse(), False) def test_fixed(self): #basicConfig(level=DEBUG) empty = ~Line(Empty(), indent=False) word = Token(Word()) text = word[1:] block = Delayed() line = Line(text) | block | empty block += empty | (Block(line[1:]) > list) program = Trace(block[:] & Eos()) program.config.lines(block_policy=to_right, block_start=-1) self.run_test(program.get_parse(), True) def run_test(self, parser, ok): try: result = parser(""" a1 a2 b2 c2 b2 b2 c2 d2 e2 b2 a3 b3 a4 """) if ok: assert result == [['a1', 'a2', ['b2', ['c2'], 'b2', 'b2', ['c2', ['d2', ['e2']]], 'b2'], 'a3', ['b3'], 'a4']], result except: if ok: raise ok = True if not ok: assert False, 'expected error' LEPL-5.1.3/src/lepl/lexer/lines/_test/stream.py0000644000175000001440000000757711731117151021670 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.lexer.lines.stream module. 
''' #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.lexer.matchers import Token from lepl.matchers.core import Regexp, Literal, Any from lepl.lexer.lines.matchers import Line, LineStart, LineEnd, NO_BLOCKS class LineTest(TestCase): # no longer a bad config! def test_bad_config(self): #basicConfig(level=DEBUG) text = Token('[^\n\r]+') quoted = Regexp("'[^']'") line = Line(text(quoted)) line.config.lines() parser = line.get_parse_string() assert parser("'a'") == ["'a'"] def test_line(self): #basicConfig(level=DEBUG) text = Token('[^\n\r]+') quoted = Regexp("'[^']'") line = Line(text(quoted)) line.config.lines(block_start=0) parser = line.get_parse_string() assert parser("'a'") == ["'a'"] def test_offset(self): #basicConfig(level=DEBUG) text = Token('[^\n\r]+') line = Line(text(~Literal('aa') & Regexp('.*'))) line.config.lines(block_start=0) parser = line.get_parse_string() assert parser('aabc') == ['bc'] # what happens with an empty match? check = ~Literal('aa') & Regexp('.*') check.config.no_full_first_match() assert check.parse('aa') == [''] assert parser('aa') == [''] # def test_single_line(self): # #basicConfig(level=DEBUG) # line = DfaRegexp('(*SOL)[a-z]*(*EOL)') # line.config.lines() # parser = line.get_parse_string() # assert parser('abc') == ['abc'] def test_tabs(self): ''' Use block_policy here so that the regexp parser that excludes SOL and EOL is used; otherwise Any()[:] matches those and we end up with a single monster token. ''' line = LineStart() & Token(Any()) & LineEnd() line.config.lines(tabsize=8, block_start=NO_BLOCKS, block_policy=0).trace_stack(True) result = line.parse('a') assert result == ['', 'a'], result result = line.parse('\ta') assert result == [' ', 'a'], result line.config.lines(tabsize=None, block_start=NO_BLOCKS, block_policy=0) result = line.parse('\ta') assert result == ['\t', 'a'], result line.config.lines(block_policy=0, block_start=NO_BLOCKS) result = line.parse('\ta') assert result == [' ', 'a'], result LEPL-5.1.3/src/lepl/lexer/lines/_test/text.py0000644000175000001440000000764411731117151021354 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. 
If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' An example that avoids using tokens with the line aware parsing (you'd have to be crazy to want to do this, but it's possible). ''' #from logging import basicConfig, DEBUG from unittest import TestCase # pylint: disable-msg=W0614 from lepl import * class TextTest(TestCase): def parser(self, regexp): ''' Construct a parser that uses "offside rule" parsing, but which avoids using tokens in the grammar. ''' # we still need one token, which matches "all the text" Text = Token(regexp) def TLine(contents): ''' A version of Line() that takes text-based matchers. ''' return Line(Text(contents)) # from here on we use TLine instead of Line and don't worry about # tokens. # first define the basic grammar with Separator(~Space()[:]): name = Word() args = name[:, ','] fundef = 'def' & name & '(' & args & ')' & ':' # in reality we would have more expressions! expression = Literal('pass') # then define the block structure statement = Delayed() simple = TLine(expression) empty = TLine(Empty()) block = TLine(fundef) & Block(statement[:]) statement += (simple | empty | block) > list program = statement[:] program.config.lines(block_policy=2) return program.get_parse_string() def do_parse(self, parser): return parser('''pass def foo(): pass def bar(): pass ''') def test_plus(self): parser = self.parser('[^\n]+') result = self.do_parse(parser) assert result == [['pass'], ['def', 'foo', '(', ')', ':', ['pass'], ['def', 'bar', '(', ')', ':', ['pass']]]], result def test_star(self): ''' I have no idea why this fails, but this test was here before I forgot why, so I assume it is correct behaviour! (I think matching the empty string for a token is probably not a good idea) ''' #basicConfig(level=DEBUG) parser = self.parser('[^\n]*') try: self.do_parse(parser) assert False, 'Expected error' # error changed in Lepl 5 # except RuntimeLexerError: except FullFirstMatchException: pass LEPL-5.1.3/src/lepl/lexer/lines/__init__.py0000644000175000001440000000000011731117151020764 0ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/lines/matchers.py0000644000175000001440000002177411731117151021060 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. from lepl.lexer.matchers import Token, RestrictTokensBy, EmptyToken from lepl.lexer.lines.lexer import START, END from lepl.lexer.lines.monitor import BlockMonitor from lepl.matchers.support import coerce_, OperatorMatcher, NoMemo from lepl.core.parser import tagged from lepl.support.lib import fmt from lepl.matchers.combine import And from lepl.stream.core import s_key, s_next, s_line NO_BLOCKS = object() ''' Magic initial value for block_offset to disable indentation checks. ''' class LineStart(Token): def __init__(self, indent=True, regexp=None, content=None, id_=None, alphabet=None, complete=True, compiled=False): ''' Arguments used only to support cloning. ''' super(LineStart, self).__init__(regexp=None, content=None, id_=START, alphabet=None, complete=True, compiled=compiled) self._karg(indent=indent) self.monitor_class = BlockMonitor self._current_indent = NO_BLOCKS def on_push(self, monitor): ''' Read the global indentation level. ''' if self.indent: self._current_indent = monitor.indent def on_pop(self, monitor): ''' Unused ''' @tagged def _match(self, stream_in): ''' Check that we match the current level ''' try: generator = super(LineStart, self)._match(stream_in) while True: (indent, stream) = yield generator self._debug(fmt('SOL {0!r}', indent)) if indent and indent[0] and indent[0][-1] == '\n': indent[0] = indent[0][:-1] # if we're not doing indents, this is empty if not self.indent: yield ([], stream) # if we are doing indents, we need a match or NO_BLOCKS elif self._current_indent == NO_BLOCKS or \ len(indent[0]) == self._current_indent: yield (indent, stream) else: self._debug( fmt('Incorrect indent ({0:d} != len({1!r}), {2:d})', self._current_indent, indent[0], len(indent[0]))) except StopIteration: pass class LineEnd(EmptyToken): def __init__(self, regexp=None, content=None, id_=None, alphabet=None, complete=True, compiled=False): ''' Arguments used only to support cloning. ''' super(LineEnd, self).__init__(regexp=None, content=None, id_=END, alphabet=None, complete=True, compiled=compiled) def Line(matcher, indent=True): ''' Match the matcher within a line. ''' return ~LineStart(indent=indent) & matcher & ~LineEnd() def ContinuedLineFactory(matcher): ''' Create a replacement for ``Line()`` that can match multiple lines if they end in the given character/matcher. ''' matcher = coerce_(matcher, lambda regexp: Token(regexp)) restricted = RestrictTokensBy(matcher, LineEnd(), LineStart()) def factory(matcher, indent=True): return restricted(Line(matcher, indent=indent)) return factory def Extend(matcher): ''' Apply the given matcher to a token stream that ignores line endings and starts (so it matches over multiple lines). ''' start = LineStart() end = LineEnd() return RestrictTokensBy(end, start)(matcher) # pylint: disable-msg=W0105 # epydoc convention DEFAULT_TABSIZE = 8 ''' The default number of spaces for a tab. ''' def constant_indent(n_spaces): ''' Construct a simple policy for `Block` that increments the indent by some fixed number of spaces. 
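For example (illustrative), ``Block(line[1:], policy=constant_indent(4))`` --- or equivalently ``block_policy=4`` in the configuration, since integers are wrapped by this function --- requires each nested block to be indented exactly four more spaces than its parent. 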
''' def policy(current, _indent): ''' Increment current by n_spaces ''' return current + n_spaces return policy def explicit(_current, indent): ''' Another simple policy that matches whatever indent is used. ''' return len(indent) def to_right(current, indent): ''' This allows new blocks to be used without any introduction (eg no colon on the preceding line). See the "closed_bug" test for more details. ''' new = len(indent) if new <= current: raise StopIteration return new DEFAULT_POLICY = constant_indent(DEFAULT_TABSIZE) ''' By default, expect an indent equivalent to a tab. ''' # pylint: disable-msg=E1101, W0212, R0901, R0904 # pylint conventions class Block(OperatorMatcher, NoMemo): ''' Set a new indent level for the enclosed matchers (typically `BLine` and `Block` instances). In the simplest case, this might increment the global indent by 4, say. In a more complex case it might look at the current token, expecting an `Indent`, and set the global indent at that amount if it is larger than the current value. A block will always match an `Indent`, but will not consume it (it will remain in the stream after the block has finished). The usual memoization of left recursive calls will not detect problems with nested blocks (because the indentation changes), so instead we track and block nested calls manually. ''' POLICY = 'policy' # Python 2.6 does not support this syntax # def __init__(self, *lines, policy=None, indent=None): def __init__(self, *lines, **kargs): ''' Lines are invoked in sequence (like `And()`). The policy is passed the current level and the indent and must return a new level. Typically it is set globally by rewriting with a default in the configuration. If it is given as an integer then `constant_indent` is used to create a policy from that. indent is the matcher used to match indents, and is exposed for rewriting/extension (in other words, ignore it). ''' super(Block, self).__init__() self._args(lines=lines) policy = kargs.get(self.POLICY, DEFAULT_POLICY) if isinstance(policy, int): policy = constant_indent(policy) self._karg(policy=policy) self.monitor_class = BlockMonitor self.__monitor = None self.__streams = set() def on_push(self, monitor): ''' Store a reference to the monitor which we will update inside _match ''' self.__monitor = monitor def on_pop(self, monitor): pass @tagged def _match(self, stream_in): ''' Pull indent and call the policy and update the global value, then evaluate the contents. ''' # detect a nested call key = s_key(stream_in) if key in self.__streams: self._debug('Avoided left recursive call to Block.') return self.__streams.add(key) try: ((tokens, token_stream), _) = s_next(stream_in) (indent, _) = s_line(token_stream, True) if START not in tokens: raise StopIteration current = self.__monitor.indent policy = self.policy(current, indent) generator = And(*self.lines)._match(stream_in) while True: self.__monitor.push_level(policy) try: results = yield generator finally: self.__monitor.pop_level() yield results finally: self.__streams.remove(key) LEPL-5.1.3/src/lepl/lexer/lines/monitor.py0000644000175000001440000000571211731117151020733 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. 
See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support the stack-scoped tracking of indent level blocks. ''' from lepl.core.monitor import ActiveMonitor from lepl.support.state import State from lepl.support.lib import LogMixin, fmt class BlockMonitor(ActiveMonitor, LogMixin): ''' This tracks the current indent level (in number of spaces). It is read by `Line` and updated by `Block`. ''' def __init__(self, start=0): ''' start is the initial indent (in spaces). ''' super(BlockMonitor, self).__init__() self.__stack = [start] self.__state = State.singleton() def push_level(self, level): ''' Add a new indent level. ''' self.__stack.append(level) self.__state[BlockMonitor] = level self._debug(fmt('Indent -> {0:d}', level)) def pop_level(self): ''' Drop one level. ''' self.__stack.pop() if not self.__stack: raise OffsideError('Closed an unopened indent.') self.__state[BlockMonitor] = self.indent self._debug(fmt('Indent <- {0:d}', self.indent)) @property def indent(self): ''' The current indent value (number of spaces). ''' return self.__stack[-1] def block_monitor(start=0): ''' Add an extra lambda for the standard monitor interface. ''' return lambda: BlockMonitor(start) class OffsideError(Exception): ''' The exception raised by problems when parsing whitespace significant code. ''' LEPL-5.1.3/src/lepl/lexer/lines/_example/0000755000175000001440000000000011764776700020500 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/lexer/lines/_example/__init__.py0000644000175000001440000000326211731117151022573 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Examples for the lepl.lexer.lines package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.lexer.lines._example.line_aware import lepl.lexer.lines._example.old_examples LEPL-5.1.3/src/lepl/lexer/lines/_example/line_aware.py0000644000175000001440000000723011731117151023141 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Show how line aware parsing can be used. 
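
A minimal sketch of the pattern the tests below follow (the expected
result mirrors ``test_line``):

    contents = Token(Any()[:,...]) > list
    lines = Line(contents)[:]
    lines.config.lines()
    lines.parse('line one\nline two')  # -> [['line one\n'], ['line two']]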
''' from unittest import TestCase from logging import basicConfig, DEBUG from lepl import * from lepl._example.support import Example class LineAwareTest(TestCase): def test_explicit_start_end(self): contents = Token(Any()[:,...]) > list line = LineStart() & contents & LineEnd() lines = line[:] lines.config.lines() result = lines.parse('line one\nline two\nline three') assert result == [['line one\n'], ['line two\n'], ['line three']], result def test_line(self): contents = Token(Any()[:,...]) > list line = Line(contents) lines = line[:] lines.config.lines() result = lines.parse('line one\nline two\nline three') assert result == [['line one\n'], ['line two\n'], ['line three']], result def test_continued_line(self): contents = Token('[a-z]+')[:] > list CLine = ContinuedLineFactory(r'\\') line = CLine(contents) lines = line[:] lines.config.lines() result = lines.parse('line one \\\nline two\nline three') assert result == [['line', 'one', 'line', 'two'], ['line', 'three']], result def test_extend(self): #basicConfig(level=DEBUG) contents = Token('[a-z]+')[:] > list parens = Token('\(') & contents & Token('\)') > list line = Line(contents & Optional(Extend(parens))) lines = line[:] lines.config.lines() result = lines.parse('line one (this\n extends to line two)\nline three') assert result == [['line', 'one'], ['(', ['this', 'extends', 'to', 'line', 'two'], ')'], ['line', 'three']], result def test_extend_deepest(self): ''' This returned None. ''' #basicConfig(level=DEBUG) contents = Token('[a-z]+')[:] > list parens = Token('\(') & contents & Token('\)') > list line = Line(contents & Optional(Extend(parens))) lines = line[:] lines.config.lines().record_deepest() result = lines.parse('line one (this\n extends to line two)\nline three') assert result == [['line', 'one'], ['(', ['this', 'extends', 'to', 'line', 'two'], ')'], ['line', 'three']], result LEPL-5.1.3/src/lepl/lexer/lines/_example/old_examples.py0000644000175000001440000000672511731117151023517 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Show how line aware parsing can be used. 
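
One pattern shown below, continuation lines, looks roughly like this
(a sketch; see ``test_continued`` for the tested version, where a
trailing '+' joins a line to the next):

    words = Token(Word(Lower()))[:] > list
    CLine = ContinuedLineFactory(r'\+')
    line = CLine(words)
    line.config.lines()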
''' #@PydevCodeAnalysisIgnore from logging import basicConfig, DEBUG from lepl import * from lepl._example.support import Example class LineAwareExamples(Example): def test_single_line(self): #basicConfig(level=DEBUG) start = LineStart() words = Token(Word())[:] end = LineEnd() line = start & words & end line.config.lines().no_full_first_match() self.examples([ (lambda: line.parse(' abc def'), "['abc', 'def']"), (lambda: line.parse(' abc def\n pqr'), "['abc', 'def']") ]) def test_multiple_lines(self): #basicConfig(level=DEBUG) start = LineStart() words = Token(Word())[:] > list end = LineEnd() line = start & words & end lines = line[:] lines.config.lines().no_full_first_match() self.examples([ (lambda: lines.parse(' abc def'), "[['abc', 'def']]"), (lambda: lines.parse(' abc def\n pqr'), "[['abc', 'def'], ['pqr']]") ]) # def test_indent_token(self): # #basicConfig(level=DEBUG) # words = Token(Word(Lower()))[:] > list # line = Indent() & words & LineAwareEol() # line.config.default_line_aware(tabsize=4) # parser = line.get_parse_string() # self.examples([(lambda: parser('\tabc def'), # "[' ', ['abc', 'def'], '']")]) def test_line_token(self): #basicConfig(level=DEBUG) words = Token(Word(Lower()))[:] > list line = Line(words) line.config.lines() parser = line.get_parse_string() self.examples([(lambda: parser('\tabc def'), "[['abc', 'def']]")]) def test_continued(self): #basicConfig(level=DEBUG) words = Token(Word(Lower()))[:] > list CLine = ContinuedLineFactory(r'\+') line = CLine(words) line.config.lines() parser = line.get_parse_string() self.examples([(lambda: parser('''abc def + ghi'''), "[['abc', 'def', 'ghi']]")]) LEPL-5.1.3/src/lepl/lexer/lines/lexer.py0000644000175000001440000001221511731117151020357 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A lexer that adds line start and end tokens. The start may also contain leading spaces, depending on the configuration. ''' from lepl.lexer.lexer import Lexer from lepl.stream.core import s_empty, s_line, s_stream, s_fmt, s_next, s_id from lepl.lexer.support import RuntimeLexerError START = 'SOL' ''' Name for start of line token. 
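The offside lexer (below) brackets every input line with a ``(START,)``
token, whose content is the leading indent, and an ``(END,)`` token;
`LineStart` and `LineEnd` match these by ID.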
''' END = 'EOL' ''' Name for end of line token. ''' def make_offside_lexer(tabsize, blocks): ''' Provide the standard `Lexer` interface while including `tabsize`. ''' def wrapper(matcher, tokens, alphabet, discard, t_regexp=None, s_regexp=None): ''' Return the lexer with tabsize and blocks as specified earlier. ''' return _OffsideLexer(matcher, tokens, alphabet, discard, t_regexp=t_regexp, s_regexp=s_regexp, tabsize=tabsize, blocks=blocks) return wrapper class _OffsideLexer(Lexer): ''' An alternative lexer that adds `LineStart` and `LineEnd` tokens. Note that because of the extend argument list this must be used in the config via `make_offside_lexer()` (although in normal use it is supplied by simply calling `config.lines()` so you don't need to refer to this class at all) ''' def __init__(self, matcher, tokens, alphabet, discard, t_regexp=None, s_regexp=None, tabsize=8, blocks=False): super(_OffsideLexer, self).__init__(matcher, tokens, alphabet, discard, t_regexp=t_regexp, s_regexp=s_regexp) self._karg(tabsize=tabsize) self._karg(blocks=blocks) if tabsize is not None: self._tab = ' ' * tabsize else: self._tab = '\t' def _tokens(self, stream, max): ''' Generate tokens, on demand. ''' id_ = s_id(stream) try: while not s_empty(stream): # caches for different tokens with same contents differ id_ += 1 (line, next_stream) = s_line(stream, False) line_stream = s_stream(stream, line) size = 0 # if we use blocks, match leading space if self.blocks: try: (_, size, _) = self.s_regexp.size_match(line_stream) except TypeError: pass # this will be empty (size=0) if blocks unused (indent, next_line_stream) = s_next(line_stream, count=size) indent = indent.replace('\t', self._tab) yield ((START,), s_stream(line_stream, indent, id_=id_, max=max)) line_stream = next_line_stream while not s_empty(line_stream): id_ += 1 try: (terminals, match, next_line_stream) = \ self.t_regexp.match(line_stream) yield (terminals, s_stream(line_stream, match, max=max, id_=id_)) except TypeError: (terminals, _size, next_line_stream) = \ self.s_regexp.size_match(line_stream) line_stream = next_line_stream id_ += 1 yield ((END,), s_stream(line_stream, '', max=max, id_=id_)) stream = next_stream except TypeError: raise RuntimeLexerError( s_fmt(stream, 'No token for {rest} at {location} of {text}.')) LEPL-5.1.3/src/lepl/lexer/lexer.py0000644000175000001440000001434111731117215017250 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# 
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

from lepl.support.lib import fmt
from lepl.support.context import NamespaceMixin
from lepl.matchers.support import BaseMatcher
from lepl.lexer.operators import TOKENS, TokenNamespace
from lepl.core.parser import tagged
from lepl.stream.core import s_empty, s_debug, s_stream, s_fmt, s_factory, \
    s_max, s_new_max, s_id, s_global_kargs, s_delta, s_len, \
    s_cache_level
from lepl.lexer.support import RuntimeLexerError
from lepl.regexp.core import Compiler


# pylint can't detect _kargs etc
# pylint: disable-msg=E1101

class Lexer(NamespaceMixin, BaseMatcher):
    '''
    This takes a set of regular expressions and provides a matcher that
    converts a stream into a stream of tokens, passing the new stream to
    the embedded matcher.

    It is added to the matcher graph by the lexer_rewriter; it is not
    specified explicitly by the user.
    '''

    def __init__(self, matcher, tokens, alphabet, discard,
                 t_regexp=None, s_regexp=None):
        '''
        matcher is the head of the original matcher graph, which will be
        called with a tokenised stream.

        tokens is the set of `Token` instances that define the lexer.

        alphabet is the alphabet for which the regexps are defined.

        discard is the regular expression for spaces (which are silently
        dropped if no token can be matched).

        t_regexp and s_regexp are internally compiled state, used in
        cloning, and should not be provided by non-cloning callers.
        '''
        super(Lexer, self).__init__(TOKENS, TokenNamespace)
        if t_regexp is None:
            unique = {}
            for token in tokens:
                token.compile(alphabet)
                self._debug(fmt('Token: {0}', token))
                # this just reduces the work for the regexp compiler
                unique[token.id_] = token
            t_regexp = Compiler.multiple(alphabet,
                            [(t.id_, t.regexp)
                             for t in unique.values()
                             if t.regexp is not None]).dfa()
        if s_regexp is None and discard is not None:
            s_regexp = Compiler.single(alphabet, discard).dfa()
        self._arg(matcher=matcher)
        self._arg(tokens=tokens)
        self._arg(alphabet=alphabet)
        self._arg(discard=discard)
        self._karg(t_regexp=t_regexp)
        self._karg(s_regexp=s_regexp)

    def token_for_id(self, id_):
        '''
        A utility that checks the known tokens for a given ID. The ID is
        used internally, but is (by default) an unfriendly integer value.
        Note that a lexed stream associates a chunk of input with a list
        of IDs - more than one regexp may be a maximal match (and this
        is a feature, not a bug).
        '''
        for token in self.tokens:
            if token.id_ == id_:
                return token

    def _tokens(self, stream, max):
        '''
        Generate tokens, on demand.
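        Each token is yielded as a ``(terminals, stream)`` pair, where
        ``terminals`` holds the IDs of the token(s) that gave a maximal
        match and ``stream`` wraps the matched text; input that matches
        only the ``discard`` expression is skipped silently, and anything
        else raises `RuntimeLexerError`.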
''' try: id_ = s_id(stream) while not s_empty(stream): # avoid conflicts between tokens id_ += 1 try: (terminals, match, next_stream) = \ self.t_regexp.match(stream) self._debug(fmt('Token: {0!r} {1!r} {2!s}', terminals, match, s_debug(stream))) yield (terminals, s_stream(stream, match, max=max, id_=id_)) except TypeError: (terminals, _size, next_stream) = \ self.s_regexp.size_match(stream) self._debug(fmt('Space: {0!r} {1!s}', terminals, s_debug(stream))) stream = next_stream except TypeError: raise RuntimeLexerError( s_fmt(stream, 'No token for {rest} at {location} of {text}.')) @tagged def _match(self, in_stream): ''' Implement matching - pass token stream to tokens. ''' (max, clean_stream) = s_new_max(in_stream) try: length = s_len(in_stream) except TypeError: length = None factory = s_factory(in_stream) token_stream = factory.to_token( self._tokens(clean_stream, max), id=s_id(in_stream), factory=factory, max=s_max(in_stream), global_kargs=s_global_kargs(in_stream), delta=s_delta(in_stream), len=length, cache_level=s_cache_level(in_stream)+1) in_stream = None generator = self.matcher._match(token_stream) while True: yield (yield generator) LEPL-5.1.3/src/lepl/__init__.py0000644000175000001440000002440411764776556016603 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. #@PydevCodeAnalysisIgnore # pylint: disable-msg=C0301, E0611, W0401 # confused by __init__? ''' Lepl is a parser library written in Python. This is the API documentation; the module index is at the bottom of this page. There is also a `manual <../index.html>`_ which gives a higher level overview. The home page for this package is the `Lepl website `_. Example ------- A simple example of how to use Lepl:: from lepl import * # For a simpler result these could be replaced with 'list', giving # an AST as a set of nested lists # (ie replace '> Term' etc with '> list' below). class Term(List): pass class Factor(List): pass class Expression(List): pass def build(): # Here we define the grammar # A delayed value is defined later (see penultimate line in block) expr = Delayed() number = Digit()[1:,...] 
>> int # Allow spaces between items with DroppedSpace(): term = number | '(' & expr & ')' > Term muldiv = Any('*/') factor = term & (muldiv & term)[:] > Factor addsub = Any('+-') expr += factor & (addsub & factor)[:] > Expression line = Trace(expr) & Eos() return line.get_parse() if __name__ == '__main__': parser = build() # parser returns a list of tokens, but line # returns a single value, so take the first entry print(parser('1 + 2 * (3 + 4 - 5)')[0]) Running this gives the result:: Expression +- Factor | `- Term | `- 1 +- '+' `- Factor +- Term | `- 2 +- '*' `- Term +- '(' +- Expression | +- Factor | | `- Term | | `- 3 | +- '+' | +- Factor | | `- Term | | `- 4 | +- '-' | `- Factor | `- Term | `- 5 `- ')' ''' from lepl.contrib.matchers import SmartSeparator2 from lepl.core.config import Configuration, ConfigBuilder from lepl.core.manager import GeneratorManager from lepl.core.trace import RecordDeepest, TraceStack from lepl.matchers.combine import And, Or, First, Difference, Limit from lepl.matchers.core import Empty, Any, Delayed, Literal, Empty, \ Lookahead, Regexp from lepl.matchers.complex import PostMatch, Columns, Iterate from lepl.matchers.monitor import Trace from lepl.matchers.derived import Apply, args, KApply, Join, \ AnyBut, Optional, Star, ZeroOrMore, Map, Add, Drop, Repeat, Plus, \ OneOrMore, Substitute, Name, Eof, Eos, Identity, Newline, Space, \ Whitespace, Digit, Letter, Upper, Lower, Printable, Punctuation, \ UnsignedInteger, SignedInteger, Integer, UnsignedFloat, SignedFloat, \ UnsignedEFloat, SignedEFloat, Float, UnsignedReal, SignedReal, \ UnsignedEReal, SignedEReal, Real, Word, DropEmpty, Literals, \ String, SingleLineString, SkipString, SkipTo from lepl.matchers.error import Error, make_error, raise_error from lepl.matchers.memo import RMemo, LMemo, MemoException from lepl.matchers.operators import Override, Separator, SmartSeparator1, \ GREEDY, NON_GREEDY, DEPTH_FIRST, BREADTH_FIRST, DroppedSpace, REDUCE from lepl.matchers.support import function_matcher, function_matcher_factory, \ sequence_matcher, sequence_matcher_factory, \ trampoline_matcher, trampoline_matcher_factory from lepl.matchers.transform import PostCondition, Transform, Assert from lepl.matchers.variables import TraceVariables from lepl.lexer.matchers import Token from lepl.lexer.support import LexerError, RuntimeLexerError from lepl.lexer.lines.matchers import Block, Line, LineStart, LineEnd, \ constant_indent, explicit, to_right, ContinuedLineFactory, Extend, \ NO_BLOCKS, DEFAULT_POLICY from lepl.regexp.core import RegexpError from lepl.regexp.matchers import NfaRegexp, DfaRegexp from lepl.regexp.unicode import UnicodeAlphabet from lepl.stream.core import s_debug, s_deepest, s_delta, s_empty, s_eq, \ s_factory, s_fmt, s_global_kargs, s_id, s_join, s_kargs, s_key, \ s_len, s_line, s_max, s_next, s_stream from lepl.stream.maxdepth import FullFirstMatchException from lepl.stream.factory import DEFAULT_STREAM_FACTORY from lepl.support.list import List, sexpr_fold, sexpr_throw, sexpr_flatten, \ sexpr_to_tree from lepl.support.node import Node, make_dict, join_with, node_throw from lepl.support.timer import print_timing __all__ = [ # lepl.core.config 'Configuration', 'ConfigBuilder', # lepl.contrib.matchers 'SmartSeparator2', # lepl.matchers.error 'make_error', 'raise_error', 'Error', # lepl.matchers.core 'Empty', 'Repeat', 'Join', 'Any', 'Literal', 'Empty', 'Lookahead', 'Regexp', # lepl.matchers.combine 'And', 'Or', 'First', 'Difference', 'Limit', # lepl.matchers.derived 'Apply', 'args', 'KApply', 
'Delayed', 'Trace', 'AnyBut', 'Optional', 'Star', 'ZeroOrMore', 'Plus', 'OneOrMore', 'Map', 'Add', 'Drop', 'Substitute', 'Name', 'Eof', 'Eos', 'Identity', 'Newline', 'Space', 'Whitespace', 'Digit', 'Letter', 'Upper', 'Lower', 'Printable', 'Punctuation', 'UnsignedInteger', 'SignedInteger', 'Integer', # float matchers exclude integers 'UnsignedFloat', 'SignedFloat', 'UnsignedEFloat', 'SignedEFloat', 'Float', # real matchers match both floats and integers 'UnsignedReal', 'SignedReal', 'UnsignedEReal', 'SignedEReal', 'Real', 'Word', 'DropEmpty', 'Literals', 'String', 'SingleLineString', 'SkipString', 'SkipTo', 'GREEDY', 'NON_GREEDY', 'DEPTH_FIRST', 'BREADTH_FIRST', 'REDUCE', # lepl.matchers.complex 'PostMatch', 'Columns', 'Iterate', # lepl.matchers.support 'function_matcher', 'function_matcher_factory', 'sequence_matcher', 'sequence_matcher_factory', 'trampoline_matcher', 'trampoline_matcher_factory', # lepl.matchers.transform 'PostCondition', 'Transform', 'Assert', # lepl.matchers.variables 'TraceVariables', # lepl.stream.stream 'DEFAULT_STREAM_FACTORY', # lepl.matchers.operators 'Override', 'Separator', 'SmartSeparator1', 'DroppedSpace', # lepl.support.node 'Node', 'make_dict', 'join_with', 'node_throw', # lepl.support.list 'List', 'sexpr_fold', 'sexpr_throw', 'sexpr_flatten', 'sexpr_to_tree', # lepl.lexer.matchers 'Token', 'LexerError', 'RuntimeLexerError', # lepl.core.manager 'GeneratorManager', # lepl.core.trace 'RecordDeepest', 'TraceStack', # lepl.core.memo, 'RMemo', 'LMemo', 'MemoException', # lepl.regexp.core 'RegexpError', # lepl.regexp.matchers 'NfaRegexp', 'DfaRegexp', # lepl.regexp.unicode 'UnicodeAlphabet', # lepl.stream.core 's_debug', 's_deepest', 's_delta', 's_empty', 's_eq', 's_factory', 's_fmt', 's_global_kargs', 's_id', 's_join', 's_kargs', 's_key', 's_len', 's_line', 's_max', 's_next', 's_stream', # lepl.stream.maxdepth 'FullFirstMatchException', # lepl.lexer.lines.matchers 'LineStart', 'LineEnd', 'Line', 'ContinuedLineFactory', 'Extend', 'Block', 'NO_BLOCKS', 'DEFAULT_POLICY', 'explicit', 'constant_indent', 'to_right', # lepl.support.timer 'print_timing' ] __version__ = '5.1.3' if __version__.find('b') > -1: from logging import getLogger, basicConfig, WARN #basicConfig(level=WARN) getLogger('lepl').warn('You are using a BETA version of LEPL.') LEPL-5.1.3/src/lepl/apps/0000755000175000001440000000000011764776700015420 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/apps/_test/0000755000175000001440000000000011764776700016536 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/apps/_test/__init__.py0000644000175000001440000000321111731117151020623 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.apps package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.apps._test.json import lepl.apps._test.rfc3696 LEPL-5.1.3/src/lepl/apps/_test/json.py0000644000175000001440000000202411731117151020036 0ustar andrewusers00000000000000 from lepl.apps.json import Simple from lepl._test.base import BaseTest class JsonTest(BaseTest): def test_dict(self): self.assert_direct('{"a": 123, "b": "somewhere"}', Simple(), [[{'a': 123.0, 'b': 'somewhere'}]]) def test_escape(self): self.assert_direct('"a\\u0020b"', Simple(), [['a b']]) self.assert_direct('"a\\nb"', Simple(), [['a\nb']]) def test_array(self): self.assert_direct('[1,2,[3,4],[[5], 6]]', Simple(), [[[1.0,2.0,[3.0,4.0],[[5.0],6.0]]]]) def test_object(self): self.assert_direct('{"a": 1}', Simple(), [[{"a": 1.0}]]) self.assert_direct('{"a": 1, "b": [2,3]}', Simple(), [[{"a": 1.0, "b": [2.0, 3.0]}]]) def test_spaces(self): self.assert_direct('{"a": 1, "b":"c","d" : [ 2, 3.]}', Simple(), [[{'a': 1.0, 'b': 'c', 'd': [2.0, 3.0]}]]) LEPL-5.1.3/src/lepl/apps/_test/rfc3696.py0000644000175000001440000002737111731117151020203 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.apps.rfc3696 module. 
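
The public validators are plain callables returning True/False; a
typical check (mirroring the tests below) is:

    from lepl.apps.rfc3696 import Email
    email = Email()
    assert email('andrew@acooke.org')
    assert not email('not an email')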
''' from logging import basicConfig, DEBUG from lepl import * from lepl._test.base import BaseTest from lepl.apps.rfc3696 import _PreferredFullyQualifiedDnsName, _EmailLocalPart,\ _Email, _HttpUrl, MailToUrl, HttpUrl, Email, _IpV4Address, _Ipv6Address class DnsNameTest(BaseTest): def test_dns_name_matcher(self): name = _PreferredFullyQualifiedDnsName() & Eos() self.assert_fail('', name) self.assert_fail('a', name) self.assert_fail('12.34', name) self.assert_fail('a.b.', name) self.assert_fail(' a.b', name) self.assert_fail('a.b ', name) self.assert_fail('a._.', name) self.assert_fail('a.-b.c', name) self.assert_fail('a.b-.c', name) self.assert_fail('a.b.c.123', name) self.assert_literal('a.b.123.c', name) self.assert_literal('a.b-c.d', name) self.assert_literal('a.b--c.d', name) self.assert_literal('acooke.org', name) self.assert_literal('EXAMPLE.COM', name) self.assert_literal('example.a23', name) self.assert_literal('example.12c', name) class IpV4AddressTest(BaseTest): def test_ipv4_address(self): address = _IpV4Address() & Eos() self.assert_literal('1.2.3.4', address) self.assert_literal('255.255.255.255', address) self.assert_literal('0.0.0.0', address) self.assert_fail('1.2.3', address) self.assert_fail('1.2.3.', address) self.assert_fail('1.256.3.4', address) self.assert_fail('1.a.3.4', address) self.assert_fail('1.-1.3.4', address) class IpV6AddressTest(BaseTest): def test_ipv6_address(self): address = _Ipv6Address() & Eos() self.assert_literal('FEDC:BA98:7654:3210:FEDC:BA98:7654:3210', address) self.assert_literal('1080:0:0:0:8:800:200C:417A', address) self.assert_literal('FF01:0:0:0:0:0:0:101', address) self.assert_literal('0:0:0:0:0:0:0:1', address) self.assert_literal('0:0:0:0:0:0:0:0', address) self.assert_literal('1080::8:800:200C:417A', address) self.assert_literal('FF01::101', address) self.assert_literal('::1', address) self.assert_literal('::', address) self.assert_literal('0:0:0:0:0:0:13.1.68.3', address) self.assert_literal('0:0:0:0:0:FFFF:129.144.52.38', address) self.assert_literal('::13.1.68.3', address) self.assert_literal('::FFFF:129.144.52.38', address) self.assert_fail('1:2:3:4:5:6:7', address) self.assert_fail('1:2:3:4:5:6:7:8:9', address) self.assert_fail('::1:2:3:4:5:6:7:8', address) self.assert_fail(':1::2:3:4:5:6:7:8', address) self.assert_fail(':1:2:3:4:5:6:7:8::', address) self.assert_fail('1:2:3:4:5:1.2.3.4', address) self.assert_fail('1:2:3:4:5.6.7:1.2.3.4', address) self.assert_fail('::1:2:3:4:5:6:1.2.3.4', address) self.assert_fail('1::2:3:4:5:6:1.2.3.4', address) self.assert_fail('1:2:3:4:5:6::1.2.3.4', address) class _EmailLocalPartTest(BaseTest): def test_email_local_part_matcher(self): local = _EmailLocalPart() & Eos() self.assert_fail('', local) self.assert_fail('""', local) self.assert_fail('"unmatched', local) self.assert_fail('unmatched"', local) self.assert_fail(' ', local) self.assert_fail('a b', local) self.assert_literal(r'andrew', local) self.assert_literal(r'Abc\@def', local) self.assert_literal(r'Fred\ Bloggs', local) self.assert_literal(r'Joe.\\Blow', local) self.assert_literal(r'"Abc@def"', local) self.assert_literal(r'"Fred Bloggs"', local) self.assert_literal(r'user+mailbox', local) self.assert_literal(r'customer/department=shipping', local) self.assert_literal(r'$A12345', local) self.assert_literal(r'!def!xyz%abc', local) self.assert_literal(r'_somename', local) class _EmailTest(BaseTest): def test_email_matcher(self): email = _Email() & Eos() self.assert_literal(r'andrew@acooke.org', email) 
self.assert_literal(r'Abc\@def@example.com', email) self.assert_literal(r'Fred\ Bloggs@example.com', email) self.assert_literal(r'Joe.\\Blow@example.com', email) self.assert_literal(r'"Abc@def"@example.com', email) self.assert_literal(r'"Fred Bloggs"@example.com', email) self.assert_literal(r'user+mailbox@example.com', email) self.assert_literal(r'customer/department=shipping@example.com', email) self.assert_literal(r'$A12345@example.com', email) self.assert_literal(r'!def!xyz%abc@example.com', email) self.assert_literal(r'_somename@example.com', email) def test_email(self): email = Email() assert email(r'andrew@acooke.org',) assert email(r'Abc\@def@example.com',) assert email(r'Fred\ Bloggs@example.com',) assert email(r'Joe.\\Blow@example.com',) assert email(r'"Abc@def"@example.com',) assert email(r'"Fred Bloggs"@example.com',) assert email(r'user+mailbox@example.com',) assert email(r'customer/department=shipping@example.com',) assert email(r'$A12345@example.com',) assert email(r'!def!xyz%abc@example.com',) assert email(r'_somename@example.com',) addresses = ['', 'a', '12.34', 'a.b.', ' a.b', 'a.b ', 'a._.', 'a.-b.c', 'a.b-.c', 'a.b.c.123'] names = ['', '""', '"unmatched', 'unmatched"', ' ', 'a b'] for name in names: for address in addresses: bad = name + '@' + address assert not email(bad), bad class HttpUrlTest(BaseTest): def test_http_matcher(self): #basicConfig(level=DEBUG) http = _HttpUrl() & Eos() http.config.compile_to_re() #print(http.get_parse().matcher.tree()) self.assert_literal(r'http://www.acooke.org', http) self.assert_literal(r'http://www.acooke.org/', http) self.assert_literal(r'http://www.acooke.org:80', http) self.assert_literal(r'http://www.acooke.org:80/', http) self.assert_literal(r'http://www.acooke.org/andrew', http) self.assert_literal(r'http://www.acooke.org:80/andrew', http) self.assert_literal(r'http://www.acooke.org/andrew/', http) self.assert_literal(r'http://www.acooke.org:80/andrew/', http) self.assert_literal(r'http://www.acooke.org/?foo', http) self.assert_literal(r'http://www.acooke.org:80/?foo', http) self.assert_literal(r'http://www.acooke.org/#bar', http) self.assert_literal(r'http://www.acooke.org:80/#bar', http) self.assert_literal(r'http://www.acooke.org/andrew?foo', http) self.assert_literal(r'http://www.acooke.org:80/andrew?foo', http) self.assert_literal(r'http://www.acooke.org/andrew/?foo', http) self.assert_literal(r'http://www.acooke.org:80/andrew/?foo', http) self.assert_literal(r'http://www.acooke.org/andrew#bar', http) self.assert_literal(r'http://www.acooke.org:80/andrew#bar', http) self.assert_literal(r'http://www.acooke.org/andrew/#bar', http) self.assert_literal(r'http://www.acooke.org:80/andrew/#bar', http) self.assert_literal(r'http://www.acooke.org/andrew?foo#bar', http) self.assert_literal(r'http://www.acooke.org:80/andrew?foo#bar', http) self.assert_literal(r'http://www.acooke.org/andrew/?foo#bar', http) self.assert_literal(r'http://www.acooke.org:80/andrew/?foo#bar', http) self.assert_fail(r'http://www.acooke.org:80/andrew/?foo#bar ', http) self.assert_fail(r'http://www.acooke.org:80/andrew/?foo#bar baz', http) def test_http(self): httpUrl = HttpUrl() assert httpUrl(r'http://www.acooke.org') assert httpUrl(r'http://www.acooke.org/') assert httpUrl(r'http://www.acooke.org:80') assert httpUrl(r'http://www.acooke.org:80/') assert httpUrl(r'http://www.acooke.org/andrew') assert httpUrl(r'http://www.acooke.org:80/andrew') assert httpUrl(r'http://www.acooke.org/andrew/') assert httpUrl(r'http://www.acooke.org:80/andrew/') assert 
httpUrl(r'http://www.acooke.org/?foo') assert httpUrl(r'http://www.acooke.org:80/?foo') assert httpUrl(r'http://www.acooke.org/#bar') assert httpUrl(r'http://www.acooke.org:80/#bar') assert httpUrl(r'http://www.acooke.org/andrew?foo') assert httpUrl(r'http://www.acooke.org:80/andrew?foo') assert httpUrl(r'http://www.acooke.org/andrew/?foo') assert httpUrl(r'http://www.acooke.org:80/andrew/?foo') assert httpUrl(r'http://www.acooke.org/andrew#bar') assert httpUrl(r'http://www.acooke.org:80/andrew#bar') assert httpUrl(r'http://www.acooke.org/andrew/#bar') assert httpUrl(r'http://www.acooke.org:80/andrew/#bar') assert httpUrl(r'http://www.acooke.org/andrew?foo#bar') assert httpUrl(r'http://www.acooke.org:80/andrew?foo#bar') assert httpUrl(r'http://www.acooke.org/andrew/?foo#bar') assert httpUrl(r'http://www.acooke.org:80/andrew/?foo#bar') assert httpUrl(r'http://1.2.3.4:80/andrew/?foo#bar') assert httpUrl(r'http://[1:2:3:4:5:6:7:8]:80/andrew/?foo#bar') # http://base.google.com/support/bin/answer.py?hl=en&answer=25230 assert not httpUrl(r'http://www.example.com/space here.html') assert not httpUrl(r'http://www.example.com\main.html') assert not httpUrl(r'/main.html') assert not httpUrl(r'www.example.com/main.html') assert not httpUrl(r'http:www.example.com/main.html') class MailToUrlTest(BaseTest): def test_mail_to_url(self): mailToUrl = MailToUrl() assert mailToUrl('mailto:joe@example.com') assert mailToUrl('mailto:user%2Bmailbox@example.com') assert mailToUrl('mailto:customer%2Fdepartment=shipping@example.com') assert mailToUrl('mailto:$A12345@example.com') assert mailToUrl('mailto:!def!xyz%25abc@example.com') assert mailToUrl('mailto:_somename@example.com') LEPL-5.1.3/src/lepl/apps/__init__.py0000644000175000001440000000000011731117151017476 0ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/apps/json.py0000644000175000001440000000304111731117151016720 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Punt to the contrib package (this is just to keep copyright clear). 
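A typical use (a sketch; the exact nesting of results follows the tests
in ``lepl.apps._test.json``):

    from lepl.apps.json import Simple
    Simple().parse('{"a": 123}')  # numbers parse as floats: 'a' -> 123.0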
''' from lepl.contrib.json import Simple LEPL-5.1.3/src/lepl/apps/rfc3696.py0000644000175000001440000004357611754013750017077 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Matchers for validating URIs and related objects, taken from RFC3696. IMPORTANT - the emphasis here is on validation of user input. These matchers are not exact matches for the underlying specs - they are just useful practical approximations. Read RFC3696 to see what I mean (or the quotes from that doc in the source below). ''' from re import compile as compile_ from string import ascii_letters, digits, printable, whitespace from lepl import * _HEX = digits + 'abcdef' + 'ABCDEF' def _guarantee_bool(function): ''' A decorator that guarantees a true/false response. ''' def wrapper(*args, **kargs): try: return bool(function(*args, **kargs)) except: return False return wrapper def _matcher_to_validator(factory): ''' Generate a validator based on the given matcher factory. ''' matcher = factory() matcher.config.compile_to_re().no_memoize() @_guarantee_bool def validator(value): for char in '\n\r': assert char not in value return matcher.parse(value) return validator def _LimitLength(matcher, length): ''' Reject a match if it exceeds a certain length. ''' return PostCondition(matcher, lambda results: len(results[0]) <= length) def _RejectRegexp(matcher, pattern): ''' Reject a match if it matches a (ie some other) regular expression ''' regexp = compile_(pattern) return PostCondition(matcher, lambda results: not regexp.match(results[0])) def _LimitIntValue(matcher, max): ''' Reject a match if the value exceeds some value. ''' return PostCondition(matcher, lambda results: int(results[0]) <= max) def _LimitCount(matcher, char, max): ''' Reject a match if the number of times a particular character occurs exceeds some value. ''' return PostCondition(matcher, lambda results: results[0].count(char) <= max) def _PreferredFullyQualifiedDnsName(): ''' A matcher for DNS names. RFC 3696: Any characters, or combination of bits (as octets), are permitted in DNS names. 
However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all- numeric. [...] Most internet applications that reference other hosts or systems assume they will be supplied with "fully-qualified" domain names, i.e., ones that include all of the labels leading to the root, including the TLD name. Those fully-qualified domain names are then passed to either the domain name resolution protocol itself or to the remote systems. Consequently, purported DNS names to be used in applications and to locate resources generally must contain at least one period (".") character. [...] [...]It is likely that the better strategy has now become to make the "at least one period" test, to verify LDH conformance (including verification that the apparent TLD name is not all-numeric), and then to use the DNS to determine domain name validity, rather than trying to maintain a local list of valid TLD names. [...] A DNS label may be no more than 63 octets long. This is in the form actually stored; if a non-ASCII label is converted to encoded "punycode" form (see Section 5), the length of that form may restrict the number of actual characters (in the original character set) that can be accommodated. A complete, fully-qualified, domain name must not exceed 255 octets. ''' ld = Any(ascii_letters + digits) ldh = ld | '-' label = ld + Optional(ldh[:] + ld) short_label = _LimitLength(label, 63) tld = _RejectRegexp(short_label, r'^[0-9]+$') any_name = short_label[1:, r'\.', ...] + '.' + tld non_numeric = _RejectRegexp(any_name, r'^[0-9\.]+$') short_name = _LimitLength(non_numeric, 255) return short_name def _IpV4Address(): ''' A matcher for IPv4 addresses. RFC 3696 doesn't say much about these; RFC 2396 doesn't mention limits on numerical values, but it must be 255. ''' octet = _LimitIntValue(Any(digits)[1:, ...], 255) address = octet[4, '.', ...] return address def _Ipv6Address(): ''' A matcher for IPv6 addresses. Again, RFC 3696 says little; RFC 2373 (addresses) and 2732 (URLs) have much more information: 1. The preferred form is x:x:x:x:x:x:x:x, where the 'x's are the hexadecimal values of the eight 16-bit pieces of the address. Examples: FEDC:BA98:7654:3210:FEDC:BA98:7654:3210 1080:0:0:0:8:800:200C:417A Note that it is not necessary to write the leading zeros in an individual field, but there must be at least one numeral in every field (except for the case described in 2.). 2. Due to some methods of allocating certain styles of IPv6 addresses, it will be common for addresses to contain long strings of zero bits. 
In order to make writing addresses containing zero bits easier a special syntax is available to compress the zeros. The use of "::" indicates multiple groups of 16-bits of zeros. The "::" can only appear once in an address. The "::" can also be used to compress the leading and/or trailing zeros in an address. For example the following addresses: 1080:0:0:0:8:800:200C:417A a unicast address FF01:0:0:0:0:0:0:101 a multicast address 0:0:0:0:0:0:0:1 the loopback address 0:0:0:0:0:0:0:0 the unspecified addresses may be represented as: 1080::8:800:200C:417A a unicast address FF01::101 a multicast address ::1 the loopback address :: the unspecified addresses 3. An alternative form that is sometimes more convenient when dealing with a mixed environment of IPv4 and IPv6 nodes is x:x:x:x:x:x:d.d.d.d, where the 'x's are the hexadecimal values of the six high-order 16-bit pieces of the address, and the 'd's are the decimal values of the four low-order 8-bit pieces of the address (standard IPv4 representation). Examples: 0:0:0:0:0:0:13.1.68.3 0:0:0:0:0:FFFF:129.144.52.38 or in compressed form: ::13.1.68.3 ::FFFF:129.144.52.38 ''' piece = Any(_HEX)[1:4, ...] preferred = piece[8, ':', ...] # we need to be careful about how we match the compressed form, since we # have a limit on the total number of pieces. the simplest approach seems # to be to limit the final number of ':' characters, but we must take # care to treat the cases where '::' is at one end separately: # 1::2:3:4:5:6:7 has 7 ':' characters # 1:2:3:4:5:6:7:: has 8 ':' characters compact = Or(_LimitCount(piece[1:6, ':', ...] + '::' + piece[1:6, ':', ...], ':', 7), '::' + piece[1:7, ':', ...], piece[1:7, ':', ...] + '::', '::') # similar to above, but we need to also be careful about the separator # between the v6 and v4 parts alternate = \ Or(piece[6, ':', ...] + ':', _LimitCount(piece[1:4, ':', ...] + '::' + piece[1:4, ':', ...], ':', 5), '::' + piece[1:5, ':', ...] + ':', piece[1:5, ':', ...] + '::', '::') + _IpV4Address() return (preferred | compact | alternate) def _EmailLocalPart(): ''' A matcher for the local part ("username") of an email address. RFC 3696: Contemporary email addresses consist of a "local part" separated from a "domain part" (a fully-qualified domain name) by an at-sign ("@"). The syntax of the domain part corresponds to that in the previous section. The concerns identified in that section about filtering and lists of names apply to the domain names used in an email context as well. The domain name can also be replaced by an IP address in square brackets, but that form is strongly discouraged except for testing and troubleshooting purposes. The local part may appear using the quoting conventions described below. The quoted forms are rarely used in practice, but are required for some legitimate purposes. Hence, they should not be rejected in filtering routines but, should instead be passed to the email system for evaluation by the destination host. The exact rule is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character. [...] In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings. [...] Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters ! # $ % & ' * + - / = ? ^ _ ` . 
{ | } ~ period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting. If any of that list of excluded characters are to appear, they must be quoted. [...] In addition to restrictions on syntax, there is a length limit on email addresses. That limit is a maximum of 64 characters (octets) in the "local part" (before the "@") and a maximum of 255 characters (octets) in the domain part (after the "@") for a total length of 320 characters. Systems that handle email should be prepared to process addresses which are that long, even though they are rarely encountered. ''' unescaped_chars = ascii_letters + digits + "!#$%&'*+-/=?^_`.{|}~" escapable_chars = unescaped_chars + r'@\",[] ' quotable_chars = unescaped_chars + r'@\,[] ' unquoted_string = (('\\' + Any(escapable_chars)) | Any(unescaped_chars))[1:, ...] quoted_string = '"' + Any(quotable_chars)[1:, ...] + '"' local_part = quoted_string | unquoted_string no_extreme_dot = _RejectRegexp(local_part, r'"?\..*\."?') no_double_dot = _RejectRegexp(no_extreme_dot, r'.*\."*\..*') short_local_part = _LimitLength(no_double_dot, 64) return short_local_part def _Email(): ''' A matcher for email addresses. ''' return _EmailLocalPart() + '@' + _PreferredFullyQualifiedDnsName() def Email(): ''' Generate a validator for emails, according to RFC3696, which returns True if the email is valid, and False otherwise. ''' return _matcher_to_validator(_Email) def _HttpUrl(): ''' A matcher for HTTP URLs. RFC 3696: The following characters are reserved in many URIs -- they must be used for either their URI-intended purpose or must be encoded. Some particular schemes may either broaden or relax these restrictions (see the following sections for URLs applicable to "web pages" and electronic mail), or apply them only to particular URI component parts. ; / ? : @ & = + $ , ? In addition, control characters, the space character, the double- quote (") character, and the following special characters < > # % are generally forbidden and must either be avoided or escaped, as discussed below. [...] When it is necessary to encode these, or other, characters, the method used is to replace it with a percent-sign ("%") followed by two hexidecimal digits representing its octet value. See section 2.4.1 of [RFC2396] for an exact definition. Unless it is used as a delimiter of the URI scheme itself, any character may optionally be encoded this way; systems that are testing URI syntax should be prepared for these encodings to appear in any component of the URI except the scheme name itself. [...] Absolute HTTP URLs consist of the scheme name, a host name (expressed as a domain name or IP address), and optional port number, and then, optionally, a path, a search part, and a fragment identifier. These are separated, respectively, by a colon and the two slashes that precede the host name, a colon, a slash, a question mark, and a hash mark ("#"). So we have http://host:port/path?search#fragment http://host/path/ http://host/path#fragment http://host/path?search http://host and other variations on that form. There is also a "relative" form, but it almost never appears in text that a user might, e.g., enter into a form. See [RFC2616] for details. [...] The characters / ; ? 
are reserved within the path and search parts and must be encoded; the first of these may be used unencoded, and is often used within the path, to designate hierarchy. ''' path_chars = ''.join(set(printable).difference(set(whitespace)) .difference('/;?<>#%')) other_chars = path_chars + '/' path_string = ('%' + Any(_HEX)[2, ...] | Any(path_chars))[1:, ...] other_string = ('%' + Any(_HEX)[2, ...] | Any(other_chars))[1:, ...] host = _IpV4Address() | ('[' + _Ipv6Address() + ']') | \ _PreferredFullyQualifiedDnsName() url = 'http://' + host + \ Optional(':' + Any(digits)[1:, ...]) + \ Optional('/' + Optional(path_string[1:, '/', ...] + Optional('/')) + Optional('?' + other_string) + Optional('#' + other_string)) return url def HttpUrl(): ''' Generate a validator for HTTP URLs, according to RFC3696, which returns True if the email is valid, and False otherwise. ''' return _matcher_to_validator(_HttpUrl) def MailToUrl(): ''' Generate a validator for email addresses, according to RFC3696, which returns True if the URL is valid, and False otherwise. RFC 3696: The following characters may appear in MAILTO URLs only with the specific defined meanings given. If they appear in an email address (i.e., for some other purpose), they must be encoded: : The colon in "mailto:" < > # " % { } | \ ^ ~ ` These characters are "unsafe" in any URL, and must always be encoded. The following characters must also be encoded if they appear in a MAILTO URL ? & = Used to delimit headers and their values when these are encoded into URLs. ---------- The RFC isn't that great a guide here. The best approach, I think, is to check the URL for "forbidden" characters, then decode it, and finally validate the decoded email. So we implement the validator directly (ie this is not a matcher). ''' MAIL_TO = 'mailto:' encoded_token = compile_('(%.{0,2})') email = _Email() email.config.compile_to_re().no_memoize() @_guarantee_bool def validator(url): assert url.startswith(MAIL_TO) url = url[len(MAIL_TO):] for char in r':<>#"{}|\^~`': assert char not in url def unpack(chunk): if chunk.startswith('%'): assert len(chunk) == 3 return chr(int(chunk[1:], 16)) else: return chunk url = ''.join(unpack(chunk) for chunk in encoded_token.split(url)) assert url return email.parse(url) return validator LEPL-5.1.3/src/lepl/stream/0000755000175000001440000000000011764776700015750 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/stream/_test/0000755000175000001440000000000011764776700017066 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/stream/_test/__init__.py0000644000175000001440000000325511731117151021163 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
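# A minimal usage sketch for the RFC 3696 validators defined above. It
# assumes the module path lepl.apps.rfc3696 for the public Email() and
# HttpUrl() entry points; the sample addresses and URLs are illustrative
# only, not taken from the library's own tests.

from lepl.apps.rfc3696 import Email, HttpUrl

valid_email = Email()
assert valid_email('andrew@acooke.org')         # LDH labels, one period
assert valid_email('"with space"@example.com')  # quoted local part (rare)
assert not valid_email('no-at-sign')            # missing the "@" separator
assert not valid_email('andrew@example.123')    # all-numeric TLD rejected

valid_url = HttpUrl()
assert valid_url('http://www.acooke.org:80/lepl/?search#fragment')
assert valid_url('http://127.0.0.1/')           # IPv4 host, octets <= 255
assert valid_url('http://[::1]/')               # compressed IPv6 host
assert not valid_url('ftp://www.acooke.org/')   # only the http scheme here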
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.stream package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.stream._test.file import lepl.stream._test.iter import lepl.stream._test.simple LEPL-5.1.3/src/lepl/stream/_test/iter.py0000644000175000001440000000605311731117151020366 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. 
from lepl._test.base import BaseTest from lepl.stream.core import s_empty, s_fmt, s_line, s_next, s_stream, \ s_debug, s_deepest from lepl.stream.factory import DEFAULT_STREAM_FACTORY class GenericTest(BaseTest): def test_all(self): lines = iter(['first line', 'second line', 'third line']) f = DEFAULT_STREAM_FACTORY s1 = f(lines) # just created assert not s_empty(s1) # get first line (l1, s2) = s_line(s1, False) assert 'first line' == l1, l1 # get first character of next line (c21, s21) = s_next(s2) assert c21 == 's', c21 # and test fmtting locn = s_fmt(s21, '{location}: {rest}') assert locn == "line 2, character 2: 'econd line'", locn # then get rest of second line (c22, s3) = s_next(s21, count=len('econd line')) assert c22 == 'econd line', c22 d = s_debug(s21) assert d == "1:'e'", d # and move on to third line (c31, s31) = s_next(s3) assert c31 == 't', c31 (c32, s32) = s_next(s31) assert c32 == 'h', c32 # now try branching (think tokens) at line 1 s10 = s_stream(s2, l1) (l1, s20) = s_line(s10, False) assert l1 == 'first line', l1 assert not s_empty(s20) (c1, s11) = s_next(s10) assert c1 == 'f', c1 d = s_debug(s11) assert d == "1:'i'", d # finally look at max depth (which was after 'h' in third line) m = s_deepest(s1) locn = s_fmt(m, '{location}: {rest}') assert locn == "line 3, character 3: 'ird line'", locn LEPL-5.1.3/src/lepl/stream/_test/simple.py0000644000175000001440000001036611731117215020717 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. 
from lepl.support.lib import fmt from lepl._test.base import BaseTest from lepl.stream.core import s_empty, s_fmt, s_line, s_next, s_stream from lepl.stream.factory import DEFAULT_STREAM_FACTORY class GenericTest(BaseTest): def test_empty(self): f = DEFAULT_STREAM_FACTORY for (constructor, data) in ((f.from_sequence, ''), (f.from_sequence, []), (f.from_sequence, ()), (f.from_string, ''), (f.from_list, [])): s = constructor(data) assert s_empty(s) try: s_next(s) assert False, fmt('expected error: {0}', s) except StopIteration: pass try: s_line(s, False) assert False, fmt('expected error: {0}', s) except StopIteration: pass def test_single_value(self): f = DEFAULT_STREAM_FACTORY for (constructor, data) in ((f.from_sequence, 'a'), (f.from_sequence, [1]), (f.from_sequence, (2,)), (f.from_string, 'b'), (f.from_list, ['c'])): s = constructor(data) assert not s_empty(s) (value, n) = s_next(s) assert value == data assert s_empty(n) (line, n) = s_line(s, False) assert line == data assert s_empty(n) def test_two_values(self): f = DEFAULT_STREAM_FACTORY for (constructor, data) in ((f.from_sequence, 'ab'), (f.from_sequence, [1, 2]), (f.from_sequence, (2,3)), (f.from_string, 'bc'), (f.from_list, ['c', 6])): s = constructor(data) assert not s_empty(s) (value, n) = s_next(s) assert value == data[0:1] (value, n) = s_next(n) assert value == data[1:2] assert s_empty(n) (line, n) = s_line(s, False) assert line == data assert s_empty(n) def test_string_lines(self): f = DEFAULT_STREAM_FACTORY s = f.from_string('line 1\nline 2\nline 3\n') (l, s) = s_line(s, False) assert l == 'line 1\n', l (l, _) = s_line(s, False) assert l == 'line 2\n', repr(l) locn = s_fmt(s, '{location}') assert locn == 'line 2, character 1', locn sl = s_stream(s, l) (_, sl) = s_next(sl, count=2) locn = s_fmt(sl, '{location}') assert locn == 'line 2, character 3', locn LEPL-5.1.3/src/lepl/stream/_test/file.py0000644000175000001440000000427711731117151020350 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke. All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. 
from __future__ import print_function from sys import version from lepl._test.base import BaseTest from lepl.lexer.matchers import Token from lepl.support.lib import str from tempfile import TemporaryFile class FileTest(BaseTest): def test_file(self): if version[0] == '3': f = TemporaryFile('w+', encoding='utf8') else: f = TemporaryFile('w+') print("hello world\n", file=f) f.flush() # f.seek(0) # print(f.readlines()) f.seek(0) w = Token('[a-z]+') s = Token(' +') v = w & s & w v.parse_iterable(f) def test_default(self): w = Token('[a-z]+') s = Token(' +') v = w & s & w v.parse_string("hello world\n") LEPL-5.1.3/src/lepl/stream/__init__.py0000644000175000001440000000000011731117151020026 0ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/stream/maxdepth.py0000644000175000001440000000613211731117151020115 0ustar andrewusers00000000000000# The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Raise an exception if the stream is not consumed entirely. ''' from lepl.stream.core import s_empty, s_fmt, s_deepest, s_next from lepl.matchers.support import trampoline_matcher_factory @trampoline_matcher_factory() def FullFirstMatch(matcher, eos=True): ''' Raise an exception if the first match fails (if eos=False) or does not consume the entire input stream (eos=True). The exception includes information about the location of the deepest match. This only works for the first match because we cannot reset the stream facade for subsequent matches (also, if you want multiple matches you probably want more sophisticated error handling than this). ''' def _matcher(support, stream1): # set default maxdepth s_next(stream1, count=0) # first match generator = matcher._match(stream1) try: (result2, stream2) = yield generator if eos and not s_empty(stream2): raise FullFirstMatchException(stream2) else: yield (result2, stream2) except StopIteration: raise FullFirstMatchException(stream1) # subsequent matches: while True: result = yield generator yield result return _matcher class FullFirstMatchException(Exception): ''' The exception raised by `FullFirstMatch`. This includes information about the deepest point read in the stream. 
''' def __init__(self, stream): super(FullFirstMatchException, self).__init__( s_fmt(s_deepest(stream), 'The match failed in {filename} at {rest} ({location}).')) LEPL-5.1.3/src/lepl/stream/factory.py0000644000175000001440000001123311731117215017751 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. from collections import Iterable from lepl.stream.simple import SequenceHelper, StringHelper, ListHelper from lepl.stream.iter import IterableHelper, Cons from lepl.support.lib import basestring, fmt, add_defaults, file from lepl.lexer.stream import TokenHelper class StreamFactory(object): ''' Given a value (typically a sequence), generate a stream. ''' def from_string(self, text, **kargs): ''' Provide a stream for the contents of the string. ''' add_defaults(kargs, {'factory': self}) return (0, StringHelper(text, **kargs)) def from_list(self, list_, **kargs): ''' Provide a stream for the contents of the list. ''' add_defaults(kargs, {'factory': self}) return (0, ListHelper(list_, **kargs)) def from_sequence(self, sequence, **kargs): ''' Return a generic stream for any indexable sequence. ''' add_defaults(kargs, {'factory': self}) return (0, SequenceHelper(sequence, **kargs)) def from_iterable(self, iterable, **kargs): ''' Provide a stream for the contents of the iterable. This assumes that each value from the iterable is a "line" which will, in turn, be passed to the stream factory. ''' add_defaults(kargs, {'factory': self}) cons = Cons(iterable) return ((cons, self(cons.head, **kargs)), IterableHelper(**kargs)) def from_file(self, file_, **kargs): ''' Provide a stream for the contents of the file. There is no corresponding `from_path` because the opening and closing of the path must be done outside the parsing (or the contents will become unavailable), so use instead: with open(path) as f: parser.parse_file(f) which will close the file after parsing. 
''' try: gkargs = kargs.get('global_kargs', {}) add_defaults(gkargs, {'filename': file_.name}) add_defaults(kargs, {'global_kargs': gkargs}) except AttributeError: pass return self.from_iterable(file_, **kargs) def to_token(self, iterable, **kargs): ''' Create a stream for tokens. The `iterable` is a source of (token_ids, sub_stream) tuples, where `sub_stream` will be matched within the token. ''' return (Cons(iterable), TokenHelper(**kargs)) def __call__(self, sequence, **kargs): ''' Auto-detect type and wrap appropriately. ''' if isinstance(sequence, basestring): return self.from_string(sequence, **kargs) elif isinstance(sequence, list): return self.from_list(sequence, **kargs) elif isinstance(sequence, file): return self.from_file(sequence, **kargs) elif hasattr(sequence, '__getitem__') and hasattr(sequence, '__len__'): return self.from_sequence(sequence, **kargs) elif isinstance(sequence, Iterable): return self.from_iterable(sequence, **kargs) else: raise TypeError(fmt('Cannot generate a stream for type {0}', type(sequence))) DEFAULT_STREAM_FACTORY = StreamFactory() LEPL-5.1.3/src/lepl/stream/iter.py0000644000175000001440000002230711731117215017251 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A stream for iterable sources. Each value in the iteration is considered as a line (which makes sense for files, for example, which iterate over lines). The source is wrapped in a `Cons` object. This has an attribute `head` which contains the current line and a method `tail()` which returns another `Cons` instance, or raise a `StopIteration`. The stream has the form `(state, helper)`, where `helper` is an `IterableHelper` instance, as described below. The `state` value in the stream described above has the form `(cons, line_stream)` where `cons` is a `Cons` instance and line_stream is a stream generated from `cons.head` (so has the structure (state', helper') where state' and helper' depend on the type of the line and the stream factory used). 
Evaluation of stream methods then typically has the form: - call to IterableHelper - unpacking of state - delegation to line_stream - possible exception handling This has the advantages of being generic in the type returned by the iterator, of being customizable (by specifying a new factory), and re-using existing code where possible (in the use of the sub-helper). It should even be possible to have iterables of iterables... ''' from lepl.support.lib import add_defaults, fmt from lepl.stream.simple import OFFSET, LINENO, BaseHelper from lepl.stream.core import s_delta, s_kargs, s_fmt, s_debug, s_next, \ s_line, s_join, s_empty, s_eq, HashKey class Cons(object): ''' A linked list cell that is a lazy wrapper around an iterable. So "tail" returns the next iterable on demand. ''' __slots__ = ['_iterable', '_head', '_tail', '_expanded'] def __init__(self, iterable): self._iterable = iterable self._head = None self._tail = None self._expanded = False def _expand(self): if not self._expanded: self._head = next(self._iterable) self._tail = Cons(self._iterable) self._expanded = True @property def head(self): self._expand() return self._head @property def tail(self): self._expand() return self._tail def base_iterable_factory(state_to_line_stream, type_): ''' `IterableHelper` and the token helper differ mainly in how they map from `state` to `line_stream`. ''' class BaseIterableHelper(BaseHelper): def __init__(self, id=None, factory=None, max=None, global_kargs=None, cache_level=None, delta=None): super(BaseIterableHelper, self).__init__(id=id, factory=factory, max=max, global_kargs=global_kargs, cache_level=cache_level, delta=delta) add_defaults(self.global_kargs, { 'global_type': type_, 'filename': type_}) self._kargs = dict(self.global_kargs) add_defaults(self._kargs, {'type': type_}) def key(self, state, other): try: line_stream = state_to_line_stream(state) offset = s_delta(line_stream)[OFFSET] except StopIteration: self._warn('Default hash') offset = -1 key = HashKey(self.id ^ offset ^ hash(other), (self.id, other)) #self._debug(fmt('Hash at {0!r} ({1}): {2}', state, offset, hash(key))) return key def kargs(self, state, prefix='', kargs=None): line_stream = state_to_line_stream(state) return s_kargs(line_stream, prefix=prefix, kargs=kargs) def fmt(self, state, template, prefix='', kargs=None): line_stream = state_to_line_stream(state) return s_fmt(line_stream, template, prefix=prefix, kargs=kargs) def debug(self, state): try: line_stream = state_to_line_stream(state) return s_debug(line_stream) except StopIteration: return '' def join(self, state, *values): line_stream = state_to_line_stream(state) return s_join(line_stream, *values) def empty(self, state): try: self.next(state) return False except StopIteration: return True def delta(self, state): line_stream = state_to_line_stream(state) return s_delta(line_stream) def eq(self, state1, state2): line_stream1 = state_to_line_stream(state1) line_stream2 = state_to_line_stream(state2) return s_eq(line_stream1, line_stream2) def deepest(self): return self.max.get() def new_max(self, state): return (self.max, (state, type(self)(id=self.id, factory=self.factory, max=None, delta=self.delta, global_kargs=self.global_kargs, cache_level=self.cache_level))) return BaseIterableHelper class IterableHelper( base_iterable_factory(lambda state: state[1], '')): ''' Implement a stream over iterable values. 
''' def _next_line(self, cons, empty_line_stream): delta = s_delta(empty_line_stream) delta = (delta[OFFSET], delta[LINENO]+1, 1) return self.factory(cons.head, id=self.id, factory=self.factory, max=self.max, global_kargs=self.global_kargs, delta=delta) def next(self, state, count=1): (cons, line_stream) = state try: (value, next_line_stream) = s_next(line_stream, count=count) return (value, ((cons, next_line_stream), self)) except StopIteration: # the general approach here is to take what we can from the # current line, create the next, and take the rest from that. # of course, that may also not have enough, in which case it # will recurse. cons = cons.tail if s_empty(line_stream): next_line_stream = self._next_line(cons, line_stream) next_stream = ((cons, next_line_stream), self) return s_next(next_stream, count=count) else: (line, end_line_stream) = s_line(line_stream, False) next_line_stream = self._next_line(cons, end_line_stream) next_stream = ((cons, next_line_stream), self) (extra, final_stream) = s_next(next_stream, count=count-len(line)) value = s_join(line_stream, line, extra) return (value, final_stream) def line(self, state, empty_ok): try: (cons, line_stream) = state if s_empty(line_stream): cons = cons.tail line_stream = self._next_line(cons, line_stream) (value, empty_line_stream) = s_line(line_stream, empty_ok) return (value, ((cons, empty_line_stream), self)) except StopIteration: if empty_ok: raise TypeError('Iterable stream cannot return an empty line') else: raise def len(self, state): self._error('len(iter)') raise TypeError def stream(self, state, value, id_=None, max=None): (cons, line_stream) = state id_ = self.id if id_ is None else id_ max = max if max else self.max next_line_stream = \ self.factory(value, id=id_, factory=self.factory, max=max, global_kargs=self.global_kargs, cache_level=self.cache_level+1, delta=s_delta(line_stream)) return ((cons, next_line_stream), self) LEPL-5.1.3/src/lepl/stream/simple.py0000644000175000001440000003014211731117215017573 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. 
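# A small sketch of the lazy Cons cell that drives the iterable stream
# above, assuming lepl.stream.iter.Cons as defined in this package; the
# sample lines are illustrative. Each cell pulls one value from the
# iterator on demand and then caches it, so re-reading head does not
# advance the underlying source.

from lepl.stream.iter import Cons

cons = Cons(iter(['first line\n', 'second line\n']))
assert cons.head == 'first line\n'       # expands once, then caches
assert cons.tail.head == 'second line\n' # tail is another lazy cell
assert cons.head == 'first line\n'       # cached; iterator not re-read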
''' Default implementation of the helper classes for sequences (strings and lists). The state is an integer offset. Sequence and a possible delta for the offset are stored in the helper. ''' from itertools import chain from lepl.support.lib import fmt, add_defaults, str, LogMixin from lepl.stream.core import StreamHelper, OFFSET, LINENO, CHAR, HashKey class BaseHelper(LogMixin, StreamHelper): def __init__(self, id=None, factory=None, max=None, global_kargs=None, cache_level=None, delta=None): super(BaseHelper, self).__init__(id=id, factory=factory, max=max, global_kargs=global_kargs, cache_level=cache_level) self._delta = delta if delta else (0,1,1) class SequenceHelper(BaseHelper): def __init__(self, sequence, id=None, factory=None, max=None, global_kargs=None, cache_level=None, delta=None): super(SequenceHelper, self).__init__(id=id, factory=factory, max=max, global_kargs=global_kargs, cache_level=cache_level, delta=delta) self._sequence = sequence type_ = self._typename(sequence) add_defaults(self.global_kargs, { 'global_type': type_, 'filename': type_}) self._kargs = dict(self.global_kargs) add_defaults(self._kargs, {'type': type_}) def key(self, state, other): # avoid confusion with incremental ids offset = (state + self._delta[OFFSET]) << 16 key = HashKey(self.id ^ offset ^ hash(other), (self.id, hash(other))) #self._debug(fmt('Hash at offset {0}: {1}', offset, hash(key))) return key def _fmt(self, sequence, offset, maxlen=60, left='', right='', index=True): '''fmt a possibly long subsection of data.''' if not sequence: if index: return fmt('{0!r}[{1:d}]', sequence, offset) else: return fmt('{0!r}', sequence) if offset >= 0 and offset < len(sequence): centre = offset elif offset > 0: centre = len(sequence) - 1 else: centre = 0 begin, end = centre, centre+1 longest = None while True: if begin > 0: if end < len(sequence): template = '{0!s}...{1!s}...{2!s}' else: template = '{0!s}...{1!s}{2!s}' else: if end < len(sequence): template = '{0!s}{1!s}...{2!s}' else: template = '{0!s}{1!s}{2!s}' body = repr(sequence[begin:end])[len(left):] if len(right): body = body[:-len(right)] text = fmt(template, left, body, right, offset) if index: text = fmt('{0!s}[{1:d}:]', text, offset) if longest is None or len(text) <= maxlen: longest = text if len(text) > maxlen: return longest begin -= 1 end += 1 if begin < 0 and end > len(sequence): return longest begin = max(begin, 0) end = min(end, len(sequence)) def _location(self, kargs, prefix): '''Location (separate method so subclasses can replace).''' return fmt('offset {' + prefix + 'global_offset}, value {' + prefix + 'repr}', **kargs) def _typename(self, instance): if isinstance(instance, list) and instance: return fmt('', self._typename(instance[0])) else: try: return fmt('<{0}>', instance.__class__.__name__) except: return '' def kargs(self, state, prefix='', kargs=None): ''' Generate a dictionary of values that describe the stream. These may be extended by subclasses. They are provided to `syntax_error_kargs`, for example. Note: Calculating this can be expensive; use only for error messages, not debug messages (that may be discarded). 
Implementation note: Because some values are ''' offset = state + self._delta[OFFSET] if kargs is None: kargs = {} add_defaults(kargs, self._kargs, prefix=prefix) within = offset > -1 and offset < len(self._sequence) data = self._fmt(self._sequence, state) text = self._fmt(self._sequence, state, index=False) # some values below may be already present in self._global_kargs defaults = {'data': data, 'global_data': data, 'text': text, 'global_text': text, 'offset': state, 'global_offset': offset, 'rest': self._fmt(self._sequence[offset:], 0, index=False), 'repr': repr(self._sequence[offset]) if within else '', 'str': str(self._sequence[offset]) if within else '', 'lineno': 1, 'char': offset+1} add_defaults(kargs, defaults, prefix=prefix) add_defaults(kargs, {prefix + 'location': self._location(kargs, prefix)}) return kargs def next(self, state, count=1): new_state = state+count if new_state <= len(self._sequence): stream = (new_state, self) self.max.update(self._delta[OFFSET] + new_state - 1, stream) return (self._sequence[state:new_state], stream) else: raise StopIteration def join(self, state, *values): assert values, 'Cannot join zero general sequences' result = values[0] for value in values[1:]: result += value return result def empty(self, state): return state >= len(self._sequence) def line(self, state, empty_ok): '''Returns the rest of the data.''' new_state = len(self._sequence) if state < new_state or (empty_ok and state == new_state): stream = (new_state, self) self.max.update(self._delta[OFFSET] + new_state, stream) return (self._sequence[state:new_state], stream) else: raise StopIteration def len(self, state): return len(self._sequence) - state def stream(self, state, value, id_=None, max=None): id_ = self.id if id_ is None else id_ max = max if max else self.max # increment the cache level to expose lower level streams return self.factory(value, id=id_, factory=self.factory, max=max, global_kargs=self.global_kargs, cache_level=self.cache_level+1, delta=self.delta(state)) def deepest(self): return self.max.get() def debug(self, state): try: return fmt('{0:d}:{1!r}', state, self._sequence[state]) except IndexError: return fmt('{0:d}:', state) def delta(self, state): offset = state + self._delta[OFFSET] return (offset, 1, offset+1) def new_max(self, state): return (self.max, (state, type(self)(self._sequence, id=self.id, factory=self.factory, max=None, global_kargs=self.global_kargs, delta=self._delta))) class StringHelper(SequenceHelper): ''' String-specific fmtting and location. 
''' def __init__(self, sequence, id=None, factory=None, max=None, global_kargs=None, cache_level=None, delta=None): # avoid duplicating processing on known strings if id is None: id = hash(sequence) super(StringHelper, self).__init__(sequence, id=id, factory=factory, max=max, global_kargs=global_kargs, cache_level=cache_level, delta=delta) def _fmt(self, sequence, offset, maxlen=60, left="'", right="'", index=True): return super(StringHelper, self)._fmt(sequence, offset, maxlen=maxlen, left=left, right=right, index=index) def _location(self, kargs, prefix): return fmt('line {' + prefix + 'lineno:d}, character {' + prefix + 'char:d}', **kargs) def delta(self, state): offset = self._delta[OFFSET] + state lineno = self._delta[LINENO] + self._sequence.count('\n', 0, state) start = self._sequence.rfind('\n', 0, state) if start > -1: char = state - start else: char = self._delta[CHAR] + state return (offset, lineno, char) def kargs(self, state, prefix='', kargs=None): if kargs is None: kargs = {} (_, lineno, char) = self.delta(state) start = self._sequence.rfind('\n', 0, state) + 1 # omit \n end = self._sequence.find('\n', state) # omit \n # all is str() because passed to SyntaxError constructor if end < 0: rest = repr(self._sequence[state:]) all = str(self._sequence[start:]) else: rest = repr(self._sequence[state:end]) all = str(self._sequence[start:end]) add_defaults(kargs, { 'type': '', 'filename': '', 'rest': rest, 'all': all, 'lineno': lineno, 'char': char}, prefix=prefix) return super(StringHelper, self).kargs(state, prefix=prefix, kargs=kargs) def join(self, state, *values): return str().join(values) def line(self, state, empty_ok): '''Returns up to, and including then next \n''' max_len = len(self._sequence) if state < max_len or (empty_ok and state == max_len): end = self._sequence.find('\n', state) + 1 if not end: end = len(self._sequence) return (self._sequence[state:end], (end, self)) else: raise StopIteration def stream(self, state, value, id_=None, max=None): id_ = self.id if id_ is None else id_ max = max if max else self.max return self.factory(value, id=id_, factory=self.factory, max=max, global_kargs=self.global_kargs, delta=self.delta(state)) class ListHelper(SequenceHelper): ''' List-specific fprmatting ''' def _fmt(self, sequence, offset, maxlen=60, left="[", right="]", index=True): return super(ListHelper, self)._fmt(sequence, offset, maxlen=maxlen, left=left, right=right, index=index) def join(self, state, *values): return list(chain(*values)) LEPL-5.1.3/src/lepl/stream/facade.py0000644000175000001440000000635411731117215017515 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
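# A short sketch of the string helper's line/character bookkeeping above,
# assuming DEFAULT_STREAM_FACTORY and the s_* accessors from this package;
# the input text is illustrative. The state is a plain integer offset and
# delta() recovers (offset, lineno, char) by scanning for newlines.

from lepl.stream.core import s_next, s_fmt
from lepl.stream.factory import DEFAULT_STREAM_FACTORY

stream = DEFAULT_STREAM_FACTORY.from_string('ab\ncd')
(value, stream) = s_next(stream, count=3)   # consume 'ab\n'
assert value == 'ab\n'
assert s_fmt(stream, '{location}') == 'line 2, character 1'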
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A facade that delegates all methods to an embedded instance. ''' from lepl.stream.core import StreamHelper class HelperFacade(StreamHelper): ''' A facade that delegates all calls to the underlying delegate stream. ''' def __init__(self, delegate): self._delegate = delegate def __repr__(self): return repr(self._delegate) def __eq__(self, other): return self._delegate == other def __hash__(self): return hash(self._delegate) def key(self, state, other): return self._delegate.key(state, other) def kargs(self, state, prefix='', kargs=None): return self._delegate.kargs(state, prefix=prefix, kargs=kargs) def fmt(self, state, template, prefix='', kargs=None): return self._delegate.fmt(state, template, prefix=prefix, kargs=kargs) def debug(self, state): return self._delegate.debug(state) def next(self, state, count=1): return self._delegate.next(state, count=count) def join(self, state, *values): return self._delegate.join(state, *values) def empty(self, state): return self._delegate.empty(state) def line(self, state, empty_ok): return self._delegate.line(state, empty_ok) def len(self, state): return self._delegate.len(state) def stream(self, state, value, id_=None): return self._delegate.stream(state, value, id_) def deepest(self): return self._delegate.deepest() def delta(self, state): return self._delegate.delta(state) def eq(self, state1, state2): return self._delegate.eq(state1, state2) def new_max(self, state): return self._delegate.new_max(state) def cacheable(self): return self._delegate.cacheable() LEPL-5.1.3/src/lepl/stream/core.py0000644000175000001440000002736011731117215017242 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Default implementations of the stream classes. A stream is a tuple (state, helper), where `state` will vary from location to location, while `helper` is an "unchanging" instance of `StreamHelper`, defined below. For simple streams state can be a simple integer and this approach avoids the repeated creation of objects. More complex streams may choose to not use the state at all, simply creating a new helper at each point. ''' from abc import ABCMeta from lepl.support.lib import fmt #class _SimpleStream(metaclass=ABCMeta): # Python 2.6 # pylint: disable-msg=W0105, C0103 _StreamHelper = ABCMeta('_StreamHelper', (object, ), {}) '''ABC used to identify streams.''' DUMMY_HELPER = object() '''Allows tests to specify an arbitrary helper in results.''' OFFSET, LINENO, CHAR = range(3) '''Indices into delta.''' class StreamHelper(_StreamHelper): ''' The interface that all helpers should implement. ''' def __init__(self, id=None, factory=None, max=None, global_kargs=None, cache_level=None): from lepl.stream.factory import DEFAULT_STREAM_FACTORY self.id = id if id is not None else hash(self) self.factory = factory if factory else DEFAULT_STREAM_FACTORY self.max = max if max else MutableMaxDepth() self.global_kargs = global_kargs if global_kargs else {} self.cache_level = 1 if cache_level is None else cache_level def __repr__(self): '''Simplify for comparison in tests''' return '' def __eq__(self, other): return other is DUMMY_HELPER or super(StreamHelper, self).__eq__(other) def __hash__(self): return super(StreamHelper, self).__hash__() def key(self, state, other): ''' Generate an object that can be hashed (implements __hash__ and __eq__). See `HashKey`. ''' raise NotImplementedError def kargs(self, state, prefix='', kargs=None): ''' Generate a dictionary of values that describe the stream. These may be extended by subclasses. They are provided to `syntax_error_kargs`, for example. `prefix` modifies the property names `kargs` allows values to be provided. These are *not* overwritten, so if there is a name clash the provided value remains. Note: Calculating this can be expensive; use only for error messages, not debug messages (that may be discarded). The following names will be defined (at a minimum). For these value the "global" prefix indicates the underlying stream when, for example, tokens are used (other values will be relative to the token). If tokens etc are not in use then global and non-global values will agree. 
- data: a line representing the data, highlighting the current offset - global_data: as data, but for the entire sequence - text: as data, but without a "[...]" at the end - global_text: as text, but for the entire sequence - type: the type of the sequence - global_type: the type of the entire sequence - global_offset: a 0-based index into the underlying sequence These values are always local: - offset: a 0-based index into the sequence - rest: the data following the current point - repr: the current value, or - str: the current value, or an empty string These values are always global: - filename: a filename, if available, or the type - lineno: a 1-based line number for the current offset - char: a 1-based character count within the line for the current offset - location: a summary of the current location ''' raise NotImplementedError def fmt(self, state, template, prefix='', kargs=None): '''fmt a message using the expensive kargs function.''' return fmt(template, **self.kargs(state, prefix=prefix, kargs=kargs)) def debug(self, state): '''Generate an inexpensive debug message.''' raise NotImplementedError def next(self, state, count=1): ''' Return (value, stream) where `value` is the next value (or values if count > 1) from the stream and `stream` is advanced to the next character. Note that `value` is always a sequence (so if the stream is a list of integers, and `count`=1, then it will be a unitary list, for example). Should raise StopIteration when no more data are available. ''' raise StopIteration def join(self, state, *values): ''' Join sequences of values into a single sequence. ''' raise NotImplementedError def empty(self, state): ''' Return true if no more data available. ''' raise NotImplementedError def line(self, state, empty_ok): ''' Return (values, stream) where `values` correspond to something like "the rest of the line" from the current point and `stream` is advanced to the point after the line ends. If `empty_ok` is true and we are at the end of a line, return an empty line, otherwise advance (and maybe raise a StopIteration). ''' raise NotImplementedError def len(self, state): ''' Return the remaining length of the stream. Streams of unknown length (iterables) should raise a TypeError. ''' raise NotImplementedError def stream(self, state, value, id_=None, max=None): ''' Return a new stream that encapsulates the value given, starting at `state`. IMPORTANT: the stream used is the one that corresponds to the start of the value. For example: (line, next_stream) = s_line(stream, False) token_stream = s_stream(stream, line) # uses stream, not next_stream This is used when processing Tokens, for example, or columns (where fragments in the correct column area are parsed separately). ''' raise NotImplementedError def deepest(self): ''' Return a stream that represents the deepest match. The stream may be incomplete in some sense (it may not be possible to use it for parsing more data), but it will have usable fmt and kargs methods. ''' raise NotImplementedError def delta(self, state): ''' Return the offset, lineno and char of the current point, relative to the entire stream, as a tuple. ''' raise NotImplementedError def eq(self, state1, state2): ''' Are the two states equal? ''' return state1 == state2 def new_max(self, state): ''' Return (old max, new stream), where new stream uses a new max. This is used when we want to read from the stream without affecting the max (eg when looking ahead to generate tokens). 
''' raise NotImplementedError def cacheable(self): ''' Is this stream cacheable? ''' return self.cache_level > 0 # The following are helper functions that allow the methods above to be # called on (state, helper) tuples s_key = lambda stream, other=None: stream[1].key(stream[0], other) '''Invoke helper.key(state, other)''' s_kargs = lambda stream, prefix='', kargs=None: stream[1].kargs(stream[0], prefix=prefix, kargs=kargs) '''Invoke helper.kargs(state, prefix, kargs)''' s_fmt = lambda stream, template, prefix='', kargs=None: stream[1].fmt(stream[0], template, prefix=prefix, kargs=kargs) '''Invoke helper.fmt(state, template, prefix, kargs)''' s_debug = lambda stream: stream[1].debug(stream[0]) '''Invoke helper.debug()''' s_next = lambda stream, count=1: stream[1].next(stream[0], count=count) '''Invoke helper.next(state, count)''' s_join = lambda stream, *values: stream[1].join(stream[0], *values) '''Invoke helper.join(*values)''' s_empty = lambda stream: stream[1].empty(stream[0]) '''Invoke helper.empty(state)''' s_line = lambda stream, empty_ok: stream[1].line(stream[0], empty_ok) '''Invoke helper.line(state, empty_ok)''' s_len = lambda stream: stream[1].len(stream[0]) '''Invoke helper.len(state)''' s_stream = lambda stream, value, id_=None, max=None: stream[1].stream(stream[0], value, id_=id_, max=max) '''Invoke helper.stream(state, value)''' s_deepest = lambda stream: stream[1].deepest() '''Invoke helper.deepest()''' s_delta = lambda stream: stream[1].delta(stream[0]) '''Invoke helper.delta(state)''' s_eq = lambda stream1, stream2: stream1[1].eq(stream1[0], stream2[0]) '''Compare two streams (which should have identical helpers)''' s_id = lambda stream: stream[1].id '''Access the ID attribute.''' s_factory = lambda stream: stream[1].factory '''Access the factory attribute.''' s_max = lambda stream: stream[1].max '''Access the max attribute.''' s_new_max = lambda stream: stream[1].new_max(stream[0]) '''Invoke helper.new_max(state).''' s_global_kargs = lambda stream: stream[1].global_kargs '''Access the global_kargs attribute.''' s_cache_level = lambda stream: stream[1].cache_level '''Access the cache_level attribute.''' s_cacheable = lambda stream: stream[1].cacheable() '''Is the stream cacheable?''' class MutableMaxDepth(object): ''' Track maximum depth (offset) reached and the associated stream. Used to generate error message for incomplete matches. ''' def __init__(self): self.depth = 0 self.stream = None def update(self, depth, stream): # the '=' here allows a token to nudge on to the next stream without # changing the offset (when count=0 in s_next) if depth >= self.depth or not self.stream: self.depth = depth self.stream = stream def get(self): return self.stream class HashKey(object): ''' Used to store a value with a given hash. ''' __slots__ = ['hash', 'eq'] def __init__(self, hash, eq=None): self.hash = hash self.eq = eq def __hash__(self): return self.hash def __eq__(self, other): try: return other.hash == self.hash and other.eq == self.eq except AttributeError: return False LEPL-5.1.3/src/lepl/contrib/0000755000175000001440000000000011764776700016115 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/contrib/__init__.py0000644000175000001440000000335111731117151020207 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. 
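# A compact sketch of the (state, helper) convention documented above,
# assuming DEFAULT_STREAM_FACTORY from this package; the data are
# illustrative. The s_* lambdas simply unpack the tuple and delegate to
# the helper, so user code never manipulates state directly.

from lepl.stream.core import s_empty, s_len, s_next
from lepl.stream.factory import DEFAULT_STREAM_FACTORY

stream = DEFAULT_STREAM_FACTORY([1, 2, 3])   # __call__ auto-detects a list
assert s_len(stream) == 3
(value, stream) = s_next(stream, count=2)
assert value == [1, 2]
assert not s_empty(stream) and s_len(stream) == 1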
You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Contributor(s): # - "mereandor" / mereandor at gmail dot com (Roman) # Portions created by the Contributors are Copyright (C) 2009 # The Contributors. All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Contributed code. For now I'm keeping it in a separate module to simplify licencing/copyright. This may change in the future. ''' LEPL-5.1.3/src/lepl/contrib/matchers.py0000644000175000001440000001114711731117151020260 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Contributor(s): # - "mereandor" / mereandor at gmail dot com (Roman) # Portions created by the Contributors are Copyright (C) 2009 # The Contributors. All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Contributed matchers. 
''' from copy import copy from lepl.matchers.derived import Optional from lepl.matchers.combine import And, Or, BaseSearch from lepl.matchers.matcher import is_child from lepl.matchers.transform import Transform from lepl.matchers.operators import _BaseSeparator # (c) 2009 "mereandor" / mereandor at gmail dot com (Roman), Andrew Cooke # pylint: disable-msg=R0903 class SmartSeparator2(_BaseSeparator): ''' A substitute `Separator` with different semantics for optional matchers. This identifies optional matchers by type (whether they subclass `BaseSearch`) and then constructs a replacement that adds space only when both matchers are used. See also `SmartSeparator1`, which is more general but less efficient. ''' def _replacements(self, separator): ''' Provide alternative definitions of '&` and `[]`. ''' def non_optional_copy(matcher): ''' Check whether a matcher is optional and, if so, make it not so. ''' # both of the "copy" calls below make me nervous - it's not the # way the rest of lepl works - but i don't have any specific # criticism, or a good alternative. required, optional = matcher, False if isinstance(matcher, Transform): temp, optional = non_optional_copy(matcher.matcher) if optional: required = copy(matcher) required.matcher = temp elif is_child(matcher, BaseSearch, fail=False): # this introspection only works because Repeat sets named # (ie kargs) arguments. optional = (matcher.start == 0) if optional: required = copy(matcher) required.start = 1 if required.stop == 1: required = required.first return required, optional # pylint: disable-msg=W0141 def and_(matcher_a, matcher_b): ''' Combine two matchers. ''' (requireda, optionala) = non_optional_copy(matcher_a) (requiredb, optionalb) = non_optional_copy(matcher_b) if not (optionala or optionalb): return And(matcher_a, separator, matcher_b) else: matcher = Or( *filter((lambda x: x is not None), [ And(Optional(And(requireda, separator)), requiredb) if optionala else None, And(requireda, Optional(And(separator, requiredb))) if optionalb else None])) if optionala and optionalb: # making this explicit allows chaining (we can detect it # when called again in a tree of "ands") matcher = Optional(matcher) return matcher return (and_, self._repeat(separator)) LEPL-5.1.3/src/lepl/contrib/json.py0000644000175000001440000000563111731117151017424 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Contributor(s): # - "magcius" / jstpierre at mecheye dot net # Portions created by the Contributors are Copyright (C) 2010 # The Contributors. All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' http://www.ietf.org/rfc/rfc4627.txt ''' from lepl import * from lepl.support.lib import str def Simple(): ''' A simple JSON parser. ''' escapes = {'\\b': '\b', '\\f': '\f', '\\n': '\n', '\\t': '\t', '\\r': '\r'} def unescape_string(text): return escapes[text] def unescape_unicode(text): # Python 3 only return bytes(str(text), 'utf8').decode('unicode_escape') value = Delayed() unicode_escape = ("\\u" + Digit()[4, ...]) >> unescape_unicode regular_escape = ("\\" + Any("bfntr")) >> unescape_string escape = (unicode_escape | regular_escape) string = (Drop('"') & (AnyBut('"\\') | escape)[...] & Drop('"')) number = Real() >> float comma = Drop(',') with DroppedSpace(): array = Drop("[") & value[:, comma] & Drop("]") > list pair = string & Drop(":") & value > tuple object_ = Drop("{") & pair[:, comma] & Drop("}") > dict value += ((Literal('true') >= (lambda x: True)) | (Literal('false') >= (lambda x: False)) | (Literal('null') >= (lambda x: None)) | array | object_ | number | string) return value LEPL-5.1.3/src/lepl/support/0000755000175000001440000000000011764776700016171 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/support/_test/0000755000175000001440000000000011764776700017307 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/support/_test/graph.py0000644000175000001440000001402711731117215020746 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.support.graph module. 
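
As orientation for what is tested here: the traversal functions treat
any iterable of the given type as an interior node and everything else
as a leaf, so (a sketch, verified by GenericOrderTest below)::

    g = [1, [2, 3]]
    [n for n in preorder(g, list) if isinstance(n, int)]  # -> [1, 2, 3]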
''' from unittest import TestCase from lepl.support.graph import ArgAsAttributeMixin, preorder, postorder, reset, \ ConstructorWalker, Clone, make_proxy, LEAF, leaves from lepl.support.node import Node # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0102, C0321, W0141 # (dude this is just a test) class SimpleNode(ArgAsAttributeMixin): # pylint: disable-msg=E1101 def __init__(self, label, *nodes): super(SimpleNode, self).__init__() self._arg(label=label) self._args(nodes=nodes) def __str__(self): return str(self.label) def __repr__(self): args = [str(self.label)] args.extend(map(repr, self.nodes)) return 'SimpleNode(%s)' % ','.join(args) def __getitem__(self, index): return self.nodes[index] def __len__(self): return len(self.nodes) def graph(): return SimpleNode(1, SimpleNode(11, SimpleNode(111), SimpleNode(112)), SimpleNode(12)) class OrderTest(TestCase): def test_preorder(self): result = [node.label for node in preorder(graph(), SimpleNode, exclude=LEAF)] assert result == [1, 11, 111, 112, 12], result def test_postorder(self): result = [node.label for node in postorder(graph(), SimpleNode, exclude=LEAF)] assert result == [111, 112, 11, 12, 1], result class ResetTest(TestCase): def test_reset(self): nodes = preorder(graph(), SimpleNode, exclude=LEAF) assert next(nodes).label == 1 assert next(nodes).label == 11 reset(nodes) assert next(nodes).label == 1 assert next(nodes).label == 11 class CloneTest(TestCase): def test_simple(self): g1 = graph() g2 = ConstructorWalker(g1, SimpleNode)(Clone()) assert repr(g1) == repr(g2) assert g1 is not g2 def assert_same(self, text1, text2): assert self.__clean(text1) == self.__clean(text2), self.__clean(text1) def __clean(self, text): depth = 0 result = '' for c in text: if c == '<': depth += 1 elif c == '>': depth -= 1 elif depth == 0: result += c return result def test_loop(self): (s, n) = make_proxy() g1 = SimpleNode(1, SimpleNode(11, SimpleNode(111), SimpleNode(112), n), SimpleNode(12)) s(g1) g2 = ConstructorWalker(g1, SimpleNode)(Clone()) self.assert_same(repr(g1), repr(g2)) def test_loops(self): (s1, n1) = make_proxy() (s2, n2) = make_proxy() g1 = SimpleNode(1, SimpleNode(11, SimpleNode(111, n2), SimpleNode(112), n1), SimpleNode(12, n1)) s1(g1) s2(next(iter(g1))) g2 = ConstructorWalker(g1, SimpleNode)(Clone()) self.assert_same(repr(g1), repr(g2)) def test_loops_with_proxy(self): (s1, n1) = make_proxy() (s2, n2) = make_proxy() g1 = SimpleNode(1, SimpleNode(11, SimpleNode(111, n2), SimpleNode(112), n1), SimpleNode(12, n1)) s1(g1) s2(next(iter(g1))) g2 = ConstructorWalker(g1, SimpleNode)(Clone()) g3 = ConstructorWalker(g2, SimpleNode)(Clone()) self.assert_same(repr(g1), repr(g3)) # print(repr(g3)) class GenericOrderTest(TestCase): def test_preorder(self): g = [1, [11, [111, 112], 12]] result = [node for node in preorder(g, list) if isinstance(node, int)] assert result == [1, 11, 111, 112, 12], result def test_postorder(self): ''' At first I was surprised about this (compare with SimpleNode results above), but these are leaf nodes, so postorder doesn't change anything (there's no difference between "before visiting" and "after visiting" a leaf). 
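
Concretely: in ``g = [1, [11, [111, 112], 12]]`` every int is a leaf,
so both orders yield the ints in the same left-to-right sequence; only
the enclosing lists change position, and they are filtered out here.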
''' g = [1, [11, [111, 112], 12]] result = [node for node in postorder(g, list) if isinstance(node, int)] assert result == [1, 11, 111, 112, 12], result class LeafTest(TestCase): def test_order(self): tree = Node(1, 2, Node(3, Node(4), Node(), 5)) result = list(leaves(tree, Node)) assert result == [1,2,3,4,5], result LEPL-5.1.3/src/lepl/support/_test/__init__.py0000644000175000001440000000335711731117151021407 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.matchers package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.support._test.graph import lepl.support._test.lib import lepl.support._test.list import lepl.support._test.node import lepl.support._test.timer LEPL-5.1.3/src/lepl/support/_test/list.py0000644000175000001440000001070411731117151020615 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.support.list module. ''' #from logging import basicConfig, DEBUG, INFO from unittest import TestCase from lepl import * from lepl._test.base import assert_str from lepl.support.list import clone_sexpr, count_sexpr, join, \ sexpr_flatten, sexpr_to_str class FoldTest(TestCase): def test_clone(self): def test(list_): copy = clone_sexpr(list_) assert copy == list_, sexpr_to_str(copy) test([]) test([1,2,3]) test([[1],2,3]) test([[[1]],2,3]) test([[[1]],2,[3]]) test([[[1]],'2',[3]]) test(((1),List([2,3,[4]]))) def test_count(self): def test(list_, value): measured = count_sexpr(list_) assert measured == value, measured test([], 0) test([1,2,3], 3) test([[1],2,3], 3) test([[[1,2],3],'four',5], 5) def test_flatten(self): def test(list_, joined, flattened): if joined is not None: result = join(list_) assert result == joined, result result = sexpr_flatten(list_) assert result == flattened, result test([[1],[2, [3]]], [1,2,[3]], [1,2,3]) test([], [], []) test([1,2,3], None, [1,2,3]) test([[1],2,3], None, [1,2,3]) test([[[1,'two'],3],'four',5], None, [1,'two',3,'four',5]) def test_sexpr_to_string(self): def test(list_, value): result = sexpr_to_str(list_) assert result == value, result test([1,2,3], '[1,2,3]') test((1,2,3), '(1,2,3)') test(List([1,2,3]), 'List([1,2,3])') class Foo(List): pass test(Foo([1,2,(3,List([4]))]), 'Foo([1,2,(3,List([4]))])') class AstTest(TestCase): def test_ast(self): class Term(List): pass class Factor(List): pass class Expression(List): pass expr = Delayed() number = Digit()[1:,...] >> int with Separator(Drop(Regexp(r'\s*'))): term = number | '(' & expr & ')' > Term muldiv = Any('*/') factor = term & (muldiv & term)[:] > Factor addsub = Any('+-') expr += factor & (addsub & factor)[:] > Expression line = expr & Eos() ast = line.parse_string('1 + 2 * (3 + 4 - 5)')[0] text = str(ast) assert_str(text, """Expression +- Factor | `- Term | `- 1 +- '+' `- Factor +- Term | `- 2 +- '*' `- Term +- '(' +- Expression | +- Factor | | `- Term | | `- 3 | +- '+' | +- Factor | | `- Term | | `- 4 | +- '-' | `- Factor | `- Term | `- 5 `- ')'""") class EmptyListBugTest(TestCase): def test_empty(self): s = str(List()) assert s == 'List', s LEPL-5.1.3/src/lepl/support/_test/node.py0000644000175000001440000002064211731117151020571 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.support.node module. ''' #from logging import basicConfig, DEBUG, INFO from unittest import TestCase from lepl import Delayed, Digit, Any, Node, make_error, node_throw, Or, Space, \ AnyBut, Eos from lepl.support.graph import order, PREORDER, POSTORDER, LEAF from lepl._test.base import assert_str # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0102, C0321, R0201, R0903 # (dude this is just a test) class NodeTest(TestCase): def test_node(self): #basicConfig(level=DEBUG) class Term(Node): pass class Factor(Node): pass class Expression(Node): pass expression = Delayed() number = Digit()[1:,...] > 'number' term = (number | '(' / expression / ')') > Term muldiv = Any('*/') > 'operator' factor = (term / (muldiv / term)[0::]) > Factor addsub = Any('+-') > 'operator' expression += (factor / (addsub / factor)[0::]) > Expression p = expression.get_parse_string() ast = p('1 + 2 * (3 + 4 - 5)') assert_str(ast[0], """Expression +- Factor | +- Term | | `- number '1' | `- ' ' +- operator '+' +- ' ' `- Factor +- Term | `- number '2' +- ' ' +- operator '*' +- ' ' `- Term +- '(' +- Expression | +- Factor | | +- Term | | | `- number '3' | | `- ' ' | +- operator '+' | +- ' ' | +- Factor | | +- Term | | | `- number '4' | | `- ' ' | +- operator '-' | +- ' ' | `- Factor | `- Term | `- number '5' `- ')'""") class ListTest(TestCase): def test_list(self): #basicConfig(level=DEBUG) expression = Delayed() number = Digit()[1:,...] > 'number' term = (number | '(' / expression / ')') > list muldiv = Any('*/') > 'operator' factor = (term / (muldiv / term)[0:]) > list addsub = Any('+-') > 'operator' expression += (factor / (addsub / factor)[0:]) > list ast = expression.parse_string('1 + 2 * (3 + 4 - 5)') assert ast == [[[[('number', '1')], ' '], ('operator', '+'), ' ', [[('number', '2')], ' ', ('operator', '*'), ' ', ['(', [[[('number', '3')], ' '], ('operator', '+'), ' ', [[('number', '4')], ' '], ('operator', '-'), ' ', [[('number', '5')]]], ')']]]], ast class ErrorTest(TestCase): def test_error(self): #basicConfig(level=INFO) class Term(Node): pass class Factor(Node): pass class Expression(Node): pass expression = Delayed() number = Digit()[1:,...] > 'number' term = Or( AnyBut(Space() | Digit() | '(')[1:,...] 
^ 'unexpected text: {results[0]}', number > Term, number ** make_error("no ( before {out_rest}") / ')' >> node_throw, '(' / expression / ')' > Term, ('(' / expression / Eos()) ** make_error("no ) for {in_rest}") >> node_throw) muldiv = Any('*/') > 'operator' factor = (term / (muldiv / term)[0:,r'\s*']) > Factor addsub = Any('+-') > 'operator' expression += (factor / (addsub / factor)[0:,r'\s*']) > Expression line = expression / Eos() parser = line.get_parse_string() try: parser('1 + 2 * 3 + 4 - 5)')[0] assert False, 'expected error' except SyntaxError as e: assert e.msg == "no ( before ')'", e.msg try: parser('1 + 2 * (3 + 4 - 5') assert False, 'expected error' except SyntaxError as e: assert e.msg == "no ) for '(3 + 4 - 5'", e.msg try: parser('1 + 2 * foo') assert False, 'expected error' except SyntaxError as e: assert e.msg == "unexpected text: foo", e.msg class EqualityTest(TestCase): def test_object_eq(self): a = Node('a') b = Node('a') assert a != b assert b != a assert a is not b assert b is not a assert a == a assert b == b assert a is a assert b is b def test_recursive_eq(self): a = Node('a', Node('b')) b = Node('a', Node('b')) c = Node('a', Node('c')) assert a._recursively_eq(b) assert not a._recursively_eq(c) class ChildrenTest(TestCase): def test_children(self): a = Node('a') for c in a: assert c == 'a', c class OrderTest(TestCase): def tree(self): return Node('a', Node('b', Node('c', Node('d'), Node('e')), Node('f')), Node('g'), Node('h', Node('i', Node('j'), Node('k')), Node('l'))) def order(self, tree, flags): return list(map(lambda x: x[0], order(tree, flags, Node, LEAF))) def test_orders(self): tree = self.tree() ordered = self.order(tree, PREORDER) assert ordered == ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'], ordered ordered = self.order(tree, POSTORDER) assert ordered == ['d', 'e', 'c', 'f', 'b', 'g', 'j', 'k', 'i', 'l', 'h', 'a'], ordered def test_str(self): text = str(self.tree()) assert text == """Node +- 'a' +- Node | +- 'b' | +- Node | | +- 'c' | | +- Node | | | `- 'd' | | `- Node | | `- 'e' | `- Node | `- 'f' +- Node | `- 'g' `- Node +- 'h' +- Node | +- 'i' | +- Node | | `- 'j' | `- Node | `- 'k' `- Node `- 'l'""", text class NestedNamedTest(TestCase): def tree(self): return Node(('a', Node('A')), ('b', Node('B'))) def test_str(self): text = str(self.tree()) assert text == """Node +- a | `- 'A' `- b `- 'B'""", text class NodeEqualityTest(TestCase): def test_equals(self): a = Node('abc') b = Node('abc') assert a == a assert not (a != a) assert not (a == b) assert a._recursively_eq(b) assert Node(a) != a assert Node(a)._recursively_eq(Node(a)) assert not Node(a)._recursively_eq(a) LEPL-5.1.3/src/lepl/support/_test/lib.py0000644000175000001440000000665311731117151020420 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.support.lib module. ''' from unittest import TestCase from lepl.support.lib import assert_type, CircularFifo # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, R0201, R0913 # (dude this is just a test) class AssertTypeTestCase(TestCase): def test_ok(self): assert_type('', 1, int) assert_type('', '', str) assert_type('', None, int, none_ok=True) def test_bad(self): self.assert_bad('The foo attribute in Bar', '', int, False, "The foo attribute in Bar (value '') must be of type int.") self.assert_bad('The foo attribute in Bar', None, int, False, "The foo attribute in Bar (value None) must be of type int.") def assert_bad(self, name, value, type_, none_ok, msg): try: assert_type(name, value, type_, none_ok=none_ok) assert False, 'Expected failure' except TypeError as e: assert e.args[0] == msg, e.args[0] class CircularFifoTest(TestCase): def test_expiry(self): fifo = CircularFifo(3) assert None == fifo.append(1) assert None == fifo.append(2) assert None == fifo.append(3) for i in range(4,10): assert i-3 == fifo.append(i) def test_pop(self): fifo = CircularFifo(3) for i in range(1,3): for j in range(i): assert None == fifo.append(j) for j in range(i): popped = fifo.pop() assert j == popped, '{0} {1} {2}'.format(i, j, popped) for i in range(4): fifo.append(i) assert 1 == fifo.pop() def test_list(self): fifo = CircularFifo(3) for i in range(7): fifo.append(i) assert [4,5,6] == list(fifo) fifo.append(7) assert [5,6,7] == list(fifo) fifo.append(8) assert [6,7,8] == list(fifo) fifo.append(9) assert [7,8,9] == list(fifo) LEPL-5.1.3/src/lepl/support/_test/timer.py0000644000175000001440000000773511731117151020774 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.support.timer module. ''' from unittest import TestCase from lepl import * from lepl.support.lib import StringIO class TimerTest(TestCase): def test_luca(self): ''' See mailing list. ''' integer = Token(Integer()) >> int uletter = Token(Upper()) real = Token(Real()) >> float data_line = Line(integer & uletter & real[6]) table = data_line[1:] source = '''1 G 0.0 0.0 0.0 0.0 0.0 0.0 2 G 0.0 0.0 0.0 0.0 0.0 0.0 3 G 0.0 0.0 0.0 0.0 0.0 0.0 4 G 0.0 0.0 0.0 0.0 0.0 0.0 5 G 0.0 0.0 0.0 0.0 0.0 0.0 6 G 0.0 0.0 0.0 0.0 0.0 0.0 7 G 0.0 0.0 0.0 0.0 0.0 0.0 8 G 0.0 0.0 0.0 0.0 0.0 0.0 9 G 0.0 0.0 -9.856000E-05 -1.444699E-17 1.944000E-03 0.0 10 G 0.0 0.0 -9.856000E-05 -1.427843E-17 1.944000E-03 0.0 11 G 0.0 0.0 -1.085216E-02 -2.749537E-16 1.874400E-02 0.0 12 G 0.0 0.0 -1.085216E-02 -2.748317E-16 1.874400E-02 0.0 13 G 0.0 0.0 -3.600576E-02 -6.652665E-16 3.074400E-02 0.0 14 G 0.0 0.0 -3.600576E-02 -6.717988E-16 3.074400E-02 0.0 15 G 0.0 0.0 -7.075936E-02 -8.592844E-16 3.794400E-02 0.0 16 G 0.0 0.0 -7.075936E-02 -8.537008E-16 3.794400E-02 0.0 17 G 0.0 0.0 -1.103130E-01 -9.445027E-16 4.034400E-02 0.0 18 G 0.0 0.0 -1.103130E-01 -9.538811E-16 4.034400E-02 0.0 100 G 0.0 0.0 0.0 0.0 0.0 0.0 200 G 0.0 0.0 0.0 0.0 0.0 0.0 ''' out = StringIO() print_timing(source, {'Real()': table.clone().config.lines().matcher, 'Real() no memoize': table.clone().config.lines().no_memoize().matcher}, count_compile=1, out=out) table = out.getvalue() print(table) assert 'Timing Results' in table, table LEPL-5.1.3/src/lepl/support/graph.py0000644000175000001440000006431711731117215017637 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. 
''' Graph traversal - supports generic Python classes, but has extensions for classes that record their own constructor arguments (and so allow deep cloning of graphs). The fundamental `dfs_edges` routine will traverse over (ie provides an iterator that returns (flag, node) pairs, where flag describes the type of node and ordering) a graph made of iterable Python objects. Only the __iter__ method (implemented by all containers) is required. However, in general this is too broad (for example, Strings are iterable, and single character strings contain themselves), so a type can be specified which identifies those nodes to be treated as "interior" nodes. Children of interior nodes are returned as "leaf" nodes, but are not iterated over themselves. The `order` function provides a simpler interface to this traversal, allowing a particular order to be generated, and, for example, optionally excluding leaf nodes. `ConstructorGraphNode` is motivated by data constructors and exposes its constructor arguments (this is important because if we are cloning a graph we want to replace constructor arguments that correspond to child nodes with their clones). This currently has a single implementation, `ArgAsAttributeMixin` (there used to be another, but it was equivalent to the generic case with `SimpleWalker`). The 'Walker' (`SimpleWalker` and `ConstructorWalker`) and `Visitor` classes provide a different approach to traversing the graph (compared to the simple sequences of nodes provided by `dfs_edges` et al), that reflects the emphasis on constructors described above: the walker takes a visitor sub-class and calls it in a way that replicates the original calls to the node constructors. ''' from collections import Sequence, deque from lepl.support.lib import compose, safe_in, safe_add, empty, fmt,\ fallback_add FORWARD = 1 # forward edge BACKWARD = 2 # backward edge NONTREE = 4 # cyclic edge ROOT = 8 # root node (not an edge) NODE = 16 # child is a 'normal' node (of the given type) LEAF = 32 # child is a leaf node (not the given type) POSTORDER = BACKWARD | NONTREE PREORDER = FORWARD | NONTREE # pylint: disable-msg=R0911 # many yields appropriate here def dfs_edges(node, type_): ''' Iterative DFS, based on http://www.ics.uci.edu/~eppstein/PADS/DFS.py Returns forward and reverse edges. Also returns root node in correct order for pre- (FORWARD) and post- (BACKWARD) ordering. ``type_`` selects which values are iterated over. Children which are not of this type are flagged with LEAF. ''' while isinstance(node, type_): try: stack = [(node, iter(node), ROOT)] yield node, node, FORWARD | ROOT visited = set() visited = fallback_add(visited, node) while stack: parent, children, ptype = stack[-1] try: child = next(children) if isinstance(child, type_): if safe_in(child, visited, False): yield parent, child, NONTREE else: stack.append((child, iter(child), NODE)) yield parent, child, FORWARD | NODE visited = fallback_add(visited, child) else: stack.append((child, empty(), LEAF)) yield parent, child, FORWARD | LEAF except StopIteration: stack.pop() if stack: yield stack[-1][0], parent, BACKWARD | ptype yield node, node, BACKWARD | ROOT return except Reset: yield # in response to the throw (ignored by caller) class Reset(Exception): ''' An exception that can be passed to dfs_edges to reset the traversal. ''' pass def reset(generator): ''' Reset the traversal by raising Reset. ''' generator.throw(Reset()) def order(node, include, type_, exclude=0): ''' An ordered sequence of nodes. 
The ordering is given by 'include' (see the constants PREORDER etc above). ''' while True: try: for (_parent, child, direction) in dfs_edges(node, type_): if (direction & include) and not (direction & exclude): yield child return except Reset: yield # in response to the throw (ignored by caller) def preorder(node, type_, exclude=0): ''' The nodes in preorder. ''' return order(node, PREORDER, type_, exclude=exclude) def postorder(node, type_, exclude=0): ''' The nodes in postorder. ''' return order(node, POSTORDER, type_, exclude=exclude) def leaves(node, type_=None): ''' The leaf nodes. ''' if type_ is None: type_ = type(node) return order(node, FORWARD, type_, exclude=NODE|ROOT) def loops(node, type_): ''' Return all loops from the given node. Each loop is a list that starts and ends with the given node. ''' stack = [[node]] known = set([node]) # avoid getting lost in sub-loops while stack: ancestors = stack.pop() parent = ancestors[-1] if isinstance(parent, type_): for child in parent: family = list(ancestors) family.append(child) if child is node: yield family else: if not safe_in(child, known): stack.append(family) known = fallback_add(known, child) # pylint: disable-msg=R0903 # interface class ConstructorGraphNode(object): ''' An interface that provides information on constructor arguments. This is used by `ConstructorWalker` to provide the results of walking child nodes in the same fmt as those nodes were provided in the constructor. The main advantage is that the names of named arguments are associated with the appropriate results. For this to work correctly there is assumed to be a close relationship between constructor arguments and children (there is a somewhat implicit link between Python object constructors and type constructors in, say, Haskell). Exactly how constructor argmuents and children match depends on the implementation, but `ConstructorWalker` assumes that child nodes (from __iter__()) are visited before the same nodes appear in constructor arguments during depth-first postorder traversal. ''' # pylint: disable-msg=R0201 # interface def _constructor_args(self): ''' Regenerate the constructor arguments (returns (args, kargs)). ''' raise Exception('Not implemented') class ArgAsAttributeMixin(ConstructorGraphNode): ''' Constructor arguments are stored as attributes; their names are also stored in order so that the arguments can be constructed. This assumes that all names are unique. '*args' are named "without the *". ''' def __init__(self): super(ArgAsAttributeMixin, self).__init__() self.__arg_names = [] self.__karg_names = [] def __set_attribute(self, name, value): ''' Add a single argument as a simple property. ''' setattr(self, name, value) return name def _arg(self, **kargs): ''' Set a single named argument as an attribute (the signature uses kargs so that the name does not need to be quoted). The attribute name is added to self.__arg_names. ''' assert len(kargs) == 1 for name in kargs: self.__arg_names.append(self.__set_attribute(name, kargs[name])) def _karg(self, **kargs): ''' Set a single keyword argument (ie with default) as an attribute (the signature uses kargs so that the name does not need to be quoted). The attribute name is added to self.__karg_names. ''' assert len(kargs) == 1 for name in kargs: self.__karg_names.append(self.__set_attribute(name, kargs[name])) def _args(self, **kargs): ''' Set a *arg as an attribute (the signature uses kargs so that the attribute name does not need to be quoted). The name (without '*') is added to self.__arg_names. 
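
For example (a sketch; compare SimpleNode in the tests)::

    class Pair(ArgAsAttributeMixin):
        def __init__(self, left, *rest):
            super(Pair, self).__init__()
            self._arg(left=left)    # single named argument
            self._args(rest=rest)   # the *args, stored under 'rest'

    Pair(1, 2, 3)._constructor_args()  # -> ([1, 2, 3], {})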
''' assert len(kargs) == 1 for name in kargs: assert isinstance(kargs[name], Sequence), kargs[name] self.__arg_names.append('*' + self.__set_attribute(name, kargs[name])) def _kargs(self, kargs): ''' Set **kargs as attributes. The attribute names are added to self.__arg_names. ''' for name in kargs: self.__karg_names.append(self.__set_attribute(name, kargs[name])) def __args(self): ''' All (non-keyword) arguments. ''' args = [getattr(self, name) for name in self.__arg_names if not name.startswith('*')] for name in self.__arg_names: if name.startswith('*'): args.extend(getattr(self, name[1:])) return args def __kargs(self): ''' All keyword argmuents. ''' return dict((name, getattr(self, name)) for name in self.__karg_names) def _constructor_args(self): ''' Regenerate the constructor arguments. ''' return (self.__args(), self.__kargs()) def __iter__(self): ''' Return all children, in order. ''' for arg in self.__args(): yield arg for name in self.__karg_names: yield getattr(self, name) class Visitor(object): ''' The interface required by the walkers. ``loop`` is value returned when a node is re-visited. ``type_`` is set with the node type before constructor() is called. This allows constructor() itself to be invoked with the Python arguments used to construct the original graph. ''' def loop(self, value): ''' Called on nodes that belong to a loop (eg. in the `ConstructorWalker` nodes are visited in postorder, and this is called when a node is *first* found as a constructor argument (before bing found in the "postorder" traversal)). By default, do nothing. ''' pass def node(self, node): ''' Called when first visiting a node. By default, do nothing. ''' pass def constructor(self, *args, **kargs): ''' Called for node instances. The args and kargs are the values for the corresponding child nodes, as returned by this visitor. By default, do nothing. ''' pass def leaf(self, value): ''' Called for children that are not node instances. By default, do nothing. ''' pass # pylint: disable-msg=R0201 # interface def postprocess(self, result): ''' Called after walking, passed the match to the initial node. ''' return result class ConstructorWalker(object): ''' Tree walker (it handles cyclic graphs by ignoring repeated nodes). This is based directly on the catamorphism of the graph. The visitor encodes the type information. It may help to see the constructor arguments as type constructors. Nodes should be subclasses of `ConstructorGraphNode`. ''' def __init__(self, root, type_): self.__root = root self.__type = type_ def __call__(self, visitor): ''' Apply the visitor to each node in turn. ''' results = {} for node in postorder(self.__root, self.__type, exclude=LEAF): visitor.node(node) (args, kargs) = self.__arguments(node, visitor, results) # pylint: disable-msg=W0142 results[node] = visitor.constructor(*args, **kargs) return visitor.postprocess(results[self.__root]) def __arguments(self, node, visitor, results): ''' Collect arguments for the constructor. ''' # pylint: disable-msg=W0212 # (this is the ConstructorGraphNode interface; it's purposefully # like that to avoid conflicting with Node attributes) (old_args, old_kargs) = node._constructor_args() (new_args, new_kargs) = ([], {}) for arg in old_args: new_args.append(self.__value(arg, visitor, results)) for name in old_kargs: new_kargs[name] = self.__value(old_kargs[name], visitor, results) return (new_args, new_kargs) def __value(self, node, visitor, results): ''' Get a value for a particular constructor argument. 
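
(Nodes of the walker's type resolve to the result already computed in
postorder; a node seen as an argument before being visited indicates a
cycle and goes to ``visitor.loop``; anything else is a leaf.) A minimal
visitor, as a sketch::

    class Count(Visitor):
        def __init__(self):
            super(Count, self).__init__()
            self.total = 0
        def constructor(self, *args, **kargs):
            self.total += 1
            return self.total

    ConstructorWalker(root, SimpleNode)(Count())  # -> number of interior nodes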
''' if isinstance(node, self.__type): if node in results: return results[node] else: return visitor.loop(node) else: return visitor.leaf(node) class SimpleWalker(object): ''' This works like `ConstructorWalker` for generic classes. Since it has no knowledge of constructor arguments it assumes that all children are passed like '*args'. This allows visitors written for `ConstructorGraphNode` trees to be used with arbitrary objects (as long as they follow the convention described above). ''' def __init__(self, root, type_): ''' Create a walker for the graph starting at the given node. ''' self.__root = root self.__type = type_ def __call__(self, visitor): ''' Apply the visitor to the nodes in the graph, in postorder. ''' # pylint: disable-msg=W0142 # (*args) pending = {} for (parent, node, kind) in dfs_edges(self.__root, self.__type): if kind & POSTORDER: if safe_in(node, pending): args = pending[node] del pending[node] else: args = [] if parent not in pending: pending[parent] = [] visitor.node(node) if kind & LEAF: pending[parent].append(visitor.leaf(node)) elif kind & NONTREE: pending[parent].append(visitor.loop(node)) else: pending[parent].append(visitor.constructor(*args)) return pending[self.__root][0] class PostorderWalkerMixin(object): ''' Add a 'postorder' method. ''' def __init__(self): super(PostorderWalkerMixin, self).__init__() self.__postorder = None self.__postorder_type = None def postorder(self, visitor, type_): ''' A shortcut that allows a visitor to be applied postorder. ''' if self.__postorder is None or self.__postorder_type != type_: self.__postorder = ConstructorWalker(self, type_) self.__postorder_type = type_ return self.__postorder(visitor) class _LineOverflow(Exception): ''' Used internally in `ConstructorStr`. ''' pass class ConstructorStr(Visitor): ''' Reconstruct the constructors used to generate the graph as a string (useful for repr). Internally, data is stored as a list of (indent, line) pairs. ''' def __init__(self, line_length=80): super(ConstructorStr, self).__init__() self.__line_length = line_length self.__name = None def node(self, node): ''' Store the node's class name for later use. ''' try: self.__name = node.delegate.__class__.__name__ except AttributeError: self.__name = node.__class__.__name__ def loop(self, value): ''' Replace loop nodes by a marker. ''' return [[0, '']] def constructor(self, *args, **kargs): ''' Build the constructor string, given the node and arguments. ''' contents = [] for arg in args: if contents: contents[-1][1] += ', ' contents.extend([indent+1, line] for (indent, line) in arg) for name in kargs: if contents: contents[-1][1] += ', ' arg = kargs[name] contents.append([arg[0][0]+1, name + '=' + arg[0][1]]) contents.extend([indent+1, line] for (indent, line) in arg[1:]) lines = [[0, self.__name + '(']] + contents lines[-1][1] += ')' return lines def leaf(self, value): ''' Non-node nodes (attributes) are displayed using repr. ''' return [[0, repr(value)]] def postprocess(self, lines): ''' This is an ad-hoc algorithm to make the final string reasonably compact. It's ugly, bug-prone and completely arbitrary, but it seems to work.... 
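
(Roughly: record where the indent increases, then, working from the
innermost section outwards, try to fit each section on the line that
opened it; when that overflows, fall back to merging adjacent lines of
equal indent while they stay within line_length.)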
''' sections = deque() (scan, indent) = (0, -1) while scan < len(lines): (i, _) = lines[scan] if i > indent: indent = i sections.append((indent, scan)) elif i < indent: (scan, indent) = self.__compress(lines, sections.pop()[1], scan) scan = scan + 1 while sections: self.__compress(lines, sections.pop()[1], len(lines)) return self.__fmt(lines) def __compress(self, lines, start, stop): ''' Try a compact version first. ''' try: return self.__all_on_one_line(lines, start, stop) except _LineOverflow: return self.__bunch_up(lines, start, stop) def __bunch_up(self, lines, start, stop): ''' Scrunch adjacent lines together. ''' (indent, _) = lines[start] while start+1 < stop: if indent == lines[start][0] and \ (start+1 >= stop or indent == lines[start+1][0]) and \ (start+2 >= stop or indent == lines[start+2][0]) and \ indent + len(lines[start][1]) + len(lines[start+1][1]) < \ self.__line_length: lines[start][1] += lines[start+1][1] del lines[start+1] stop -= 1 else: start += 1 return (stop, indent-1) def __all_on_one_line(self, lines, start, stop): ''' Try all on one line. ''' if start == 0: raise _LineOverflow() (indent, text) = lines[start-1] size = indent + len(text) for (_, extra) in lines[start:stop]: size += len(extra) if size > self.__line_length: raise _LineOverflow() text += extra lines[start-1] = [indent, text] del lines[start:stop] return (start-1, indent) @staticmethod def __fmt(lines): ''' Join lines together, given the indent. ''' return '\n'.join(' ' * indent + line for (indent, line) in lines) class GraphStr(Visitor): ''' Generate an ASCII graph of the nodes. This should be used with `ConstructorWalker` and works rather like cloning, except that instead of generating a new set of nodes we generate a nested set of functions. This set of functions has the same structure as the tree of nodes (we break cycles via loop). The leaf functions take prefixes and return an ASCII picture of what the leaf values should look like (including the prefixes). Functions higher up the tree are similar, except instead of returning a picture directly they extend the prefix and then call the functions that are their children. Once we have an entire tree of functions, we can call the root with an empty prefix and the functions will "cascade" down, building the prefixes necessary and passing them to the root functions that generate the final ASCII data. ''' def __init__(self): super(GraphStr, self).__init__() self._type = None def loop(self, value): ''' Mark loops (what else could we do?) ''' return lambda first, rest, name: \ [first + name + (' ' if name else '') + ''] def node(self, node): ''' Store the class name. ''' self._type = node.__class__.__name__ def constructor(self, *args, **kargs): ''' Generate a function that can construct the local section of the graph when given the appropriate prefixes. ''' def fun(first, rest, name, type_=self._type): ''' Build the ASCII picture; this is rather terse... First is the prefix to the first line; rest is the prefix to the rest. Args and Kargs are the equivalent functions for the constructor arguments; we evaluate them here as we "expend" the ASCII picture. 
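
(For example, a middle child is drawn with ``first=' +- '`` and
``rest=' | '``, while the last child gets ``' `- '`` and blank
padding - that is all the prefixes are.)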
Does this need to be so complex - see my answer at https://www.quora.com/Is-there-an-easy-way-to-print-trees-with-nodes-and-lines-maybe ''' spec = [] for arg in args: spec.append((' +- ', ' | ', '', arg)) for arg in kargs: spec.append((' +- ', ' | ', arg, kargs[arg])) # fix the last branch if spec: spec[-1] = (' `- ', ' ', spec[-1][2], spec[-1][3]) yield first + name + (' ' if name else '') + type_ for (first_, rest_, name_, fun_) in spec: for line in fun_(first_, rest_, name_): yield rest + line return fun def leaf(self, value): ''' Generate a function that can construct the local section of the graph when given the appropriate prefixes. ''' return lambda first, rest, name: \ [first + name + (' ' if name else '') + repr(value)] def postprocess(self, fun): ''' Invoke the functions generated above and join the resulting lines. ''' return '\n'.join(fun('', '', '')) class Proxy(object): ''' A simple proxy that allows us to re-construct cyclic graphs. Used via `make_proxy`. Note - this is only used locally (in this module). When cloning LEPL matcher graphs a different approach is used, based on `Delayed`. ''' def __init__(self, mutable_delegate): self.__mutable_delegate = mutable_delegate def __getattr__(self, name): return getattr(self.__mutable_delegate[0], name) def make_proxy(): ''' Generate (setter, Proxy) pairs. The setter will supply the value to be proxied later; the proxy itself can be place in the graph immediately. ''' mutable_delegate = [None] def setter(value): ''' This is called later to "tie the knot". ''' mutable_delegate[0] = value return (setter, Proxy(mutable_delegate)) def clone(node, args, kargs): ''' The basic clone function that is supplied to `Clone`. This recreates an instance based on its type and arguments. ''' try: # pylint: disable-msg=W0142 return type(node)(*args, **kargs) except TypeError as err: raise TypeError(fmt('Error cloning {0} with ({1}, {2}): {3}', type(node), args, kargs, err)) class Clone(Visitor): ''' Clone the graph, applying a particular clone function. ''' def __init__(self, clone_=clone): super(Clone, self).__init__() self._clone = clone_ self._proxies = {} self._node = None def loop(self, node): ''' Wrap loop nodes in proxies. ''' if node not in self._proxies: self._proxies[node] = make_proxy() return self._proxies[node][1] def node(self, node): ''' Store the current node. ''' self._node = node def constructor(self, *args, **kargs): ''' Clone the node, back-patching proxies as necessary. ''' node = self._clone(self._node, args, kargs) if self._node in self._proxies: self._proxies[self._node][0](node) return node def leaf(self, value): ''' Don't clone leaf nodes. ''' return value def post_clone(function): ''' Generate a clone function that applies the given function to the newly constructed node (so, when used with `Clone`, effectively performs a map on the graph). ''' return compose(function, clone) LEPL-5.1.3/src/lepl/support/__init__.py0000644000175000001440000000271711731117215020271 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. 
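#
# (A sketch of the proxy/clone machinery defined above, mirroring
# CloneTest in the tests: `make_proxy` lets a cyclic graph be built
# before its root exists, and `Clone` preserves the cycle.
#
#     (setter, proxy) = make_proxy()
#     node = SimpleNode(1, proxy)   # SimpleNode as in the test module
#     setter(node)                  # back-patch: node now contains itself
#     copy = ConstructorWalker(node, SimpleNode)(Clone())
# )
#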
# # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Various support classes. ''' LEPL-5.1.3/src/lepl/support/list.py0000644000175000001440000001452311731117215017503 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support for S-expression ASTs using subclasses of Python's list class. The general support works with any nested iterables (except strings). ''' from functools import reduce from lepl.support.lib import fmt, basestring from lepl.support.node import Node class List(list): ''' Extend a list for use in ASTs. Note that the argument is treated in exactly the same way as list(). That means it takes a single list or generator as an argument, so to use literally you might type List([1,2,3]) - note the "extra" list. ''' def __repr__(self): return self.__class__.__name__ + '(...)' def __str__(self): return sexpr_to_tree(self) def clone_iterable(type_, items): ''' Clone a class that wraps data in an AST. 
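
For example, ``clone_iterable(tuple, iter([1, 2]))`` gives ``(1, 2)``;
strings are re-joined and `Node` subclasses are rebuilt through their
constructors. This is the default ``per_list`` used by `sexpr_fold`
below (a sketch of the dispatch, not extra behaviour).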
''' if issubclass(type_, Node): return type_(*list(items)) elif issubclass(type_, basestring): return type_('').join(items) else: return type_(items) def sexpr_fold(per_list=None, per_item=None, exclude=lambda x: isinstance(x, basestring)): ''' We need some kind of fold-like procedure for generalising operations on arbitrarily nested iterables. We can't use a normal fold because Python doesn't have the equivalent of cons, etc; this tries to be more Pythonic (see comments later). We divide everything into iterables ("lists") and atomic values ("items"). per_list is called with a generator over the (transformed) top-most list, in order. Items (ie atomic values) in that list, when requested from the generator, will be processed by per_item; iterables will be processed by a separate call to per_list (ie recursively). So this is more like a recursive map than a fold, but with Python's mutable state and lack of typing it appears to be equally powerful. Note that per_list is passed the previous type, which can be used for dispatching operations. ''' if per_list is None: per_list = clone_iterable if per_item is None: per_item = lambda x: x def items(iterable): for item in iterable: try: if not exclude(item): if isinstance(item, dict): yield per_list(type(item), items(item.items())) else: yield per_list(type(item), items(iter(item))) continue except TypeError: pass yield per_item(item) return lambda list_: per_list(type(list_), items(iter(list_))) clone_sexpr = sexpr_fold() ''' Clone a set of listed iterables. ''' count_sexpr = sexpr_fold(per_list=lambda type_, items: sum(items), per_item=lambda item: 1) ''' Count the number of value nodes in an AST. (Note that size(List) gives the number of entries in that list, counting each sublist as "1", while this descends embedded lists, counting their non-iterable contents. ''' join = lambda items: reduce(lambda x, y: x+y, items, []) ''' Flatten a list of lists by one level, so [[1],[2, [3]]] becomes [1,2,[3]]. Note: this will *only* work correctly if all entries are lists. ''' sexpr_flatten = sexpr_fold(per_list=lambda type_, items: join(items), per_item=lambda item: [item]) ''' Flatten a list completely, so [[1],[2, [3]]] becomes [1,2,3] ''' _fmt={} _fmt[list] = '[{1}]' _fmt[tuple] = '({1})' sexpr_to_str = sexpr_fold(per_list=lambda type_, items: fmt(_fmt.get(type_, '{0}([{1}])'), type_.__name__, ','.join(items)), per_item=lambda item: repr(item)) ''' A flat representation of nested lists (a set of constructors). ''' def sexpr_to_tree(list_): ''' Generate a tree using the same "trick" as `GraphStr`. The initial fold returns a function (str, str) -> list(str) at each level. ''' def per_item(item): def fun(first, _rest): return [first + repr(item)] return fun def per_list(type_, list_): def fun(first, rest): yield [first + str(type_.__name__)] force = list(list_) # need to access last item explicitly if force: for item in force[:-1]: yield item(rest + ' +- ', rest + ' | ') yield force[-1](rest + ' `- ', rest + ' ') return lambda first, rest: join(list(fun(first, rest))) fold = sexpr_fold(per_list, per_item) return '\n'.join(fold(list_)('', '')) def sexpr_throw(node): ''' Raise an error, if one exists in the results (AST trees are traversed). Otherwise, the results are returned (invoke with ``>>``). 
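
For example (a sketch; ``ParseFailure`` is hypothetical)::

    class ParseFailure(List, Exception): pass

    sexpr_throw([1, [2, 3]])                  # -> a copy, [1, [2, 3]]
    sexpr_throw([1, ParseFailure(['oops'])])  # raises ParseFailure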
''' def throw_or_copy(type_, items): clone = clone_iterable(type_, items) if isinstance(clone, Exception): raise clone else: return clone return sexpr_fold(per_list=throw_or_copy)(node) LEPL-5.1.3/src/lepl/support/node.py0000644000175000001440000002256211731117215017457 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Base classes for AST nodes (and associated functions). ''' from lepl.support.graph import GraphStr, ConstructorGraphNode, ConstructorWalker,\ postorder from lepl.support.lib import LogMixin, basestring, fmt class NodeException(Exception): ''' Exception raised when we have problems dynamically creating nodes. ''' def is_named(arg): ''' Is this a "named tuple"? ''' return (isinstance(arg, tuple) or isinstance(arg, list)) \ and len(arg) == 2 and isinstance(arg[0], basestring) def new_named_node(name, node): ''' Generate a sub-class of Node, with the given name as type, as long as it is not already a subclass. ''' if type(node) != Node: raise NodeException( fmt('Will not coerce a node subclass ({0}) to {1}', type(node), name)) class_ = type(name, (Node,), {}) (args, kargs) = node._constructor_args() return class_(*args, **kargs) def coerce(arg): ''' Convert named nodes to nodes with that name. ''' if is_named(arg) and isinstance(arg[1], Node): return new_named_node(arg[0], arg[1]) else: return arg # pylint: disable-msg=R0903 # it's not supposed to have public attributes, because it exposes contents class Node(LogMixin, ConstructorGraphNode): ''' A base class for AST nodes. It is designed to be applied to a list of results, via ``>``. Nodes support both simple list--like behaviour:: >>> abc = Node('a', 'b', 'c') >>> abc[1] 'b' >>> abc[1:] ['b', 'c'] >>> abc[:-1] ['a', 'b'] and dict--like behaviour through attributes:: >>> fb = Node(('foo', 23), ('bar', 'baz')) >>> fb.foo [23] >>> fb.bar ['baz'] Both mixed together:: >>> fb = Node(('foo', 23), ('bar', 'baz'), 43, 'zap', ('foo', 'again')) >>> fb[:] [23, 'baz', 43, 'zap', 'again'] >>> fb.foo [23, 'again'] Note how ``('name', value)`` pairs have a special meaning in the constructor.
This is supported by the creation of "named pairs":: >>> letter = Letter() > 'letter' >>> digit = Digit() > 'digit' >>> example = (letter | digit)[:] > Node >>> n = example.parse('abc123d45e')[0] >>> n.letter ['a', 'b', 'c', 'd', 'e'] >>> n.digit ['1', '2', '3', '4', '5'] However, a named pair with a Node as a value is coerced into a subclass of Node with the given name (this keeps Nodes connected into a single tree and so simplifies traversal). ''' def __init__(self, *args): ''' Expects a single list of arguments, as will be received if invoked with the ``>`` operator. ''' super(Node, self).__init__() self.__postorder = ConstructorWalker(self, Node) self.__children = [] self.__paths = [] self.__names = set() for arg in map(coerce, args): if is_named(arg): self.__add_named_child(arg[0], arg[1]) elif isinstance(arg, Node): self.__add_named_child(arg.__class__.__name__, arg) else: self.__add_anon_child(arg) def __add_named_child(self, name, value): ''' Add a value associated with a name (either a named pair or the class of a Node subclass). ''' index = self.__add_attribute(name, value) self.__children.append(value) self.__paths.append((name, index)) def __add_anon_child(self, value): ''' Add a nameless value. ''' index = len(self.__children) self.__children.append(value) self.__paths.append(index) def __add_attribute(self, name, value): ''' Attributes are associated with lists of (named) values. ''' if name not in self.__names: self.__names.add(name) setattr(self, name, []) attr = getattr(self, name) index = len(attr) attr.append(value) return index def __dir__(self): ''' The names of all the attributes constructed from the results. ''' # this must return a list, not an iterator (Python requirement) return list(self.__names) def __getitem__(self, index): return self.__children[index] def __iter__(self): return iter(self.__children) def __str__(self): visitor = NodeTreeStr() return self.__postorder(visitor) def __repr__(self): return self.__class__.__name__ + '(...)' def __len__(self): return len(self.__children) def __bool__(self): return bool(self.__children) # Python 2.6 def __nonzero__(self): return self.__bool__() def _recursively_eq(self, other): ''' This compares two nodes by recursively comparing their contents. It may be useful for testing, for example, but care should be taken to avoid its use on cycles of objects. ''' try: siblings = iter(other) except TypeError: return False for child in self: try: sibling = next(siblings) try: # pylint: disable-msg=W0212 if not child._recursively_eq(sibling): return False except AttributeError: if child != sibling: return False except StopIteration: return False try: next(siblings) return False except StopIteration: return True def _constructor_args(self): ''' Regenerate the constructor arguments (returns (args, kargs)). ''' args = [] for (path, value) in zip(self.__paths, self.__children): if isinstance(path, int): args.append(value) else: name = path[0] if name == value.__class__.__name__: args.append(value) else: args.append((name, value)) return (args, {}) # pylint: disable-msg=R0903 # __ method class MutableNode(Node): ''' Extend `Node` to allow children to be set. ''' def __setitem__(self, index, value): # the children list is name-mangled to the Node class, so it must be accessed explicitly here (self.__children would look up _MutableNode__children and fail with AttributeError) self._Node__children[index] = value class NodeTreeStr(GraphStr): ''' Extend `GraphStr` to handle named pairs - this generates an 'ASCII tree' representation of the node graph. ''' def leaf(self, arg): ''' Leaf nodes are either named or simple values.
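For example (inferred from the implementation below), a named pair ``('number', '1')`` renders as ``number '1'`` within the tree, while anonymous values fall through to the standard `GraphStr` rendering.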
''' if is_named(arg): return lambda first, rest, name_: \ [first + arg[0] + (' ' if arg[0] else '') + repr(arg[1])] else: return super(NodeTreeStr, self).leaf(arg) def node_throw(node): ''' Raise an error, if one exists in the results (AST trees are traversed). Otherwise, the results are returned (invoke with ``>>``). ''' for child in postorder(node, Node): if isinstance(child, Exception): raise child return node # Below unrelated to nodes - move? def make_dict(contents): ''' Construct a dict from a list of named pairs (other values in the list will be discarded). Invoke with ``>`` after creating named pairs with ``> string``. ''' return dict(entry for entry in contents if isinstance(entry, tuple) and len(entry) == 2 and isinstance(entry[0], basestring)) def join_with(separator=''): ''' Join results together (via separator.join()) into a single string. Invoke as ``> join_with(',')``, for example. ''' def fun(results): ''' Delay evaluation. ''' return separator.join(results) return fun LEPL-5.1.3/src/lepl/support/state.py0000644000175000001440000000672211731117215017652 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Encapsulate global (per thread) state. This is for state that can affect the current parse. It's probably simplest to explain with an example of what it can be used for. Memoisation records results for a particular state to avoid repeating matches needlessly. The state used to identify when "the same thing is happening" is based on: - the matcher being called - the stream passed to the matcher - this state So a good shorthand for deciding whether or not this state should be used is to ask whether the state will affect whether or not memoisation will work correctly. For example, with offside parsing, the current indentation level should be stored here, because a (matcher, stream) combination that has previously failed may work correctly when it changes. ''' from threading import local from lepl.support.lib import singleton class State(local): ''' A thread local map from key (typically calling class) to value.
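For example (an illustrative sketch - the key ``'indent'`` is assumed, not part of the library):: >>> state = State.singleton() >>> state['indent'] = 4 >>> state['indent'] 4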
The hash attribute is updated on each mutation and cached for rapid access. ''' def __init__(self): ''' Do not call directly - use the singleton. ''' super(State, self).__init__() self.__map = {} self.hash = self.__hash() @classmethod def singleton(cls): ''' Get a singleton instance. ''' return singleton(cls) def __hash(self): ''' Calculate the hash for the current dict. ''' value = 0 for key in self.__map: value ^= hash(key) ^ hash(self.__map[key]) return value def __getitem__(self, key): return self.__map[key] def get(self, key, default=None): ''' As for dict (lookup with default). ''' return self.__map.get(key, default) def __setitem__(self, key, value): self.__map[key] = value self.hash = self.__hash() def __delitem__(self, key): del self.__map[key] self.hash = self.__hash() def __hash__(self): return self.hash LEPL-5.1.3/src/lepl/support/context.py0000644000175000001440000001611111731117215020211 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Allow global per-thread values to be defined within a certain scope in a way that supports multiple values, temporary changes inside with contexts, etc. This is implemented in two layers. The base layer is a map from keys to values which isolates different, broad, functionalities. Despite the name, a NamespaceMap can map from any key to any value - it's just a thread-local map. However, typically it is used with Namespaces because they have support for some useful idioms. A Namespace is, as described above, associated with a name in the thread's NamespaceMap. It manages state for some functionality, so is another map, forming the second layer. The motivating example of a Namespace is the OperatorNamespace, which maps from operators to matchers. This uses the support in Namespace that allows values to be over-ridden within a certain scope to support overriding matchers for matching spaces. ''' from collections import deque #from logging import getLogger from threading import local from lepl.support.lib import singleton, fmt class ContextError(Exception): ''' Exception raised on problems with context.
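It is raised, for example, by ``OnceOnlyNamespace.set`` (below) when a value that has been registered as once-only is redefined.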
''' pass # pylint: disable-msg=R0903 class NamespaceMap(local): ''' A store for namespaces. This subclasses threading.local so each thread effectively has its own instance (see test). ''' def __init__(self): super(NamespaceMap, self).__init__() self.__map = {} def get(self, name, default=None): ''' This gets the namespace associated with the name, creating a new namespace from the second argument if necessary. ''' from lepl.matchers.operators import OperatorNamespace if default is None: default = OperatorNamespace if name not in self.__map: self.__map[name] = default() return self.__map[name] class Namespace(object): ''' A store for global definitions. ''' def __init__(self, base=None): self.__stack = deque([{} if base is None else base]) def push(self, extra=None): ''' Copy the current state to the stack and modify it. Values in extra that map to None are ignored. ''' self.__stack.append(dict(self.current())) extra = {} if extra is None else extra for name in extra: self.set_if_not_none(name, extra[name]) def pop(self): ''' Discard the current state, restoring the previous state from the stack. ''' self.__stack.pop() def __enter__(self): ''' Allow use within a with context by duplicating the current state and saving to the stack. Returns self to allow manipulation via set. ''' self.push() return self def __exit__(self, *_args): ''' Restore the previous state from the stack on leaving the context. ''' self.pop() def current(self): ''' The current state (a map from names to operator implementations). ''' return self.__stack[-1] def set(self, name, value): ''' Set a value. ''' self.current()[name] = value def set_if_not_none(self, name, value): ''' Set a value if it is not None. ''' if value != None: self.set(name, value) def get(self, name, default): ''' Get a value if defined, else the default. ''' return self.current().get(name, default) class OnceOnlyNamespace(Namespace): ''' Allow some values to be set only once. ''' def __init__(self, base=None, once_only=None): super(OnceOnlyNamespace, self).__init__(base) self.__once_only = set() if once_only is None else once_only def once_only(self, name): ''' The given name can be set only once. ''' self.__once_only.add(name) def set(self, name, value): ''' Set a value (if it has not already been set). ''' if name in self.__once_only and self.get(name, None) is not None: raise ContextError(fmt('{0} can only be set once', name)) else: super(OnceOnlyNamespace, self).set(name, value) # pylint: disable-msg=C0103, W0603 def Global(name, default=None): ''' Global (per-thread) binding from operator name to implementation, by namespace. ''' # Delay creation to handle circular dependencies. assert name namespace_map = singleton(NamespaceMap) return namespace_map.get(name, default) class NamespaceMixin(object): ''' Allow access to global (per-thread) values. ''' def __init__(self, name, namespace): super(NamespaceMixin, self).__init__() self.__name = name self.__namespace = namespace def _lookup(self, name, default=None): ''' Retrieve the named namespace from the global (per thread) store. ''' return Global(self.__name, self.__namespace).get(name, default) class Scope(object): ''' Base class supporting dedicated syntax for particular options. ''' def __init__(self, name, namespace, frame): self.__name = name self.__namespace = namespace self.__frame = frame def __enter__(self): ''' On entering the context, add the new definitions. ''' Global(self.__name, self.__namespace).push(self.__frame) def __exit__(self, *_args): ''' On leaving the context, return to previous definition.
''' Global(self.__name, self.__namespace).pop() LEPL-5.1.3/src/lepl/support/lib.py0000644000175000001440000002103711731377113017301 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Library routines / utilities (some unused). ''' from logging import getLogger # this is an attempt to make 2.6 and 3 function equally with strings try: chr = unichr str = unicode basestring = basestring file = file from StringIO import StringIO reduce = reduce except NameError: from io import IOBase, StringIO chr = chr str = str basestring = str file = IOBase from functools import reduce def assert_type(name, value, type_, none_ok=False): ''' If the value is not of the given type, raise a syntax error. ''' if none_ok and value is None: return if isinstance(value, type_): return raise TypeError(fmt('{0} (value {1}) must be of type {2}.', name, repr(value), type_.__name__)) class CircularFifo(object): ''' A FIFO queue with a fixed maximum size that silently discards data on overflow. It supports iteration for reading current contents and so can be used for a "latest window". Might be able to use deque instead? This may be more efficient if the entire contents are read often (as is the case when depth gets deeper)? ''' def __init__(self, size): ''' Stores up to size entries. Once full, appending a further value will discard (and return) the oldest still present. ''' self.__size = 0 self.__next = 0 self.__buffer = [None] * size def append(self, value): ''' This returns a value on overflow, otherwise None. ''' capacity = len(self.__buffer) if self.__size == capacity: dropped = self.__buffer[self.__next] else: dropped = None self.__size += 1 self.__buffer[self.__next] = value self.__next = (self.__next + 1) % capacity return dropped def pop(self, index=0): ''' Remove and return the next item. 
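For example (an illustrative sketch of the queue as a whole, not taken from the tests):: >>> fifo = CircularFifo(2) >>> fifo.append('a') >>> fifo.append('b') >>> fifo.append('c') 'a' >>> fifo.pop() 'b' >>> list(fifo) ['c']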
''' if index != 0: raise IndexError('FIFO is only a FIFO') if self.__size < 1: raise IndexError('FIFO empty') popped = self.__buffer[(self.__next - self.__size) % len(self.__buffer)] self.__size -= 1 return popped def __len__(self): return len(self.__buffer) def __iter__(self): capacity = len(self.__buffer) index = (self.__next - self.__size) % capacity for _ in range(self.__size): yield self.__buffer[index] index = (index + 1) % capacity def clear(self): ''' Clear the data (we just set the size to zero - this doesn't release any references). ''' self.__size = 0 def open_stop(spec): ''' In Python 2.6 open [] appears to use maxint or similar, which is not available in Python 3. This uses a minimum value for maxint I found somewhere; hopefully no-one ever wants finite repeats larger than this. ''' return spec.stop == None or spec.stop > 2147483647 def lmap(function, values): ''' A map that returns a list rather than an iterator. ''' # pylint: disable-msg=W0141 return list(map(function, values)) def compose(fun_a, fun_b): ''' Functional composition (assumes fun_a takes a single argument). ''' def fun(*args, **kargs): ''' This assumes fun_a takes a single argument. ''' return fun_a(fun_b(*args, **kargs)) return fun def compose_tuple(fun_a, fun_b): ''' Functional composition (assumes fun_b returns a sequence which is supplied to fun_a via *args). ''' def fun(*args, **kargs): ''' Supply result from fun_b as *arg. ''' # pylint: disable-msg=W0142 return fun_a(*fun_b(*args, **kargs)) return fun def empty(): ''' An empty generator. ''' if False: yield None def count(value=0, step=1): ''' Identical to itertools.count for recent Python, but 2.6 lacks the step. ''' while True: yield value value += step class LogMixin(object): ''' Add standard Python logging to a class. ''' def __init__(self, *args, **kargs): super(LogMixin, self).__init__(*args, **kargs) self._log = getLogger(self.__module__ + '.' + self.__class__.__name__) self._debug = self._log.debug self._info = self._log.info self._warn = self._log.warn self._error = self._log.error def safe_in(value, container, default=False): ''' Test for membership without an error for unhashable items. ''' try: return value in container except TypeError: log = getLogger('lepl.support.safe_in') log.debug(fmt('Cannot test for {0!r} in collection', value)) return default def safe_add(container, value): ''' Add items to a container, if they are hashable. ''' try: container.add(value) except TypeError: log = getLogger('lepl.support.safe_add') log.warn(fmt('Cannot add {0!r} to collection', value)) def fallback_add(container, value): ''' Add items to a container. Call initially with a set, but accept the returned collection, which will fall back to a list if necessary (if the contents are unhashable). ''' try: container.add(value) return container except AttributeError: container.append(value) return container except TypeError: if isinstance(container, list): raise else: container = list(container) return fallback_add(container, value) def fold(fun, start, sequence): ''' Fold over a sequence. ''' for value in sequence: start = fun(start, value) return start def sample(prefix, rest, size=40): ''' Provide a small sample of a string. ''' text = prefix + rest if len(text) > size: text = prefix + rest[0:size-len(prefix)-3] + '...' return text __SINGLETONS = {} ''' Map from factory (constructor/class) to singleton. ''' def singleton(key, factory=None): ''' Manage singletons for various types.
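For example (sketch; ``Config`` is a hypothetical class, not part of the library):: >>> class Config(object): pass >>> singleton(Config) is singleton(Config) True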
''' if key not in __SINGLETONS: if not factory: factory = key __SINGLETONS[key] = factory() return __SINGLETONS[key] def fmt(template, *args, **kargs): ''' Guarantee that template is always unicode, as embedding unicode in ascii can cause errors. ''' return str(template).format(*args, **kargs) def identity(x): return x def document(destn, source, text=None): ''' Copy function name and docs. ''' if text: destn.__name__ = text else: destn.__name__ = source.__name__ destn.__doc__ = source.__doc__ # module used in auto-linking for docs destn.__module__ = source.__module__ return destn def add_defaults(original, defaults, prefix=''): ''' Add defaults to original dict if not already present. ''' for (name, value) in defaults.items(): if prefix + name not in original: original[prefix + name] = value return original LEPL-5.1.3/src/lepl/support/timer.py0000644000175000001440000001337611731117215017655 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support for measuring the speed of different parsers and configurations. ''' from time import time from sys import stdout from gc import collect from lepl.support.lib import fmt DEFAULT = 'default' def print_timing(input, matchers, count=10, count_compile=None, best_of=3, parse_all=False, out=None, reference=None): ''' Generate timing information for the given input and parsers. `input` can be a sequence or a function that generates sequences (this is useful if you need to subvert caching). Note that a function is evaluated once before the timing starts. `matchers` is a dict that maps names to matchers. The names are used in the output. Alternatively, a single matcher can be given (which will be called "default"). A typical use might look like: matcher = .... print_timing("...", {'default': matcher.clone().config.default().matcher, 'clear': matcher.clone().config.clear().matcher}) `count` is the number of parses to make when measuring a single time. This is to make sure that the time taken is long enough to measure accurately (times less than 10ms or so will be imprecise).
The final time reported is adjusted to be for a single parse, no matter what the value of `count`. By default 10 matches are made. `count_compile` allows a different `count` value for timing when the compiled parser is not re-used. By default this is the same as `count`. `best_of` repeats the test this many times and takes the shortest result. This corrects for any slow-down caused by other programs running. `parse_all`, if True, evaluates all parses through backtracking (otherwise a single parse is made). `out` is the destination for the printing (stdout by default). `reference` names the matcher against which others are compared. By default any matcher called "default" will be used; otherwise the first matcher when sorted alphabetically is used. ''' try: input() source = input except TypeError: source = lambda: input if not isinstance(matchers, dict): matchers = {DEFAULT: matchers} if out is None: out = stdout if count_compile is None: count_compile = count # python 2 has no support for print(..., file=...) def prnt(msg='', end='\n'): out.write(msg + end) prnt('\n\nTiming Results (ms)') prnt('-------------------') prnt(fmt('\nCompiling: best of {1:d} averages over {0:d} repetition(s)', count_compile, best_of)) prnt(fmt('Parse only: best of {1:d} averages over {0:d} repetition(s)', count, best_of)) prnt('\n Matcher Compiling | Parse only') prnt('-----------------------------------------+------------------') references = [None, None] names = sorted(matchers.keys()) if not reference: if DEFAULT in names: reference = DEFAULT else: reference = names[0] assert reference in names, 'Reference must be in names' names = [reference] + [name for name in names if name != reference] for name in names: prnt(fmt('{0:>20s} ', name), end='') matcher = matchers[name] for (compile, n, end, r) in ((True, count_compile, ' |', 0), (False, count, '\n', 1)): times = [] for i in range(best_of): collect() t = time() for j in range(n): if compile: matcher.config.clear_cache() if parse_all: list(matcher.parse_all(source())) else: matcher.parse(source()) times.append(time() - t) times.sort() best = 1000 * times[0] / float(n) prnt(fmt('{0:7.2f}', best), end='') ref = references[r] if ref is None: references[r] = best prnt(' ', end=end) elif ref: ratio = best / ref prnt(fmt(' (x {0:5.1f})', ratio), end=end) else: prnt(' (x -----)', end=end) prnt() LEPL-5.1.3/src/lepl/support/warn.py0000644000175000001440000000570411731117215017500 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above.
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A mechanism to associate warnings with certain classes (eg for deprecation) and to disable those warnings. ''' from logging import getLogger from lepl.support.lib import fmt _WARNINGS = {} LOG = getLogger(__name__) class Warning(object): def __init__(self, name, message): super(Warning, self).__init__() self.silent = False self.displayed = False self.name = name self.message = message assert self.name not in _WARNINGS, (self.name, _WARNINGS) _WARNINGS[self.name] = self def warn(self): if not self.silent and not self.displayed: print(self.message) print(fmt('To disable this message call ' 'lepl.support.warn.silence({0!r})', self.name)) self.displayed = True def silence(name): ''' Silence the warning. Obviously, don't do this until you have addressed the underlying issue... ''' if name in _WARNINGS: _WARNINGS[name].silent = True else: LOG.warn(fmt('No warning registered for {0}', name)) def warn_on_use(message, name=None): ''' A decorator that warns when the function is used. ''' def decorator(func, name=name): if name is None: name = func.__name__ warning = Warning(name, message) def wrapper(*args, **kargs): warning.warn() return func(*args, **kargs) wrapper.__name__ = func.__name__ return wrapper return decorator LEPL-5.1.3/src/lepl/core/0000755000175000001440000000000011764776700015405 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/core/_test/0000755000175000001440000000000011764776700016523 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/core/_test/__init__.py0000644000175000001440000000353411740102237020617 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.core package. 
''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.core._test.clone import lepl.core._test.config import lepl.core._test.dynamic import lepl.core._test.manager import lepl.core._test.parser import lepl.core._test.rewrite_delayed_bug import lepl.core._test.rewrite_repeat_bug import lepl.core._test.rewriters LEPL-5.1.3/src/lepl/core/_test/rewriters.py0000644000175000001440000002565611731117151021120 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.core.rewriters module. 
''' #from logging import basicConfig, DEBUG from re import sub from unittest import TestCase from lepl.support.node import Node from lepl.matchers.core import Any, Delayed from lepl.matchers.derived import Optional, Drop, And, Join, Digit from lepl.matchers.combine import Or from lepl.support.graph import preorder from lepl.matchers.matcher import Matcher, is_child from lepl.matchers.support import TransformableWrapper # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0321 # (dude this is just a test) def str26(value): ''' Hack 2.6 string conversion ''' string = str(value) return string.replace("u'", "'") #class DelayedCloneTest(TestCase): # # def assert_clone(self, matcher): # _copy = matcher.postorder(DelayedClone(), Matcher) # # def _assert_clone(self, matcher, copy): # original = preorder(matcher, Matcher) # duplicate = preorder(copy, Matcher) # try: # while True: # o = next(original) # d = next(duplicate) # assert type(o) == type(d), (str(o), str(d), o, d) # if isinstance(o, Matcher): # assert o is not d, (str(o), str(d), o, d) # else: # assert o is d, (str(o), str(d), o, d) # except StopIteration: # self.assert_empty(original, 'original') # self.assert_empty(duplicate, 'duplicate') # # def assert_relative(self, matcher): # copy = matcher.postorder(DelayedClone(), Matcher) # def pairs(matcher): # for a in preorder(matcher, Matcher): # for b in preorder(matcher, Matcher): # yield (a, b) # for ((a,b), (c,d)) in zip(pairs(matcher), pairs(copy)): # if a is b: # assert c is d # else: # assert c is not d # if type(a) is type(b): # assert type(c) is type(d) # else: # assert type(c) is not type(d) # # def assert_empty(self, generator, name): # try: # next(generator) # assert False, name + ' not empty' # except StopIteration: # pass # # def test_no_delayed(self): # matcher = Any('a') | Any('b')[1:2,...] # self.assert_clone(matcher) # self.assert_relative(matcher) # # def test_simple_loop(self): # delayed = Delayed() # matcher = Any('a') | Any('b')[1:2,...] | delayed # self.assert_clone(matcher) # self.assert_relative(matcher) # # def test_complex_loop(self): # delayed1 = Delayed() # delayed2 = Delayed() # line1 = Any('a') | Any('b')[1:2,...] | delayed1 # line2 = delayed1 & delayed2 # matcher = line1 | line2 | delayed1 | delayed2 > 'foo' # self.assert_clone(matcher) # self.assert_relative(matcher) # # def test_common_child(self): # a = Any('a') # b = a | Any('b') # c = a | b | Any('c') # matcher = a | b | c # self.assert_clone(matcher) # self.assert_relative(matcher) # # def test_full_config_loop(self): # matcher = Delayed() # matcher += Any() & matcher # matcher.config.no_full_first_match().no_memoize() # copy = matcher.get_parse_string().matcher # self._assert_clone(matcher, copy) # # def test_transformed_etc(self): # class Term(Node): pass # class Factor(Node): pass # class Expression(Node): pass # # expression = Delayed() # number = Digit()[1:,...] 
> 'number' # term = (number | '(' / expression / ')') > Term # muldiv = Any('*/') > 'operator' # factor = (term / (muldiv / term)[0::]) > Factor # addsub = Any('+-') > 'operator' # expression += (factor / (addsub / factor)[0::]) > Expression # # self.assert_clone(expression) # self.assert_relative(expression) # expression.config.no_full_first_match().no_compile_to_regexp() # expression.config.no_compose_transforms().no_direct_eval() # expression.config.no_flatten().no_memoize() # copy = expression.get_parse_string().matcher # self._assert_clone(expression, copy) def append(x): return lambda l: l[0] + x class ComposeTransformsTest(TestCase): def test_null(self): matcher = Any() > append('x') matcher.config.clear() parser = matcher.get_parse() result = parser('a')[0] assert result == 'ax', result def test_simple(self): matcher = Any() > append('x') matcher.config.clear().compose_transforms() parser = matcher.get_parse() result = parser('a')[0] assert result == 'ax', result def test_double(self): matcher = (Any() > append('x')) > append('y') matcher.config.clear().compose_transforms() parser = matcher.get_parse() result = parser('a')[0] assert result == 'axy', result assert isinstance(parser.matcher, TransformableWrapper) assert len(parser.matcher.wrapper.functions) == 2 # And is no longer transformable # def test_and(self): # matcher = (Any() & Optional(Any())) > append('x') # matcher.config.clear().compose_transforms() # parser = matcher.get_parse() # result = parser('a')[0] # assert result == 'ax', result # assert is_child(parser.matcher, And), type(parser.matcher) def test_loop(self): matcher = Delayed() matcher += (Any() | matcher) > append('x') matcher.config.clear().compose_transforms() parser = matcher.get_parse() result = parser('a')[0] assert result == 'ax', result assert isinstance(parser.matcher, Delayed) def test_node(self): class Term(Node): pass number = Any('1') > 'number' term = number > Term factor = term | Drop(Optional(term)) factor.config.clear().compose_transforms() p = factor.get_parse_string() ast = p('1')[0] assert type(ast) == Term, type(ast) assert ast[0] == '1', ast[0] assert str26(ast) == """Term `- number '1'""", ast class OptimizeOrTest(TestCase): def test_conservative(self): matcher = Delayed() matcher += matcher | Any() assert isinstance(matcher.matcher.matchers[0], Delayed) matcher.config.clear().optimize_or(True) matcher.get_parse_string() # TODO - better test assert isinstance(matcher.matcher.matchers[0], TransformableWrapper) def test_liberal(self): matcher = Delayed() matcher += matcher | Any() assert isinstance(matcher.matcher.matchers[0], Delayed) matcher.config.clear().optimize_or(False) matcher.get_parse_string() # TODO - better test assert isinstance(matcher.matcher.matchers[0], TransformableWrapper) class AndNoTrampolineTest(TestCase): def test_replace(self): #basicConfig(level=DEBUG) matcher = And('a', 'b') matcher.config.clear().direct_eval() parser = matcher.get_parse() text = str(parser.matcher) assert "AndNoTrampoline(Literal, Literal)" == text, text result = parser('ab') assert result == ['a', 'b'], result class FlattenTest(TestCase): def test_flatten_and(self): matcher = And('a', And('b', 'c')) matcher.config.clear().flatten() parser = matcher.get_parse() text = str(parser.matcher) assert text == "And(Literal, Literal, Literal)", text result = parser('abcd') assert result == ['a', 'b', 'c'], result def test_no_flatten_and(self): matcher = And('a', Join(And('b', 'c'))) matcher.config.clear().flatten() parser = matcher.get_parse() text = 
str(parser.matcher) assert text == "And(Literal, Transform)", text result = parser('abcd') assert result == ['a', 'bc'], result def test_flatten_and_transform(self): matcher = Join(And('a', And('b', 'c'))) matcher.config.clear().flatten() parser = matcher.get_parse() text = sub('<.*>', '<>', str(parser.matcher)) assert text == "Transform(And, TransformationWrapper(<>))", text result = parser('abcd') assert result == ['abc'], result def test_flatten_or(self): matcher = Or('a', Or('b', 'c')) matcher.config.clear().flatten() parser = matcher.get_parse() text = str(parser.matcher) assert text == "Or(Literal, Literal, Literal)", text result = parser('abcd') assert result == ['a'], result def test_no_flatten_or(self): matcher = Or('a', Join(Or('b', 'c'))) matcher.config.clear().flatten() parser = matcher.get_parse() text = str(parser.matcher) assert text == "Or(Literal, Transform)", text result = parser('abcd') assert result == ['a'], result def test_bug(self): matcher = Any()[:,...] > 'bar' parser = matcher.get_parse_string() result = parser('foo') assert result == [('bar', 'foo')], result LEPL-5.1.3/src/lepl/core/_test/clone.py0000644000175000001440000000666511731117151020171 0ustar andrewusers00000000000000from lepl.matchers.variables import TraceVariables from lepl.lexer.matchers import Token from lepl.support.list import List # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for cloning matchers. 
''' from unittest import TestCase from difflib import Differ from lepl.matchers.core import Literal, Delayed from lepl.matchers.derived import Drop, UnsignedReal, Optional from lepl.core.rewriters import clone_matcher class CloneTest(TestCase): def assert_clone_ok(self, matcher): copy = clone_matcher(matcher) mtree = matcher.tree() ctree = copy.tree() if mtree != ctree: print(mtree) print(ctree) diff = Differ() print('\n'.join(diff.compare(mtree.split('\n'), ctree.split('\n')))) assert mtree == ctree assert matcher is not copy def test_single(self): self.assert_clone_ok(Literal('foo')) def test_no_loop(self): self.assert_clone_ok(Literal('a') | 'b' & Drop('c')) def test_delayed(self): d = Delayed() d += Literal('foo') self.assert_clone_ok(d) self.assert_clone_ok(Literal('bar') & d) def test_loop(self): d = Delayed() a = d | 'b' d += a self.assert_clone_ok(d) def test_loops(self): with TraceVariables(): value = Token(UnsignedReal()) symbol = Token('[^0-9a-zA-Z \t\r\n]') number = (Optional(symbol('-')) + value) >> float group2, group3c = Delayed(), Delayed() parens = symbol('(') & group3c & symbol(')') group1 = parens | number mul = (group2 & symbol('*') & group2) > List # changed div = (group2 & symbol('/') & group2) > List # changed group2 += (mul | div | group1) add = (group3c & symbol('+') & group3c) > List # changed sub = (group3c & symbol('-') & group3c) > List # changed group3c += (add | sub | group2) self.assert_clone_ok(group3c) LEPL-5.1.3/src/lepl/core/_test/rewrite_delayed_bug.py0000644000175000001440000000402611731117151023063 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for a bug reported on the mailing list.
''' from lepl._test.base import BaseTest from lepl.matchers.core import Delayed, Literal class LiteralTest(BaseTest): def test_as_given(self): c = Delayed() a = Literal("a") + c b = Literal("b") c += a | b self.assert_literal('ab', c) def test_workaround(self): c = Delayed() a = Literal("a") + c b = Literal("b") c += (a | b) >= list self.assert_literal('ab', c) LEPL-5.1.3/src/lepl/core/_test/rewrite_repeat_bug.py0000644000175000001440000000363711731117151022743 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for a bug reported on the mailing list. ''' from lepl._test.base import BaseTest from lepl import * class RepeatTest(BaseTest): def test_no_regexp(self): xy = ('x' & Drop('y'))[1] xy.config.no_compile_to_regexp() assert xy.parse('xy') == ['x'] def test_error(self): xy = ('x' & Drop('y'))[1] m = xy.get_parse().matcher #print(m.tree()) result = xy.parse('xy') assert result == ['x'], result LEPL-5.1.3/src/lepl/core/_test/config.py0000644000175000001440000001533311731117151020326 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above.
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.core.config module. ''' #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.matchers.variables import TraceVariables from lepl.matchers.operators import DroppedSpace from lepl.matchers.derived import Drop, Word, String from lepl.matchers.core import Any, Lookahead from lepl._test.base import assert_str from lepl.stream.maxdepth import FullFirstMatchException from lepl.regexp.core import RegexpError from lepl.support.lib import StringIO class ParseTest(TestCase): def run_test(self, name, text, parse, match2, match3, error, config=lambda x: None, **kargs): matcher = Any()[:, ...] config(matcher) parser = getattr(matcher, 'parse' + name) result = str(parser(text, **kargs)) assert_str(result, parse) matcher = Any()[2, ...] matcher.config.no_full_first_match() config(matcher) parser = getattr(matcher, 'match' + name) result = str(list(parser(text, **kargs))) assert_str(result, match2) matcher = Any()[3, ...] matcher.config.no_full_first_match() config(matcher) parser = getattr(matcher, 'match' + name) result = str(list(parser(text, **kargs))) assert_str(result, match3) matcher = Any() config(matcher) parser = getattr(matcher, 'parse' + name) try: parser(text, **kargs) except FullFirstMatchException as e: assert_str(e, error) def test_string(self): self.run_test('_string', 'abc', "['abc']", "[(['ab'], (2, ))]", "[(['abc'], (3, ))]", "The match failed in at 'bc' (line 1, character 2).") self.run_test('', 'abc', "['abc']", "[(['ab'], (2, ))]", "[(['abc'], (3, ))]", "The match failed in at 'bc' (line 1, character 2).") self.run_test('_sequence', 'abc', "['abc']", "[(['ab'], (2, ))]", "[(['abc'], (3, ))]", "The match failed in at 'bc' (offset 1, value 'b').") def test_string_list(self): self.run_test('_list', ['a', 'b', 'c'], "[['a', 'b', 'c']]", "[([['a', 'b']], (2, ))]", "[([['a', 'b', 'c']], (3, ))]", "The match failed in > at ['b', 'c'] (offset 1, value 'b').", config=lambda m: m.config.no_compile_to_regexp()) def test_int_list(self): #basicConfig(level=DEBUG) try: # this fails for python2 because it relies on # comparison between types failing self.run_test('_list', [1, 2, 3], [], [], [], """""") except RegexpError as e: assert 'no_compile_to_regexp' in str(e), str(e) self.run_test('_list', [1, 2, 3], "[[1, 2, 3]]", "[([[1, 2]], (2, ))]", "[([[1, 2, 3]], (3, ))]", "The match failed in > at [2, 3] (offset 1, value 2).", config=lambda m: m.config.no_compile_to_regexp()) class BugTest(TestCase): def test_bug(self): matcher = Any('a') matcher.config.clear() result = list(matcher.match_list(['b'])) assert result == [], result class TraceVariablesTest(TestCase): def test_trace(self): buffer = StringIO() with TraceVariables(out=buffer): word = ~Lookahead('OR') & Word() phrase = String() with DroppedSpace(): text = (phrase | word)[1:] > list query = text[:, Drop('OR')] expected = ''' phrase failed stream = 'spicy meatballs OR... word = ['spicy'] stream = ' meatballs OR "el ... phrase failed stream = 'meatballs OR "el b... word = ['meatballs'] stream = ' OR "el bulli rest... 
phrase failed stream = 'OR "el bulli resta... word failed stream = 'OR "el bulli resta... phrase failed stream = ' OR "el bulli rest... word failed stream = ' OR "el bulli rest... text = [['spicy', 'meatballs']] stream = ' OR "el bulli rest... phrase = ['el bulli restaurant'] stream = '' phrase failed stream = '' word failed stream = '' text = [['el bulli restaurant']] stream = '' query = [['spicy', 'meatballs'], ['el... stream = '' ''' query.config.auto_memoize(full=True) query.parse('spicy meatballs OR "el bulli restaurant"') trace = buffer.getvalue() assert trace == expected, '"""' + trace + '"""' # check caching works query.parse('spicy meatballs OR "el bulli restaurant"') trace = buffer.getvalue() assert trace == expected, '"""' + trace + '"""' LEPL-5.1.3/src/lepl/core/_test/parser.py0000644000175000001440000000751111731117151020354 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.core.parser module. 
''' from traceback import format_exc from types import MethodType from unittest import TestCase from lepl.matchers.core import Any, Literal from lepl.matchers.support import function_matcher # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0102, E1101 # (dude this is just a test) class InstanceMethodTest(TestCase): class Foo(object): class_attribute = 1 def __init__(self): self.instance_attribute = 2 def bar(self): return (self.class_attribute, self.instance_attribute, hasattr(self, 'baz')) def test_method(self): foo = self.Foo() assert foo.bar() == (1, 2, False) def my_baz(myself): return (myself.class_attribute, myself.instance_attribute, hasattr(myself, 'baz')) # pylint: disable-msg=W0201 foo.baz = MethodType(my_baz, foo) assert foo.baz() == (1, 2, True) assert foo.bar() == (1, 2, True) class FlattenTest(TestCase): def test_flatten(self): matcher = Literal('a') & Literal('b') & Literal('c') assert str(matcher) == "And(And, Literal)", str(matcher) matcher.config.clear().flatten() parser = matcher.get_parse_string() assert str(parser.matcher) == "And(Literal, Literal, Literal)", str(parser.matcher) class RepeatTest(TestCase): def test_depth(self): matcher = Any()[:,...] matcher.config.clear() matcher = matcher.get_match_string() #print(repr(matcher.matcher)) results = [m for (m, _s) in matcher('abc')] assert results == [['abc'], ['ab'], ['a'], []], results def test_breadth(self): matcher = Any()[::'b',...] matcher.config.clear() matcher = matcher.get_match_string() results = [m for (m, _s) in matcher('abc')] assert results == [[], ['a'], ['ab'], ['abc']], results class ErrorTest(TestCase): def test_error(self): class TestException(Exception): pass @function_matcher def Error(supprt, stream): raise TestException('here') matcher = Error() matcher.config.clear() try: matcher.parse('a') except TestException: trace = format_exc() assert "TestException('here')" in trace, trace LEPL-5.1.3/src/lepl/core/_test/dynamic.py0000644000175000001440000000431311740101740020476 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Initial, minimal support for dynamic variables. 
This allows you to use a value found in matching as part of another matcher IN SOME RESTRICTED CASES. ''' from unittest import TestCase from lepl import Apply, UnsignedInteger, Repeat, Any from lepl.core.dynamic import IntVar class DynamicTest(TestCase): def test_lt(self): three = IntVar(3) assert three < 4 assert 2 < three assert 3 == three def test_dynamic(self): size = IntVar() header = Apply(UnsignedInteger(), size.setter()) body = Repeat(Any(), stop=size, add_=True) matcher = ~header & body matcher.config.no_compile_to_regexp().no_full_first_match() result = next(matcher.match_string("3abcd"))[0] assert result == ['abc'], result LEPL-5.1.3/src/lepl/core/_test/manager.py0000644000175000001440000000657511731117151020503 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.core.manager module. ''' from logging import basicConfig, DEBUG from unittest import TestCase from lepl.matchers.derived import Eos from lepl.matchers.core import Literal from lepl.support.lib import LogMixin # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0102, C0321, W0141, R0201, R0904 # (dude this is just a test) class LimitedDepthTest(LogMixin, TestCase): ''' The test here takes '****' and divides it amongst the matchers, all of which will take 0 to 4 matches. The number of different permutations depends on backtracking and varies depending on the queue length available. ''' def test_limited_depth(self): ''' These show something is happening. Whether they are exactly correct is another matter altogether... 
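        For example (a sketch following the pattern in assert_range below),
        shortening the queue discards stored generators and so prunes
        backtracking:

            expr = Literal('*')[::'g',...][3] & Eos()
            expr.config.clear().low_memory(8)
            found = len(list(expr.get_match_string()('****')))
            # found is typically smaller than the 15 permutations seen
            # with no queue limit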
''' #basicConfig(level=DEBUG) # there was a major bug here that made this test vary often # it should now be fixed self.assert_range(3, 'g', [15,1,1,1,3,3,6,6,10,10,10,15], 4) self.assert_range(3, 'b', [15,0,1,1,5,5,5,5,5,5,5,5,5,15], 4) self.assert_range(3, 'd', [15,1,1,3,3,6,6,10,10,10,15], 4) def assert_range(self, n_match, direcn, results, multiplier): for index in range(len(results)): queue_len = index * multiplier expr = Literal('*')[::direcn,...][n_match] & Eos() expr.config.clear().low_memory(queue_len) matcher = expr.get_match_string() self.assert_count(matcher, queue_len, index, results[index]) def assert_count(self, matcher, queue_len, index, count): results = list(matcher('****')) found = len(results) assert found == count, (queue_len, index, found, count) def test_single(self): #basicConfig(level=DEBUG) expr = Literal('*')[:,...][3] expr.config.clear().low_memory(5) match = expr.get_match_string()('*' * 4) list(match) LEPL-5.1.3/src/lepl/core/__init__.py0000644000175000001440000000274411731117151017504 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Core classes, related to parsing, config etc. ''' LEPL-5.1.3/src/lepl/core/rewriters.py0000644000175000001440000007276311731117215020004 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Rewriters modify the graph of matchers before it is used to generate a parser. ''' from lepl.matchers.memo import LMemo, RMemo from lepl.support.graph import Visitor, preorder, loops, order, NONTREE, \ dfs_edges, LEAF from lepl.matchers.combine import DepthFirst, DepthNoTrampoline, \ BreadthFirst, BreadthNoTrampoline, And, AndNoTrampoline, \ Or, OrNoTrampoline from lepl.matchers.core import Delayed, Lookahead from lepl.matchers.derived import add from lepl.matchers.matcher import Matcher, is_child, FactoryMatcher, \ matcher_type, MatcherTypeException, canonical_matcher_type from lepl.matchers.support import NoTrampoline, Transformable from lepl.support.lib import lmap, fmt, LogMixin, empty, count class Rewriter(LogMixin): ''' base class for rewriters, supporting a fixed ordering. ''' # ordering (SET_ARGUMENTS, COMPOSE_TRANSFORMS, FLATTEN, COMPILE_REGEXP, OPTIMIZE_OR, LEXER, DIRECT_EVALUATION, # memoize must come before anything that wraps a delayed node. this is # because the left-recursive memoizer uses delayed() instances as markers # for where to duplicate state for different paths through the call # graph; if these are wrapped or replaced then the assumptions made there # fail (and left-recursive parsers fail to match). MEMOIZE, TRACE_VARIABLES, FULL_FIRST_MATCH) = range(10, 110, 10) def __init__(self, order_, name=None, exclusive=True): super(Rewriter, self).__init__() self.order = order_ self.name = name if name else self.__class__.__name__ self.exclusive = exclusive def __eq__(self, other): if not isinstance(other, self.__class__): return False return self.exclusive or self is other def __ne__(self, other): return not self.__eq__(other) def __hash__(self): if self.exclusive: return hash(self.__class__) else: return super(Rewriter, self).__hash__() def __lt__(self, other): if not isinstance(other, Rewriter): return True elif self.exclusive or self.order != other.order: return self.order < other.order else: return hash(self) < hash(other) def __ge__(self, other): return not self.__lt__(other) def __gt__(self, other): if not isinstance(other, Rewriter): return True elif self.exclusive or self.order != other.order: return self.order > other.order else: return hash(self) > hash(other) def __le__(self, other): return not self.__gt__(other) def __call__(self, matcher): return matcher def __str__(self): return self.name def clone(i, j, node, args, kargs): ''' Clone a single node, including matcher-specific attributes. ''' from lepl.support.graph import clone as old_clone copy = old_clone(node, args, kargs) copy_standard_attributes(node, copy) return copy def copy_standard_attributes(node, copy): ''' Handle the additional attributes that matchers may have. 
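    Currently that means `wrapper` (for `Transformable`), `factory` (for
    `FactoryMatcher`) and `trace_variables`. Cloners that construct the
    copy themselves should do the same (a minimal sketch, as used by
    `DirectEvaluation` below):

        copy = type(node)(*args, **kargs)
        copy_standard_attributes(node, copy)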
''' if isinstance(node, Transformable): copy.wrapper = node.wrapper if isinstance(node, FactoryMatcher): copy.factory = node.factory if hasattr(node, 'trace_variables'): copy.trace_variables = node.trace_variables def linearise_matcher(node): ''' Return `[(head, reversed), ...]` where each tuple describes a tree of matchers without loops. The first head is the root node. The reversed list contains nodes ordered children-first (except for `Delayed()` instances, whose children are other `head` elements). This allows us to clone a DAG of matchers in the same way as it was first created - by creating linear trees and then connecting the `Delayed()` instances. The postorder ordering is used to match the ordering in the more general iteration over matchers based on the graph support classes and helps keep things consistent (there was a strange issue where the `.tree()` display of a cloned graph differed from the original that, I think, was due to a different ordering). ''' linear = [] pending = [node] heads = set() while pending: node = pending.pop() if node not in heads: stack = [] def append(child): if isinstance(child, Matcher): if isinstance(child, Delayed): child.assert_matcher() pending.append(child.matcher) stack.append((child, empty())) else: stack.append((child, iter(child))) heads.add(node) append(node) # init stack def postorder(): while stack: (node, children) = stack[-1] try: append(next(children)) except StopIteration: yield stack.pop()[0] linear.append((node, list(postorder()))) return linear def clone_tree(i, head, reversed, mapping, delayed, clone, duplicate=False): ''' Clone a tree of matchers. This clones all the matchers in a linearised set, except for the `Delayed()` instances, which are re-created without their contents (these are set later, to connect the trees into the final matcher DAG). `i` is the index of the tree (0 for the first tree, which cannot be part of a loop itself). It is passed to the clone function. `head` is the root of the tree. `reversed` are the tree nodes in postorder `mapping` is a map from old to new node of all the nodes created. For `Delayed()` instances, if `duplicate=True`, then the new node is just one of possibly many copies. `clone` is the function used to create a new node instance. `duplicate` controls how `Delayed()` instances are handled. If true then a new instance is created for each one. This does not preserve the graph, but is used by memoisation. ''' def rewrite(value): try: if value in mapping: return mapping[value] except TypeError: pass return value n = len(reversed) for (j, node) in zip(count(n, -1), reversed): if isinstance(node, Delayed): if duplicate or node not in mapping: mapping[node] = clone(i, -j, node, (), {}) delayed.append((node, mapping[node])) else: if node not in mapping: (args, kargs) = node._constructor_args() args = lmap(rewrite, args) kargs = dict((name, rewrite(value)) for (name, value) in kargs.items()) copy = clone(i, j, node, args, kargs) mapping[node] = copy def clone_matcher(node, clone=clone, duplicate=False): ''' This used to be implemented using the graph support classes (`ConstructorWalker()` etc). But the left-recursive handling was unreliable and that was too opaque to debug easily. It's possible this code could now be moved back to that approach, as not everything here is used (the `j` index turned out not to be useful, for example). But this approach is easier to understand and I am not 100% sure that the code is correct, so I may need to continue working on this. 
`node` is the root of the matcher graph. `clone` is a function used to create new instances. `duplicate` controls how `Delayed()` instances are handled. If true then a new instance is created for each one. This does not preserve the graph, but is used by memoisation. ''' from lepl.regexp.rewriters import RegexpContainer trees = linearise_matcher(node) all_nodes = {} all_delayed = [] for (i, (head, reversed)) in enumerate(trees): clone_tree(i, head, reversed, all_nodes, all_delayed, clone, duplicate=duplicate) for (delayed, clone) in all_delayed: # this lets us delay forcing to matcher until last moment # we had bugs where this ended up being delegated to + clone.__iadd__(RegexpContainer.to_matcher(all_nodes[delayed.matcher])) return RegexpContainer.to_matcher(all_nodes[node]) def post_clone(function): ''' Generate a clone function that applies the given function to the newly constructed node, except for Delayed instances (which are effectively proxies and so have no functionality of their own) (so, when used with `DelayedClone`, effectively performs a map on the graph). ''' def new_clone(i, j, node, args, kargs): ''' Apply function as well as clone. ''' copy = clone(i, j, node, args, kargs) # ignore Delayed since that would (1) effectively duplicate the # action and (2) they come and go with each cloning. if not isinstance(node, Delayed): copy = function(copy) return copy return new_clone class Flatten(Rewriter): ''' A rewriter that flattens `And` and `Or` lists. ''' def __init__(self): super(Flatten, self).__init__(Rewriter.FLATTEN) def __call__(self, graph): def new_clone(i, j, node, old_args, kargs): ''' The flattening cloner. ''' new_args = [] type_ = matcher_type(node, fail=False) if type_ in map(canonical_matcher_type, [And, Or]): for arg in old_args: if matcher_type(arg, fail=False) is type_ and \ (not hasattr(arg, 'wrapper') or ((not arg.wrapper and not node.wrapper) or \ (arg.wrapper.functions == node.wrapper.functions and node.wrapper.functions == [add]))): new_args.extend(arg.matchers) else: new_args.append(arg) if not new_args: new_args = old_args return clone(i, j, node, new_args, kargs) return clone_matcher(graph, new_clone) class ComposeTransforms(Rewriter): ''' A rewriter that joins adjacent transformations into a single operation, avoiding trampolining in some cases. ''' def __init__(self): super(ComposeTransforms, self).__init__(Rewriter.COMPOSE_TRANSFORMS) def __call__(self, graph): from lepl.matchers.transform import Transform, Transformable def new_clone(i, j, node, args, kargs): ''' The joining cloner. ''' # must always clone to expose the matcher (which was cloned # earlier - it is not node.matcher) copy = clone(i, j, node, args, kargs) if isinstance(copy, Transform) \ and isinstance(copy.matcher, Transformable): return copy.matcher.compose(copy.wrapper) else: return copy return clone_matcher(graph, new_clone) class TraceVariables(Rewriter): ''' A rewriter needed for TraceVariables which adds the trace_variables attribute to untransformable matchers that need a transform. ''' def __init__(self): super(TraceVariables, self).__init__(Rewriter.TRACE_VARIABLES) def __call__(self, graph): from lepl.matchers.transform import Transform def new_clone(i, j, node, args, kargs): ''' The joining cloner. 
''' # must always clone to expose the matcher (which was cloned # earlier - it is not node.matcher) copy = clone(i, j, node, args, kargs) if hasattr(node, 'trace_variables') and node.trace_variables: return Transform(copy, node.trace_variables) else: return copy return clone_matcher(graph, new_clone) class RightMemoize(Rewriter): ''' A rewriter that adds RMemo to all nodes in the matcher graph. ''' def __init__(self): super(RightMemoize, self).__init__(Rewriter.MEMOIZE, 'Right memoize') def __call__(self, graph): return clone_matcher(graph, post_clone(RMemo)) class LeftMemoize(Rewriter): ''' A rewriter that adds LMemo to all nodes in the matcher graph. ''' def __init__(self, d=0): super(LeftMemoize, self).__init__(Rewriter.MEMOIZE, 'Left memoize') self.d = d def __call__(self, graph): def new_clone(i, j, node, args, kargs): copy = clone(i, j, node, args, kargs) return self.memoize(i, j, self.d, copy, LMemo) return clone_matcher(graph, new_clone, duplicate=True) @staticmethod def memoize(i, j, d, copy, memo): if j > 0: def open(depth, length): return False curtail = open elif d: def fixed(depth, length): return depth > i * d curtail = fixed else: def slen(depth, length): return depth > i * length curtail = slen return memo(copy, curtail) class AutoMemoize(Rewriter): ''' Apply two different memoizers, one to left recursive loops and the other elsewhere (either can be omitted). `conservative` refers to the algorithm used to detect loops: `None` will use the left memoizer on all nodes except the initial tree `True` will detect all possible loops (should be very similar to `None`) `False` detects only left-most loops and may miss some loops. `d` is a parameter that controls the depth to which repeated left-recursion may occur. If `None` then the length of the remaining input is used. If set, parsers are more efficient, but less likely to match input correctly. ''' def __init__(self, conservative=None, left=None, right=None, d=0): super(AutoMemoize, self).__init__(Rewriter.MEMOIZE, fmt('AutoMemoize({0}, {1}, {2})', conservative, left, right)) self.conservative = conservative self.left = left self.right = right self.d = d def __call__(self, graph): dangerous = set() for head in order(graph, NONTREE, Matcher): for loop in either_loops(head, self.conservative): for node in loop: dangerous.add(node) def new_clone(i, j, node, args, kargs): ''' Clone with the appropriate memoizer (cannot use post_clone as need to test original) ''' copy = clone(i, j, node, args, kargs) if (self.conservative is None and i) or node in dangerous: if self.left: return LeftMemoize.memoize(i, j, self.d, copy, self.left) else: return copy else: if self.right: return self.right(copy) else: return copy return clone_matcher(graph, new_clone, duplicate=True) def left_loops(node): ''' Return (an estimate of) all left-recursive loops from the given node. We cannot know for certain whether a loop is left recursive because we don't know exactly which parsers will consume data. But we can estimate by assuming that all matchers eventually (ie via their children) consume something. We can also improve that slightly by ignoring `Lookahead`. So we estimate left-recursive loops as paths that start and end at the given node, and which are first children of intermediate nodes unless the node is `Or`, or the preceding matcher is a `Lookahead`. Each loop is a list that starts and ends with the given node. 
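    For example (a minimal sketch - the exact nodes in each reported path
    depend on how the operators are rewritten):

        from lepl.matchers.core import Any, Delayed

        expr = Delayed()
        expr += (expr & Any('+')) | Any('x')
        for loop in left_loops(expr):
            # each loop is a list [expr, ..., expr]
            print([type(m).__name__ for m in loop])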
    '''
    stack = [[node]]
    known = set([node]) # avoid getting lost in embedded loops
    while stack:
        ancestors = stack.pop()
        parent = ancestors[-1]
        if isinstance(parent, Matcher):
            for child in parent:
                family = list(ancestors) + [child]
                if child is node:
                    yield family
                else:
                    try:
                        if child not in known:
                            stack.append(family)
                            known.add(child)
                    # random attribute that is list, etc
                    except TypeError:
                        pass
                if not is_child(parent, Or, fail=False) and \
                        not is_child(child, Lookahead, fail=False):
                    break


def either_loops(node, conservative):
    '''
    Select between the conservative and liberal loop detection algorithms.
    '''
    if conservative:
        return loops(node, Matcher)
    else:
        return left_loops(node)


class OptimizeOr(Rewriter):
    '''
    A rewriter that re-arranges `Or` matcher contents for left-recursive
    loops. When a left-recursive rule is used, it is much more efficient
    if it appears last in an `Or` statement, since that forces the
    alternates (which correspond to the terminating case in a recursive
    function) to be tested before the LMemo limit is reached.

    This rewriting may change the order in which different results for
    an ambiguous grammar are returned.

    `conservative` refers to the algorithm used to detect loops; False
    may classify some left-recursive loops as right-recursive.
    '''

    def __init__(self, conservative=True):
        super(OptimizeOr, self).__init__(Rewriter.OPTIMIZE_OR)
        self.conservative = conservative

    def __call__(self, graph):
        self._warn('Alternatives are being re-ordered to improve stability with left-recursion.\n'
                   'This will change the ordering of results.')
        for delayed in [x for x in preorder(graph, Matcher)
                        if isinstance(x, Delayed)]:
            for loop in either_loops(delayed, self.conservative):
                for i in range(len(loop)):
                    if is_child(loop[i], Or, fail=False):
                        # we cannot be at the end of the list here, since
                        # that is a Delayed instance
                        # copy from tuple to list
                        loop[i].matchers = list(loop[i].matchers)
                        matchers = loop[i].matchers
                        target = loop[i+1]
                        # move target to end of list
                        index = matchers.index(target)
                        del matchers[index]
                        matchers.append(target)
        return graph


class SetArguments(Rewriter):
    '''
    Add/replace named arguments while cloning. This rewriter is not
    exclusive - several different instances can be defined in parallel.
    '''

    def __init__(self, type_, **extra_kargs):
        super(SetArguments, self).__init__(Rewriter.SET_ARGUMENTS,
            fmt('SetArguments({0}, {1})', type_, extra_kargs), False)
        self.type = type_
        self.extra_kargs = extra_kargs

    def __call__(self, graph):
        def new_clone(i, j, node, args, kargs):
            '''
            As clone, but add in any extra kargs if the node is an instance
            of the given type.
            '''
            if isinstance(node, self.type):
                for key in self.extra_kargs:
                    kargs[key] = self.extra_kargs[key]
            return clone(i, j, node, args, kargs)
        return clone_matcher(graph, new_clone)


class DirectEvaluation(Rewriter):
    '''
    Replace given matchers if all Matcher arguments are subclasses of
    `NoTrampoline`.

    `spec` is a map from original matcher type to the replacement.
    '''

    def __init__(self, spec=None):
        super(DirectEvaluation, self).__init__(Rewriter.DIRECT_EVALUATION,
            fmt('DirectEvaluation({0})', spec))
        if spec is None:
            spec = {DepthFirst: DepthNoTrampoline,
                    BreadthFirst: BreadthNoTrampoline,
                    And: AndNoTrampoline,
                    Or: OrNoTrampoline}
        self.spec = spec

    def __call__(self, graph):
        def new_clone(i, j, node, args, kargs):
            type_, ok = None, False
            for parent in self.spec:
                if is_child(node, parent):
                    type_ = self.spec[parent]
            if type_:
                ok = True
                for arg in args:
                    if isinstance(arg, Matcher) and not \
                            isinstance(arg, NoTrampoline):
                        ok = False
                for name in kargs:
                    arg = kargs[name]
                    if isinstance(arg, Matcher) and not \
                            isinstance(arg, NoTrampoline):
                        ok = False
            if not ok:
                type_ = type(node)
            try:
                copy = type_(*args, **kargs)
                copy_standard_attributes(node, copy)
                return copy
            except TypeError as err:
                raise TypeError(fmt('Error cloning {0} with ({1}, {2}): {3}',
                                    type_, args, kargs, err))
        return clone_matcher(graph, new_clone)


class FullFirstMatch(Rewriter):
    '''
    If the parser fails, raise an error at the maximum depth.

    `eos` controls whether or not the entire input must be consumed for
    the parse to be considered a success.
    '''

    def __init__(self, eos=False):
        super(FullFirstMatch, self).__init__(Rewriter.FULL_FIRST_MATCH,
                                             fmt('FullFirstMatch({0})', eos))
        self.eos = eos

    def __call__(self, graph):
        from lepl.stream.maxdepth import FullFirstMatch
        return FullFirstMatch(graph, self.eos)


class NodeStats(object):
    '''
    Provide statistics and access by type to nodes.
    '''

    def __init__(self, matcher=None):
        self.loops = 0
        self.leaves = 0
        self.total = 0
        self.others = 0
        self.duplicates = 0
        self.unhashable = 0
        self.types = {}
        self.__known = set()
        if matcher is not None:
            self.add_all(matcher)

    def add(self, type_, node):
        '''
        Add a node of a given type.
        '''
        try:
            node_type = matcher_type(node)
        except MatcherTypeException:
            node_type = type(node)
        if type_ & LEAF:
            self.leaves += 1
        if type_ & NONTREE and is_child(node_type, Matcher, fail=False):
            self.loops += 1
        try:
            if node not in self.__known:
                self.__known.add(node)
                if node_type not in self.types:
                    self.types[node_type] = set()
                self.types[node_type].add(node)
                if is_child(node_type, Matcher):
                    self.total += 1
                else:
                    self.others += 1
            else:
                self.duplicates += 1
        except:
            self.unhashable += 1

    def add_all(self, matcher):
        '''
        Add all nodes.
        '''
        for (_parent, child, type_) in dfs_edges(matcher, Matcher):
            self.add(type_, child)

    def __str__(self):
        counts = fmt('total:      {total:3d}\n'
                     'leaves:     {leaves:3d}\n'
                     'loops:      {loops:3d}\n'
                     'duplicates: {duplicates:3d}\n'
                     'others:     {others:3d}\n'
                     'unhashable: {unhashable:3d}\n', **self.__dict__)
        keys = list(self.types.keys())
        keys.sort(key=repr)
        types = '\n'.join([fmt('{0:40s}: {1:3d}', key, len(self.types[key]))
                           for key in keys])
        return counts + types

    def __eq__(self, other):
        '''
        Quick and dirty equality
        '''
        return str(self) == str(other)


class NodeStats2(object):
    '''
    Avoid using graph code (so we can check that...)
''' def __init__(self, node): self.total = 0 self.leaves = 0 self.duplicates = 0 self.types = {} known = set() stack = [node] while stack: node = stack.pop() if node in known: self.duplicates += 1 else: known.add(node) self.total += 1 type_ = type(node) if type_ not in self.types: self.types[type_] = 0 self.types[type_] += 1 children = [child for child in node if isinstance(child, Matcher)] if not children: self.leaves += 1 else: stack.extend(children) def __str__(self): counts = fmt('total: {total:3d}\n' 'leaves: {leaves:3d}\n' 'duplicates: {duplicates:3d}\n', **self.__dict__) keys = list(self.types.keys()) keys.sort(key=repr) types = '\n'.join([fmt('{0:40s}: {1:3d}', key, self.types[key]) for key in keys]) return counts + types def __eq__(self, other): ''' Quick and dirty equality ''' return str(self) == str(other) #class DelayedClone(Visitor): # ''' # A version of `Clone()` that uses `Delayed()` rather # that `Proxy()` to handle circular references. Also caches results to # avoid duplications. # ''' # # def __init__(self, clone_=clone): # super(DelayedClone, self).__init__() # self._clone = clone_ # self._visited = {} # self._loops = set() # self._node = None # # def loop(self, node): # ''' # This is called for nodes that are involved in cycles when they are # needed as arguments but have not themselves been cloned. # ''' # if node not in self._visited: # self._visited[node] = Delayed() # self._loops.add(node) # return self._visited[node] # # def node(self, node): # ''' # Store the current node. # ''' # self._node = node # # def constructor(self, *args, **kargs): # ''' # Clone the node, taking care to handle loops. # ''' # if self._node not in self._visited: # self._visited[self._node] = self.__clone_node(args, kargs) # # if this is one of the loops we replaced with a delayed instance, # # then we need to patch the delayed matcher # elif self._node in self._loops and \ # not self._visited[self._node].matcher: # self._visited[self._node] += self.__clone_node(args, kargs) # return self._visited[self._node] # # def __clone_node(self, args, kargs): # ''' # Before cloning, drop any Delayed from args and kargs. Afterwards, # check if this is a Delaed instance and, if so, return the contents. # This helps keep the number of Delayed instances from exploding. # ''' ## args = lmap(self.__drop, args) ## kargs = dict((key, self.__drop(kargs[key])) for key in kargs) ## return self.__drop(self._clone(self._node, args, kargs)) # return self._clone(self._node, args, kargs) # # # not needed now Delayed dynamically sets _match() # # also, will break new cloning ## @staticmethod ## def __drop(node): ## ''' ## Filter `Delayed` instances where possible (if they have the matcher ## defined and are nor transformed). ## ''' ## # delayed import to avoid dependency loops ## from lepl.matchers.transform import Transformable ## if isinstance(node, Delayed) and node.matcher and \ ## not (isinstance(node, Transformable) and node.wrapper): ## return node.matcher ## else: ## return node # # def leaf(self, value): # ''' # Leaf values are unchanged. # ''' # return value LEPL-5.1.3/src/lepl/core/monitor.py0000644000175000001440000002053111731117151017426 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. 
You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Support for classes that monitor the execution process (for example, managing resources and tracing program flow). See `trampoline()`. ''' class ValueMonitor(object): ''' An interface expected by `trampoline()`, called to track data flow. ''' def next_iteration(self, epoch, value, exception, stack): ''' Called at the start of each iteration. ''' pass def before_next(self, generator): ''' Called before invoking ``next`` on a generator. ''' pass def after_next(self, value): ''' Called after invoking ``next`` on a generator. ''' pass def before_throw(self, generator, value): ''' Called before invoking ``throw`` on a generator. ''' pass def after_throw(self, value): ''' Called after invoking ``throw`` on a generator. ''' pass def before_send(self, generator, value): ''' Called before invoking ``send`` on a generator. ''' pass def after_send(self, value): ''' Called after invoking ``send`` on a generator. ''' pass def exception(self, value): ''' Called when an exception is caught (instead of any 'after' method). ''' pass def raise_(self, value): ''' Called before raising an exception to the caller. ''' pass def yield_(self, value): ''' Called before yielding a value to the caller. ''' pass class StackMonitor(object): ''' An interface expected by `trampoline()`, called to track stack growth. ''' def push(self, generator): ''' Called before adding a generator to the stack. ''' pass def pop(self, generator): ''' Called after removing a generator from the stack. ''' pass class ActiveMonitor(StackMonitor): ''' A `StackMonitor` implementation that allows matchers that implement the interface on_push/on_pop to be called. Generators can interact with active monitors if: 1. The monitor extends this class 2. The matcher has a monitor_class attribute whose value is equal to (or a subclass of) the monitor class it will interact with ''' def push(self, generator): ''' Called when a generator is pushed onto the trampoline stack. ''' if hasattr(generator.matcher, 'monitor_class') and \ isinstance(self, generator.matcher.monitor_class): generator.matcher.on_push(self) def pop(self, generator): ''' Called when a generator is popped from the trampoline stack. 
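        A matcher that interacts with this monitor might look like
        (a hypothetical sketch - `MyMatcher` is illustrative only):

            class MyMatcher(OperatorMatcher):
                monitor_class = ActiveMonitor
                def on_push(self, monitor): pass
                def on_pop(self, monitor): pass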
''' if hasattr(generator.matcher, 'monitor_class') and \ isinstance(self, generator.matcher.monitor_class): generator.matcher.on_pop(self) class MultipleValueMonitors(ValueMonitor): ''' Combine several value monitors into one. ''' def __init__(self, monitors=None): super(MultipleValueMonitors, self).__init__() self._monitors = [] if monitors is None else monitors def append(self, monitor): ''' Add another monitor to the chain. ''' self._monitors.append(monitor) def __len__(self): return len(self._monitors) def next_iteration(self, epoch, value, exception, stack): ''' Called at the start of each iteration. ''' for monitor in self._monitors: monitor.next_iteration(epoch, value, exception, stack) def before_next(self, generator): ''' Called before invoking ``next`` on a generator. ''' for monitor in self._monitors: monitor.before_next(generator) def after_next(self, value): ''' Called after invoking ``next`` on a generator. ''' for monitor in self._monitors: monitor.after_next(value) def before_throw(self, generator, value): ''' Called before invoking ``throw`` on a generator. ''' for monitor in self._monitors: monitor.before_throw(generator, value) def after_throw(self, value): ''' Called after invoking ``throw`` on a generator. ''' for monitor in self._monitors: monitor.after_throw(value) def before_send(self, generator, value): ''' Called before invoking ``send`` on a generator. ''' for monitor in self._monitors: monitor.before_send(generator, value) def after_send(self, value): ''' Called after invoking ``send`` on a generator. ''' for monitor in self._monitors: monitor.after_send(value) def exception(self, value): ''' Called when an exception is caught (instead of any 'after' method). ''' for monitor in self._monitors: monitor.exception(value) def raise_(self, value): ''' Called before raising an exception to the caller. ''' for monitor in self._monitors: monitor.raise_(value) def yield_(self, value): ''' Called before yielding a value to the caller. ''' for monitor in self._monitors: monitor.yield_(value) class MultipleStackMonitors(StackMonitor): ''' Combine several stack monitors into one. ''' def __init__(self, monitors=None): super(MultipleStackMonitors, self).__init__() self._monitors = [] if monitors is None else monitors def append(self, monitor): ''' Add another monitor to the chain. ''' self._monitors.append(monitor) def __len__(self): return len(self._monitors) def push(self, value): ''' Called before adding a generator to the stack. ''' for monitor in self._monitors: monitor.push(value) def pop(self, value): ''' Called after removing a generator from the stack. ''' for monitor in self._monitors: monitor.pop(value) def prepare_monitors(monitor_factories): ''' Take a list of monitor factories and return an active and a passive monitor (or None, if none given). ''' stack, value = MultipleStackMonitors(), MultipleValueMonitors() monitor_factories = [] if monitor_factories is None else monitor_factories for monitor_factory in monitor_factories: monitor = monitor_factory() if isinstance(monitor, StackMonitor): stack.append(monitor) if isinstance(monitor, ValueMonitor): value.append(monitor) return (stack if stack else None, value if value else None) LEPL-5.1.3/src/lepl/core/config.py0000644000175000001440000010463411731117215017214 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. 
You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' The main configuration object and various standard configurations. ''' # pylint bug? # pylint: disable-msg=W0404 from collections import namedtuple from lepl.core.parser import make_raw_parser, make_single, make_multiple from lepl.stream.factory import DEFAULT_STREAM_FACTORY Configuration = namedtuple('Configuration', 'rewriters monitors stream_factory stream_kargs') '''Carrier for configuration.''' class ConfigurationError(Exception): ''' Error raised for problems with configuration. ''' pass class ConfigBuilder(object): ''' Accumulate configuration through chained methods. ''' def __init__(self, matcher): # we need to delay startup, to avoid loops self.__started = False # this is set whenever any config is changed. it is cleared when # the configuration is read. so if is is false then the configuration # is the same as previously read self.__changed = True self.__rewriters = set() self.__monitors = [] self.__stream_factory = DEFAULT_STREAM_FACTORY self.__alphabet = None self.__stream_kargs = {} # this is set from the matcher. it gives a memory loop, but not a # very serious one, and allows single line configuration which is # useful for timing. self.matcher = matcher def __start(self): ''' Set default values on demand to avoid dependency loop. ''' if not self.__started: self.__started = True self.default() # raw access to basic components def add_rewriter(self, rewriter): ''' Add a rewriter that will be applied to the matcher graph when the parser is generated. ''' self.__start() self.clear_cache() # we need to remove before adding to ensure last added is the one # used (exclusive rewriters are equal) if rewriter in self.__rewriters: self.__rewriters.remove(rewriter) self.__rewriters.add(rewriter) return self def remove_rewriter(self, rewriter): ''' Remove a rewriter from the current configuration. ''' self.__start() self.clear_cache() self.__rewriters = set(r for r in self.__rewriters if r is not rewriter) return self def remove_all_rewriters(self, type_=None): ''' Remove all rewriters of a given type from the current configuration. 
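        For example (a sketch - `flatten()` below is a convenience for
        exactly this):

            from lepl.core.rewriters import Flatten
            matcher.config.add_rewriter(Flatten())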
''' self.__start() self.clear_cache() if type_: self.__rewriters = set(r for r in self.__rewriters if not isinstance(r, type_)) else: self.__rewriters = set() return self def add_monitor(self, monitor): ''' Add a monitor to the current configuration. Monitors are called from within the trampolining process and can be used to track evaluation, control resource use, etc. ''' self.__start() self.clear_cache() self.__monitors.append(monitor) return self def remove_all_monitors(self): ''' Remove all monitors from the current configuration. ''' self.__start() self.clear_cache() self.__monitors = [] return self def stream_factory(self, stream_factory=DEFAULT_STREAM_FACTORY): ''' Specify the stream factory. This is used to generate the input stream for the parser. ''' self.__start() self.clear_cache() self.__stream_factory = stream_factory return self def add_stream_kargs(self, ** kargs): ''' Add a value for passing to the stream factory. ''' for name in kargs: self.__stream_kargs[name] = kargs[name] return self def remove_all_stream_kargs(self): ''' Remove all values passed to the stream factory. ''' self.__stream_kargs = {} @property def configuration(self): ''' The current configuration (rewriters, monitors, stream_factory). ''' self.__start() self.__changed = False rewriters = list(self.__rewriters) rewriters.sort() return Configuration(rewriters, list(self.__monitors), self.__stream_factory, dict(self.__stream_kargs)) def __get_alphabet(self): ''' Get the alphabet used. Typically this is Unicode, which is the default. It is needed for the generation of regular expressions. ''' from lepl.regexp.unicode import UnicodeAlphabet if not self.__alphabet: self.__alphabet = UnicodeAlphabet.instance() return self.__alphabet def alphabet(self, alphabet): ''' Set the alphabet used. It is needed for the generation of regular expressions, for example (but the default, for Unicode, is usually sufficient). ''' if alphabet: if self.__alphabet: if self.__alphabet != alphabet: raise ConfigurationError( 'Alphabet has changed during configuration ' '(perhaps the default was already used?)') else: self.__alphabet = alphabet self.__start() self.clear_cache() @property def changed(self): ''' Has the config been changed by the user since it was last returned via `configuration`? if not, any previously generated parser can be reused. ''' return self.__changed def clear_cache(self): ''' Force calculation of a new parser. ''' self.__changed = True # rewriters def set_arguments(self, type_, **kargs): ''' Set the given keyword arguments on all matchers of the given `type_` (ie class) in the grammar. ''' from lepl.core.rewriters import SetArguments return self.add_rewriter(SetArguments(type_, **kargs)) def no_set_arguments(self): ''' Remove all rewriters that set arguments. ''' from lepl.core.rewriters import SetArguments return self.remove_all_rewriters(SetArguments) def set_alphabet_arg(self, alphabet=None): ''' Set `alphabet` on various matchers. This is useful when using an unusual alphabet (most often when using line-aware parsing), as it saves having to specify it on each matcher when creating the grammar. 
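        For example (a sketch - the Unicode alphabet shown here is already
        the default):

            from lepl.regexp.unicode import UnicodeAlphabet
            matcher.config.set_alphabet_arg(UnicodeAlphabet.instance())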
''' from lepl.regexp.matchers import BaseRegexp from lepl.lexer.matchers import BaseToken if alphabet: self.alphabet(alphabet) else: alphabet = self.__get_alphabet() if not alphabet: raise ValueError('An alphabet must be provided or already set') self.set_arguments(BaseRegexp, alphabet=alphabet) self.set_arguments(BaseToken, alphabet=alphabet) return self def full_first_match(self, eos=True): ''' Raise an error if the first match fails. If `eos` is True then this requires that the entire input is matched, otherwise it only requires that the matcher succeed. The exception includes information about the deepest read to the stream (which is a good indication of where any error occurs). This is part of the default configuration. It can be removed with `no_full_first_match()`. ''' from lepl.core.rewriters import FullFirstMatch return self.add_rewriter(FullFirstMatch(eos)) def no_full_first_match(self): ''' Disable the automatic generation of an error if the first match fails. ''' from lepl.core.rewriters import FullFirstMatch return self.remove_all_rewriters(FullFirstMatch) def flatten(self): ''' Combined nested `And()` and `Or()` matchers. This does not change the parser semantics, but improves efficiency. This is part of the default configuration. It can be removed with `no_flatten`. ''' from lepl.core.rewriters import Flatten return self.add_rewriter(Flatten()) def no_flatten(self): ''' Disable the combination of nested `And()` and `Or()` matchers. ''' from lepl.core.rewriters import Flatten return self.remove_all_rewriters(Flatten) def compile_to_dfa(self, force=False, alphabet=None): ''' Compile simple matchers to DFA regular expressions. This improves efficiency but may change the parser semantics slightly (DFA regular expressions do not provide backtracking / alternative matches). ''' from lepl.regexp.matchers import DfaRegexp from lepl.regexp.rewriters import CompileRegexp self.alphabet(alphabet) return self.add_rewriter( CompileRegexp(self.__get_alphabet(), force, DfaRegexp)) def compile_to_nfa(self, force=False, alphabet=None): ''' Compile simple matchers to NFA regular expressions. This improves efficiency and should not change the parser semantics. This is part of the default configuration. It can be removed with `no_compile_regexp`. ''' from lepl.regexp.matchers import NfaRegexp from lepl.regexp.rewriters import CompileRegexp self.alphabet(alphabet) return self.add_rewriter( CompileRegexp(self.__get_alphabet(), force, NfaRegexp)) def compile_to_re(self, force=False, alphabet=None): ''' Compile simple matchers to re (C library) regular expressions. This improves efficiency but may change the parser semantics slightly (DFA regular expressions do not provide backtracking / alternative matches). ''' from lepl.matchers.core import Regexp from lepl.regexp.rewriters import CompileRegexp def regexp_wrapper(regexp, _alphabet): ''' Adapt the Regexp matcher to the form needed (forcing Unicode). ''' return Regexp(str(regexp)) self.alphabet(alphabet) return self.add_rewriter( CompileRegexp(self.__get_alphabet(), force, regexp_wrapper)) def no_compile_to_regexp(self): ''' Disable compilation of simple matchers to regular expressions. ''' from lepl.regexp.rewriters import CompileRegexp return self.remove_all_rewriters(CompileRegexp) def optimize_or(self, conservative=False): ''' Rearrange arguments to `Or()` so that left-recursive matchers are tested last. This improves efficiency, but may alter the parser semantics (the ordering of multiple results with ambiguous grammars may change). 
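        Typical use is simply (a sketch):

            matcher.config.optimize_or()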
`conservative` refers to the algorithm used to detect loops; False may classify some left--recursive loops as right--recursive. ''' from lepl.core.rewriters import OptimizeOr return self.add_rewriter(OptimizeOr(conservative)) def no_optimize_or(self): ''' Disable the re-ordering of some `Or()` arguments. ''' from lepl.core.rewriters import OptimizeOr return self.remove_all_rewriters(OptimizeOr) def lexer(self, alphabet=None, discard=None, lexer=None): ''' Detect the use of `Token()` and modify the parser to use the lexer. If tokens are not used, this has no effect on parsing. This is part of the default configuration. It can be disabled with `no_lexer`. ''' from lepl.lexer.rewriters import AddLexer self.alphabet(alphabet) return self.add_rewriter( AddLexer(alphabet=self.__get_alphabet(), discard=discard, lexer=lexer)) def no_lexer(self): ''' Disable support for the lexer. ''' from lepl.lexer.rewriters import AddLexer self.remove_all_rewriters(AddLexer) def direct_eval(self, spec=None): ''' Combine simple matchers so that they are evaluated without trampolining. This improves efficiency (particularly because it reduces the number of matchers that can be memoized). This is part of the default configuration. It can be removed with `no_direct_eval`. ''' from lepl.core.rewriters import DirectEvaluation return self.add_rewriter(DirectEvaluation(spec)) def no_direct_eval(self): ''' Disable direct evaluation. ''' from lepl.core.rewriters import DirectEvaluation return self.remove_all_rewriters(DirectEvaluation) def compose_transforms(self): ''' Combine transforms (functions applied to results) with matchers. This may improve efficiency. This is part of the default configuration. It can be removed with `no_compose_transforms`. ''' from lepl.core.rewriters import ComposeTransforms return self.add_rewriter(ComposeTransforms()) def no_compose_transforms(self): ''' Disable the composition of transforms. ''' from lepl.core.rewriters import ComposeTransforms return self.remove_all_rewriters(ComposeTransforms) def auto_memoize(self, conservative=None, full=True, d=0): ''' This configuration attempts to detect which memoizer is most effective for each matcher. As such it is a general "fix" for left-recursive grammars and is suggested in the warning shown when the right-only memoizer detects left recursion. Lepl does not guarantee that all left-recursive grammars are handled correctly. The corrections applied may be incomplete and can be inefficient. It is always better to re-write a grammar to avoid left-recursion. One way to improve efficiency, at the cost of less accurate matching, is to specify a non-zero ``d`` parameter - this is the maximum iteration depth that will be used (by default, when ``d`` is zero, it is the length of the remaining input, which can be very large). ''' from lepl.core.rewriters import AutoMemoize from lepl.matchers.memo import LMemo, RMemo self.no_memoize() return self.add_rewriter( AutoMemoize(conservative=conservative, left=LMemo, right=RMemo if full else None, d=d)) def left_memoize(self, d=0): ''' Add memoization that may detect and stabilise left-recursion. This makes the parser more robust (so it can handle more grammars) but also more complex (and probably slower). ``config.auto_memoize()`` will also add memoization, but will select left/right memoization depending on the path through the parser. Lepl does not guarantee that all left-recursive grammars are handled correctly. The corrections applied may be incomplete and can be inefficient. 
It is always better to re-write a grammar to avoid left-recursion. One way to improve efficiency, at the cost of less accurate matching, is to specify a non-zero ``d`` parameter - this is the maximum iteration depth that will be used (by default, when ``d`` is zero, it is the length of the remaining input, which can be very large). ''' from lepl.core.rewriters import LeftMemoize self.no_memoize() return self.add_rewriter(LeftMemoize(d)) def right_memoize(self): ''' Add memoization that can make some complex parsers (with a lot of backtracking) more efficient. This also detects left-recursive grammars and displays a suitable warning. This is included in the default configuration. For simple grammars it may make things slower; it can be disabled by ``config.no_memoize()``. ''' from lepl.core.rewriters import RightMemoize self.no_memoize() return self.add_rewriter(RightMemoize()) def no_memoize(self): ''' Remove memoization. To use the default configuration without memoization (which may be faster in some cases), specify `config.no_memoize()`. ''' from lepl.core.rewriters import AutoMemoize, LeftMemoize, RightMemoize self.remove_all_rewriters(LeftMemoize) self.remove_all_rewriters(RightMemoize) return self.remove_all_rewriters(AutoMemoize) def lines(self, discard=None, tabsize=8, block_policy=None, block_start=None): ''' Configure "offside parsing". This enables lexing and adds extra tokens to mark the start and end of lines. If block_policy is specified then the line start token will also include spaces which can be used by the ``Block()`` and ``BLine()`` matchers to do offside (whitespace-aware) parsing. `discard` is the regular expression to use to identify spaces between tokens (by default, spaces and tabs). The remaining parameters are used only if at least one of `block_policy` and `block_start` is given. `block_policy` decides how indentation if calculated. See `explicit` etc in lepl.lexer.blocks.matchers. `block_start` is the initial indentation (by default, zero). If set to lepl.lexer.lines.matchers.NO_BLOCKS indentation will not be checked (useful for tests). `tabsize` is used only if `block_policy` is given. It is the number of spaces used to replace a leading tab (no replacement if None). ''' from lepl.lexer.lines.lexer import make_offside_lexer from lepl.lexer.lines.matchers import Block, DEFAULT_POLICY, LineStart from lepl.lexer.lines.monitor import block_monitor blocks = block_policy is not None or block_start is not None if blocks: if block_start is None: block_start = 0 if block_policy is None: block_policy = DEFAULT_POLICY self.add_monitor(block_monitor(block_start)) self.set_arguments(Block, policy=block_policy) else: self.set_arguments(LineStart, indent=False) self.lexer(self.__get_alphabet(), discard, make_offside_lexer(tabsize, blocks)) return self # monitors def trace_stack(self, enabled=False): ''' Add a monitor to trace results using `TraceStack()`. This is not used by default as it has a cost at runtime. ''' from lepl.core.trace import TraceStack return self.add_monitor(TraceStack(enabled)) def trace_variables(self): ''' Add a monitor to correctly insert the transforms needed when using the `TraceVariables()` context: with TraceVariables(): ... This is used by default as it has no runtime cost (once the parser is created). ''' from lepl.core.rewriters import TraceVariables return self.add_rewriter(TraceVariables()) def low_memory(self, queue_len=100): ''' Reduce memory use (at the expense of backtracking). This will: - Add a monitor to manage resources. 
See `GeneratorManager()`. - Disable direct evaluation (more trampolining gives more scope for removing generators) - Disable the full first match error (which requires a copy of the input for the error message) - Disable memoisation (which would keep input in memory) This reduces memory usage, but makes the parser less reliable. Usually a value like 100 (the default) for the queue length will make memory use insignificant and still give a useful first parse. Note that, although the parser will use less memory, it may run more slowly (as extra work needs to be done to "clean out" the stored values). ''' from lepl.core.manager import GeneratorManager self.add_monitor(GeneratorManager(queue_len)) self.no_direct_eval() self.no_memoize() self.no_full_first_match() self.cache_level(-9) return self def cache_level(self, level=1): ''' Control when the stream can be cached internally (this is used for debugging and error messages) - streams are cached for debugging when the value is greater than zero. The value is incremented each time a new stream is constructed (eg when constructing tokens). A value of 1 implies that a stream would be always cached. A value of 0 might be used when iterating over a file with the lexer - the iteration is not cached, but individual tokens will be. ''' self.add_stream_kargs(cache_level=level) def record_deepest(self, n_before=6, n_results_after=2, n_done_after=2): ''' Add a monitor to record deepest match. See `RecordDeepest()`. ''' from lepl.core.trace import RecordDeepest return self.add_monitor( RecordDeepest(n_before, n_results_after, n_done_after)) # packages def clear(self): ''' Delete any earlier configuration and disable the default (so no rewriters or monitors are used). ''' self.__started = True self.clear_cache() self.__rewriters = set() self.__monitors = [] self.__stream_factory = DEFAULT_STREAM_FACTORY self.__alphabet = None return self def default(self): ''' Provide the default configuration (deleting what may have been configured previously). This is equivalent to the initial configuration. It provides a moderately efficient, stable parser. ''' self.clear() self.flatten() self.trace_variables() self.compose_transforms() self.lexer() self.right_memoize() self.direct_eval() self.compile_to_nfa() self.full_first_match() return self class ParserMixin(object): ''' Methods to configure and generate a parser or matcher. ''' def __init__(self, *args, **kargs): super(ParserMixin, self).__init__(*args, **kargs) self.config = ConfigBuilder(self) self.__raw_parser_cache = None self.__from = None # needed to check cache is valid def _raw_parser(self, from_=None): ''' Provide the parser. This underlies the "fancy" methods below. ''' if self.config.changed or self.__raw_parser_cache is None \ or self.__from != from_: config = self.config.configuration self.__from = from_ if from_: stream_factory = \ getattr(config.stream_factory, 'from_' + from_) else: stream_factory = config.stream_factory # __call__ self.__raw_parser_cache = \ make_raw_parser(self, stream_factory, config) return self.__raw_parser_cache def get_match_file(self): ''' Get a function that will parse the contents of a file, returning a sequence of (results, stream) pairs. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return self._raw_parser('file') def get_match_iterable(self): ''' Get a function that will parse the contents of an iterable (eg. 
a generator), returning a sequence of (results, stream) pairs. The data will be read as required. ''' return self._raw_parser('iterable') def get_match_list(self): ''' Get a function that will parse the contents of a list returning a sequence of (results, stream) pairs. ''' return self._raw_parser('list') def get_match_string(self,): ''' Get a function that will parse the contents of a string returning a sequence of (results, stream) pairs. ''' return self._raw_parser('string') def get_match_sequence(self): ''' Get a function that will parse the contents of a generic sequence (with [] and len()) returning a sequence of (results, stream) pairs. ''' return self._raw_parser('sequence') def get_match(self): ''' Get a function that will parse input, returning a sequence of (results, stream) pairs. The type of stream is inferred from the input to the parser. ''' return self._raw_parser() def match_file(self, file_, **kargs): ''' Parse the contents of a file, returning a sequence of (results, stream) pairs. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return self.get_match_file()(file_, **kargs) def match_iterable(self, iterable, **kargs): ''' Parse the contents of an iterable (eg. a generator), returning a sequence of (results, stream) pairs. The data will be read as required. ''' return self.get_match_iterable()(iterable, **kargs) def match_list(self, list_, **kargs): ''' Parse the contents of a list returning a sequence of (results, stream) pairs. ''' return self.get_match_list()(list_, **kargs) def match_string(self, string, **kargs): ''' Parse the contents of a string, returning a sequence of (results, stream) pairs. ''' return self.get_match_string()(string, **kargs) def match_sequence(self, sequence, **kargs): ''' Parse the contents of a generic sequence (with [] and len()) returning a sequence of (results, stream) pairs. ''' return self.get_match_sequence()(sequence, **kargs) def match(self, input_, **kargs): ''' Parse input, returning a sequence of (results, stream) pairs. The type of stream is inferred from the input. ''' return self.get_match()(input_, **kargs) def get_parse_file(self): ''' Get a function that will parse the contents of a file, returning a single match. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return make_single(self.get_match_file()) def get_parse_iterable(self): ''' Get a function that will parse the contents of an iterable (eg. a generator), returning a single match. The data will be read as required. ''' return make_single(self.get_match_iterable()) def get_parse_list(self): ''' Get a function that will parse the contents of a list returning a single match. ''' return make_single(self.get_match_list()) def get_parse_string(self): ''' Get a function that will parse the contents of a string returning a single match. ''' return make_single(self.get_match_string()) def get_parse_sequence(self): ''' Get a function that will parse the contents of a generic sequence (with [] and len()) returning a single match. ''' return make_single(self.get_match_sequence()) def get_parse(self): ''' Get a function that will parse input, returning a single match. The type of stream is inferred from the input to the parser. 
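For example (a minimal sketch; `matcher` stands for any matcher you have built):

    parser = matcher.get_parse()
    result = parser('some input')  # equivalent to matcher.parse('some input')
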
''' return make_single(self.get_match()) def parse_file(self, file_, **kargs): ''' Parse the contents of a file, returning a single match. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return self.get_parse_file()(file_, **kargs) def parse_iterable(self, iterable, **kargs): ''' Parse the contents of an iterable (eg. a generator), returning a single match. The data will be read as required. ''' return self.get_parse_iterable()(iterable, **kargs) def parse_list(self, list_, **kargs): ''' Parse the contents of a list, returning a single match. ''' return self.get_parse_list()(list_, **kargs) def parse_string(self, string, **kargs): ''' Parse the contents of a string, returning a single match. ''' return self.get_parse_string()(string, **kargs) def parse_sequence(self, sequence, **kargs): ''' Parse the contents of a generic sequence (with [] and len()), returning a single match. ''' return self.get_parse_sequence()(sequence, **kargs) def parse(self, input_, **kargs): ''' Parse the input, returning a single match. The type of stream is inferred from the input. ''' return self.get_parse()(input_, **kargs) def get_parse_file_all(self): ''' Get a function that will parse the contents of a file, returning a sequence of matches. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return make_multiple(self.get_match_file()) def get_parse_iterable_all(self): ''' Get a function that will parse the contents of an iterable (eg. a generator), returning a sequence of matches. The data will be read as required. ''' return make_multiple(self.get_match_iterable()) def get_parse_list_all(self): ''' Get a function that will parse the contents of a list, returning a sequence of matches. ''' return make_multiple(self.get_match_list()) def get_parse_string_all(self): ''' Get a function that will parse a string, returning a sequence of matches. ''' return make_multiple(self.get_match_string()) def get_parse_sequence_all(self): ''' Get a function that will parse the contents of a generic sequence (with [] and len()), returning a sequence of matches. ''' return make_multiple(self.get_match_sequence()) def get_parse_all(self): ''' Get a function that will parse input, returning a sequence of matches. The type of stream is inferred from the input to the parser. ''' return make_multiple(self.get_match()) def parse_file_all(self, file_, **kargs): ''' Parse the contents of a file, returning a sequence of matches. The data will be read as required (using an iterator), so the file must remain open during parsing. To avoid this, read all data into a string and parse that. ''' return self.get_parse_file_all()(file_, **kargs) def parse_iterable_all(self, iterable, **kargs): ''' Parse the contents of an iterable (eg. a generator), returning a sequence of matches. The data will be read as required. ''' return self.get_parse_iterable_all()(iterable, **kargs) def parse_list_all(self, list_, **kargs): ''' Parse the contents of a list, returning a sequence of matches. ''' return self.get_parse_list_all()(list_, **kargs) def parse_string_all(self, string, **kargs): ''' Parse the contents of a string, returning a sequence of matches.
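For example (a sketch; `matcher` is any matcher), compare this with `parse_string`, which returns only the first match:

    first = matcher.parse_string('abc')            # first match only (or None)
    every = list(matcher.parse_string_all('abc'))  # all matches, in order
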
''' return self.get_parse_string_all()(string, **kargs) def parse_sequence_all(self, sequence, **kargs): ''' Parse the contents of a generic sequence (with [] and len()) returning a sequence of matches. ''' return self.get_parse_sequence_all()(sequence, **kargs) def parse_all(self, input_, **kargs): ''' Parse input, returning a sequence of matches. The type of stream is inferred from the input to the parser. ''' return self.get_parse_all()(input_, **kargs) LEPL-5.1.3/src/lepl/core/parser.py0000644000175000001440000002335311731117151017240 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Create and evaluate parsers. Once a consistent set of matchers is constructed (that describes a grammar) they must be evaluated against some input. The code here supports that evaluation (via `trampoline()`) and allows the graph of matchers to be rewritten beforehand. ''' from collections import deque from logging import getLogger from traceback import format_exc from weakref import ref try: from itertools import imap except ImportError: imap = map from lepl.stream.core import s_debug, s_cacheable from lepl.core.monitor import prepare_monitors from lepl.support.lib import fmt def tagged(method): ''' Decorator for generators to add extra attributes. ''' def tagged_method(matcher, stream): ''' Wrap the result. ''' return GeneratorWrapper(method(matcher, stream), matcher, stream) return tagged_method def tagged_function(matcher, function): ''' Decorator for generators to add extra attributes. ''' def tagged_function(stream): ''' Wrap the result. ''' return GeneratorWrapper(function(matcher, stream), matcher, stream) return tagged_function class GeneratorWrapper(object): ''' Associate basic info about call that created the generator with the generator itself. This lets us manage resources and provide logging. It is also used by `trampoline()` to recognise generators that must be evaluated (rather than being treated as normal values). 
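A rough sketch of the convention (illustrative only; `sub` is a hypothetical sub-matcher): a matcher's `_match` generator, decorated with `tagged` above, yields wrapped generators for the trampoline to evaluate, and yields plain tuples as results:

    @tagged
    def _match(self, stream):
        generator = sub._match(stream)            # a GeneratorWrapper
        (results, stream_out) = yield generator   # evaluated by trampoline()
        yield (results, stream_out)               # a plain result
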
''' __slots__ = ['generator', 'matcher', 'stream', '_GeneratorWrapper__cached_repr', '__weakref__'] def __init__(self, generator, matcher, stream): self.generator = generator self.matcher = matcher if s_cacheable(stream): self.stream = stream else: self.stream = '' self.__cached_repr = None def __repr__(self): ''' Lazily evaluated for speed - saves 1/3 of time spent in constructor ''' if not self.__cached_repr: try: s = s_debug(self.stream) except AttributeError: s = '' self.__cached_repr = fmt('{0}({1})', self.matcher, s) return self.__cached_repr def __str__(self): return self.__repr__() def trampoline(main, m_stack=None, m_value=None): ''' The main parser loop. Evaluates matchers as coroutines. A dedicated version for when monitor not present increased the speed of the nat_lang performance test by only around 1% (close to noise). Replacing stack append/pop with a manually allocated non-decreasing array and index made no significant difference (at around 1% level) ''' from lepl.stream.maxdepth import FullFirstMatchException stack = deque() push = stack.append pop = stack.pop try: value = main exception_being_raised = False epoch = 0 log = getLogger('lepl.parser.trampoline') while True: epoch += 1 try: if m_value: m_value.next_iteration(epoch, value, exception_being_raised, stack) # is the value a coroutine that should be added to our stack # and evaluated? if type(value) is GeneratorWrapper: if m_stack: m_stack.push(value) # add to the stack push(value) if m_value: m_value.before_next(value) # and evaluate value = next(value.generator) if m_value: m_value.after_next(value) # if we don't have a coroutine then we have a result that # must be passed up the stack. else: # drop top of the stack (which returned the value) popped = pop() if m_stack: m_stack.pop(popped) # if we still have coroutines left, pass the value in if stack: # handle exceptions that are being raised if exception_being_raised: exception_being_raised = False if m_value: m_value.before_throw(stack[-1], value) # raise it inside the coroutine value = stack[-1].generator.throw(value) if m_value: m_value.after_throw(value) # handle ordinary values else: if m_value: m_value.before_send(stack[-1], value) # inject it into the coroutine value = stack[-1].generator.send(value) if m_value: m_value.after_send(value) # otherwise, the stack is completely unwound so return # to main caller else: if exception_being_raised: if m_value: m_value.raise_(value) raise value else: if m_value: m_value.yield_(value) yield value # this allows us to restart with a new evaluation # (backtracking) if called again. value = main except StopIteration as exception: # this occurs when we need to exit the main loop if exception_being_raised: raise # otherwise, we will propagate this value value = exception exception_being_raised = True if m_value: m_value.exception(value) except Exception as exception: # do some logging etc before re-raising if not isinstance(exception, FullFirstMatchException): log.debug(fmt('Exception at epoch {0}, {1!s}: {2!s}', epoch, value, exception)) if stack: log.debug(fmt('Top of stack: {0}', stack[-1])) # going to raise original exception # pylint: disable-msg=W0702 try: log.debug(format_exc()) except: log.warn('Exception cannot be formatted!') for generator in stack: log.debug(fmt('Stack: {0}', generator)) raise finally: # record the remaining stack while m_stack and stack: m_stack.pop(pop()) def make_raw_parser(matcher, stream_factory, config): ''' Make a parser. Rewrite the matcher and prepare the input for a parser. 
This constructs a function that returns a generator that provides a sequence of matches (ie (results, stream) pairs). ''' for rewriter in config.rewriters: #print(rewriter) #print(matcher.tree()) matcher = rewriter(matcher) (m_stack, m_value) = prepare_monitors(config.monitors) # pylint bug here? (E0601) # pylint: disable-msg=W0212, E0601 # (_match is meant to be hidden) # pylint: disable-msg=W0142 def parser(arg, **kargs): stream_kargs = dict(config.stream_kargs) stream_kargs.update(kargs) return trampoline(matcher._match(stream_factory(arg, **stream_kargs)), m_stack=m_stack, m_value=m_value) parser.matcher = matcher return parser def make_multiple(raw): ''' Convert a raw parser to return a generator of results. ''' def multiple(arg, **kargs): ''' Adapt a raw parser to behave as expected for the matcher interface. ''' return imap(lambda x: x[0], raw(arg, **kargs)) multiple.matcher = raw.matcher return multiple def make_single(raw): ''' Convert a raw parser to return a single result or None. ''' def single(arg, **kargs): ''' Adapt a raw parser to behave as expected for the parser interface. ''' try: return next(raw(arg, **kargs))[0] except StopIteration: return None single.matcher = raw.matcher return single LEPL-5.1.3/src/lepl/core/dynamic.py0000644000175000001440000000153011740101740017356 0ustar andrewusers00000000000000 class BaseVar(object): def __init__(self, value=None): if value is not None: value = self._cast(value) self.value = value def _cast(self, value): return value def setter(self): def wrapper(results): self.value = self._cast(results[0]) return results return wrapper def __lt__(self, other): return self.value < other def __le__(self, other): return self.value <= other def __gt__(self, other): return self.value > other def __ge__(self, other): return self.value >= other def __eq__(self, other): return self.value == other def __hash__(self): return hash(self.value) class IntVar(BaseVar): def _cast(self, value): return int(value) def __int__(self): return self.value LEPL-5.1.3/src/lepl/core/manager.py0000644000175000001440000002510011731117215017347 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. 
If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Manage resources. We can attempt to control resource consumption by closing generators - the problem is deciding which generators to close. At first it seems that the answer is going to be connected to tree traversal, but after some thought it's not so clear exactly what tree is being traversed, and how that identifies what generators should be closed. In particular, an "imperative" implementation with generators does not have the same meaning of "depth" as a recursive functional implementation (but see the related discussion in the `manual <../advanced.html#search-and-backtracking>`_). A better approach seems to be to discard those that have not been used "for a long time". A variation on this - keep a maximum number of the youngest generators - is practical. But care is needed, both in identifying what is used (and when it stops being used) and in implementing that efficiently. Here all generators are stored in a priority queue using weak references. The "real" priority is given by the "last used date" (epoch), but the priority in the queue is frozen when inserted. So on removing from the queue the priority must be checked to ensure it has not changed (and, if so, it must be updated with the real value and replaced). Note that the main aim here is to restrict resource consumption without damaging performance too much. The aim is not to control parse results by excluding certain matches. For efficiency, the queue length is increased (doubled) whenever the queue is filled by active generators. ''' from heapq import heappushpop, heappop, heappush from weakref import ref, WeakKeyDictionary from lepl.core.monitor import StackMonitor, ValueMonitor from lepl.support.lib import LogMixin, fmt, str # pylint: disable-msg=C0103 def GeneratorManager(queue_len): ''' A 'Monitor' (implements `MonitorInterface`, can be supplied to `Configuration`) that tracks (and can limit the number of) generators. It is also coupled to the size of stacks during search (via the generator_manager_queue_len property). This is a helper function that "escapes" the main class via a function to simplify configuration. ''' return lambda: _GeneratorManager(queue_len) class _GeneratorManager(StackMonitor, ValueMonitor, LogMixin): ''' A 'Monitor' (implements `MonitorInterface`, can be supplied to `Configuration`) that tracks (and can limit the number of) generators. ''' def __init__(self, queue_len): ''' `queue_len` is the number of generators that can exist. When the number is exceeded the oldest generators are closed, unless currently active (in which case the queue size is extended). If zero then no limit is applied. ''' super(_GeneratorManager, self).__init__() self.__queue = [] self.__initial_queue_len = queue_len self.queue_len = queue_len self.__known = WeakKeyDictionary() # map from generator to ref self.epoch = 0 def next_iteration(self, epoch, value, exception, stack): ''' Store the current epoch. ''' self.epoch = epoch def push(self, generator): ''' Add a generator if it is not already known, or increment its ref count. ''' if generator not in self.__known: self.__add(generator) # this sets the attribute on everything, but most instances simply # don't care...
(we can be "inefficient" here as the monitor is # only used when memory use is more important than cpu) if self.__initial_queue_len: generator.matcher.generator_manager_queue_len = \ self.__initial_queue_len self._debug(fmt('Clipping search depth to {0}', self.__initial_queue_len)) else: self.__known[generator].push() def pop(self, generator): ''' Decrement a ref's count and update the epoch. ''' self.__known[generator].pop(self.epoch) def __add(self, generator): ''' Add a generator, trying to keep the number of active generators to that given in the constructor. ''' reference = GeneratorRef(generator, self.epoch) self.__known[generator] = reference self._debug(fmt('Queue size: {0}/{1}', len(self.__queue), self.queue_len)) # if we have space, simply save with no expiry if self.queue_len == 0 or len(self.__queue) < self.queue_len: self.__add_unlimited(reference) else: self.__add_limited(reference) def __add_unlimited(self, reference): ''' Add the new reference and discard any unused candidates that happen to be on the top of the heap. ''' self._debug(fmt('Free space, so add {0}', reference)) candidate = heappushpop(self.__queue, reference) # clean out any unused references and make sure ordering correct while candidate: candidate.deletable(self.epoch) if candidate.gced: candidate = heappop(self.__queue) else: heappush(self.__queue, candidate) break def __add_limited(self, reference): ''' Add the new reference, discarding an old entry if possible. ''' while reference: candidate = heappushpop(self.__queue, reference) self._debug(fmt('Exchanged {0} for {1}', reference, candidate)) if candidate.order_epoch == self.epoch: # even the oldest generator is current break elif candidate.deletable(self.epoch): self._debug(fmt('Closing {0}', candidate)) generator = candidate.generator if generator: del self.__known[generator] candidate.close() return else: # try again (candidate has been updated) reference = candidate # if we are here, queue is too small heappush(self.__queue, candidate) # this is currently 1 too small, and zero means unlimited, so # doubling should always be sufficient. self.queue_len = self.queue_len * 2 self._warn(fmt('Queue is too small - extending to {0}', self.queue_len)) class GeneratorRef(object): ''' This contains the weak reference to the GeneratorWrapper and is stored in the GC priority queue. ''' def __init__(self, generator, epoch): self.__hash = hash(generator) self.__wrapper = ref(generator) self.__last_known_epoch = epoch self.order_epoch = epoch # readable externally self.__count = 1 # add with 1 as we test for discard immediately after self.gced = False self.__describe = str(generator) def __lt__(self, other): assert isinstance(other, GeneratorRef) return self.order_epoch < other.order_epoch def __eq__(self, other): return self is other def __hash__(self): return self.__hash @property def generator(self): ''' Provide access to the generator (or None, if it has been GCed). ''' return self.__wrapper() def pop(self, epoch): ''' When no longer used, safe epoch and decrement count. ''' self.__last_known_epoch = epoch self.__count -= 1 def push(self): ''' Added to stack, so increment count. ''' self.__count += 1 def deletable(self, epoch): ''' Check we can delete the wrapper. ''' if not self.__wrapper(): assert self.__count == 0, \ fmt('GCed but still on stack?! 
{0}', self.__describe) # already disposed by system # this never happens because the monitor contains a reference # in the value of the "known" dictionary self.gced = True return True else: # not on stack and ordering in queue was correct if self.__count == 0 \ and self.order_epoch == self.__last_known_epoch: return True # still on stack, or ordering was incorrect else: if self.__count: self.__last_known_epoch = epoch self.order_epoch = self.__last_known_epoch return False def close(self): ''' This terminates the enclosed generator. ''' generator = self.generator if generator: generator.stream = None generator.generator.close() def __str__(self): generator = self.generator if generator: return fmt('{0} ({1:d}/{2:d})', self.__describe, self.order_epoch, self.__last_known_epoch) else: return fmt('Empty ref to {0}', self.__describe) def __repr__(self): return str(self) LEPL-5.1.3/src/lepl/core/trace.py0000644000175000001440000002513111731117215017037 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tools for logging and tracing. ''' # we abuse conventions to give a consistent interface # pylint: disable-msg=C0103 from lepl.stream.core import s_delta, s_line, s_len from lepl.core.monitor import ActiveMonitor, ValueMonitor, StackMonitor from lepl.support.lib import CircularFifo, LogMixin, sample, fmt, str def TraceStack(enabled=False): ''' A basic logger (implemented as a monitor - `MonitorInterface`) that records the flow of control during parsing. It can be controlled by `Trace()`. This is a factory that "escapes" the main class via a function to simplify configuration. ''' return lambda: _TraceStack(enabled) class _TraceStack(ActiveMonitor, ValueMonitor, LogMixin): ''' A basic logger (implemented as a monitor - `MonitorInterface`) that records the flow of control during parsing. It can be controlled by `Trace()`. ''' def __init__(self, enabled=False): super(_TraceStack, self).__init__() self.generator = None self.depth = -1 self.action = None self.enabled = 1 if enabled else 0 self.epoch = 0 def next_iteration(self, epoch, value, exception, stack): ''' Store epoch and stack size. 
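(This callback arrives once per `trampoline()` iteration. A minimal sketch of a custom monitor, assuming the base classes in lepl.core.monitor supply no-op defaults for the other callbacks:

    from lepl.core.monitor import ValueMonitor

    class EpochEcho(ValueMonitor):
        def next_iteration(self, epoch, value, exception, stack):
            print(epoch, len(stack))

)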
''' self.epoch = epoch self.depth = len(stack) def before_next(self, generator): ''' Log when enabled. ''' if self.enabled > 0: self.generator = generator self.action = fmt('next({0})', generator) def after_next(self, value): ''' Log when enabled. ''' if self.enabled > 0: self._log_result(value, self.fmt_result(value)) def before_throw(self, generator, value): ''' Log when enabled. ''' if self.enabled > 0: self.generator = generator if type(value) is StopIteration: self.action = fmt('stop -> {0}', generator) else: self.action = fmt('{1!r} -> {0}', generator, value) def after_throw(self, value): ''' Log when enabled. ''' if self.enabled > 0: self._log_result(value, self.fmt_result(value)) def before_send(self, generator, value): ''' Log when enabled. ''' if self.enabled > 0: self.generator = generator self.action = fmt('{1!r} -> {0}', generator, value) def after_send(self, value): ''' Log when enabled. ''' if self.enabled > 0: self._log_result(value, self.fmt_result(value)) def exception(self, value): ''' Log when enabled. ''' if self.enabled > 0: if type(value) is StopIteration: self._log_done(self.fmt_done()) else: self._log_error(self.fmt_result(value)) def fmt_result(self, value): ''' Provide a standard fmt for the results. ''' (stream, depth, locn) = self.fmt_stream() return fmt('{0:05d} {1!r:11s} {2} ({3:04d}) {4:03d} ' '{5:s} -> {6!r}', self.epoch, stream, locn, depth, self.depth, self.action, value) def fmt_done(self): ''' Provide a standard fmt for failure. ''' (stream, depth, locn) = self.fmt_stream() return fmt('{0:05d} {1!r:11s} {2} ({3:04d}) {4:03d} ' '{5:s} -> stop', self.epoch, stream, locn, depth, self.depth, self.action) def fmt_stream(self): ''' Provide a standard fmt for location. ''' try: (offset, lineno, char) = s_delta(self.generator.stream) locn = fmt('{0}/{1}.{2}', offset, lineno, char) try: stream = sample('', s_line(self.generator.stream, False)[0], 9) except StopIteration: stream = '' return (stream, offset, locn) except StopIteration: return ('', -1, '') except TypeError: return (self.generator.stream, -1, '') def yield_(self, value): ''' Log when enabled. ''' if self.enabled > 0: self._info(self.fmt_final_result(value)) def raise_(self, value): ''' Log when enabled. ''' if self.enabled > 0: if type(value) is StopIteration: self._info(self.fmt_final_result(fmt('raise {0!r}', value))) else: self._warn(self.fmt_final_result(fmt('raise {0!r}', value))) def fmt_final_result(self, value): ''' Provide a standard fmt for the result. ''' return fmt('{0:05d} {1:03d} {2} {3}', self.epoch, self.depth, ' ' * 63, value) def _log_result(self, value, text): ''' Record a result. ''' (self._info if type(value) is tuple else self._debug)(text) def _log_error(self, text): ''' Record an error. ''' self._warn(text) def _log_done(self, text): ''' Record a "stop". ''' self._debug(text) def switch(self, increment): ''' Called by the `Trace` matcher to turn this on and off. ''' self.enabled += increment def RecordDeepest(n_before=6, n_results_after=2, n_done_after=2): ''' A logger (implemented as a monitor - `MonitorInterface`) that records the deepest match found during a parse. This is a helper function that "escapes" the main class via a function to simplify configuration. ''' return lambda: _RecordDeepest(n_before, n_results_after, n_done_after) class _RecordDeepest(_TraceStack): ''' A logger (implemented as a monitor - `MonitorInterface`) that records the deepest match found during a parse. 
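It is normally enabled via the configuration (a sketch; `matcher` is any matcher, and logging must be configured first, eg. with `basicConfig()`):

    matcher.config.record_deepest()
    matcher.parse('input that fails deep inside the grammar')
    # the deepest match, with surrounding context, appears in the log
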
''' def __init__(self, n_before=6, n_results_after=2, n_done_after=2): super(_RecordDeepest, self).__init__(enabled=True) self.n_before = n_before self.n_results_after = n_results_after self.n_done_after = n_done_after self._limited = CircularFifo(n_before) self._before = [] self._results_after = [] self._done_after = [] self._deepest = -1e99 self._countdown_result = 0 self._countdown_done = 0 def _log_result(self, value, text): ''' Modify `TraceStack` to record the data. ''' if type(value) is tuple: self.record(True, text) def _log_error(self, text): ''' Modify `TraceStack` to record the data. ''' self.record(True, text) def _log_done(self, text): ''' Modify `TraceStack` to record the data. ''' self.record(False, text) def record(self, is_result, text): ''' Record the data. ''' try: stream = self.generator.stream try: depth = s_delta(stream)[0] except AttributeError: # no .depth() depth = -1 if depth >= self._deepest and is_result: self._deepest = depth self._countdown_result = self.n_results_after self._countdown_done = self.n_done_after self._before = list(self._limited) self._results_after = [] self._done_after = [] elif is_result and self._countdown_result: self._countdown_result -= 1 self._results_after.append(text) elif not is_result and self._countdown_done: self._countdown_done -= 1 self._done_after.append(text) self._limited.append(text) except StopIteration: # end of iterator stream pass def yield_(self, value): ''' Display the result and reset. ''' self._deepest = 0 self._limited.clear() self.__display() def raise_(self, value): ''' Display the result and reset. ''' self._deepest = 0 self._limited.clear() self.__display() def __display(self): ''' Display the result. ''' self._info(self.__fmt()) def __fmt(self): ''' fmt the result. ''' return fmt( '\nUp to {0} matches before and including longest match:\n{1}\n' 'Up to {2} failures following longest match:\n{3}\n' 'Up to {4} successful matches following longest match:\n{5}\n', self.n_before, '\n'.join(self._before), self.n_done_after, '\n'.join(self._done_after), self.n_results_after, '\n'.join(self._results_after)) LEPL-5.1.3/src/lepl/bin/0000755000175000001440000000000011764776700015225 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/bin/_test/0000755000175000001440000000000011764776700016343 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/bin/_test/__init__.py0000644000175000001440000000330211731117151020431 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.bin package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.bin._test.bits import lepl.bin._test.encode import lepl.bin._test.literal import lepl.bin._test.matchers LEPL-5.1.3/src/lepl/bin/_test/literal.py0000644000175000001440000001073411731117151020335 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.bin.literal module. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.bin import * from lepl.support.node import Node # pylint: disable-msg=C0103, C0111, C0301 # (dude this is just a test) class ParseTest(TestCase): ''' Test whether we correctly parse a spec. 
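For instance (a sketch of the spec language exercised below):

    b = parse('(a=0x1234/16)')   # a Node with one named, 16-bit value
    assert b.a[0] == BitString.from_int('0x1234', 16)
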
''' def bassert(self, value, expected, length=None): x = BitString.from_int(expected, length) assert value == x, (value, x) def test_simple(self): b = parse('(0/1)') assert isinstance(b, Node), type(b) assert len(b) == 1, len(b) assert isinstance(b[0], BitString), type(b[0]) assert len(b[0]) == 1, len(b[0]) assert b[0].zero() b = parse('(0/1, 1/1)') assert isinstance(b, Node), type(b) assert len(b) == 2, len(b) assert isinstance(b[1], BitString), type(b[1]) assert len(b[1]) == 1, len(b[1]) assert not b[1].zero() b = parse('(a=0/1)') assert isinstance(b, Node), type(b) assert len(b) == 1, len(b) assert isinstance(b.a[0], BitString), type(b.a[0]) assert len(b.a[0]) == 1, len(b.a[0]) assert b.a[0].zero() b = parse('(a=(0/1))') assert isinstance(b, Node), type(b) assert len(b) == 1, len(b) assert isinstance(b.a[0], Node), type(b.a[0]) assert len(b.a[0]) == 1, len(b.a[0]) assert isinstance(b.a[0][0], BitString), type(b.a[0][0]) assert len(b.a[0][0]) == 1, len(b.a[0][0]) assert b.a[0][0].zero() b = parse('(0)') assert isinstance(b, Node), type(b) assert len(b) == 1, len(b) assert isinstance(b[0], BitString), type(b[0]) assert len(b[0]) == 32, len(b[0]) assert b[0].zero() def test_single(self): b = parse('''(123/8, foo=0x123/2.0,\nbar = 1111100010001000b0)''') self.bassert(b[0], '0x7b') self.bassert(b[1], '0x123', 16) self.bassert(b.foo[0], '0x123', 16) self.bassert(b[2], '1111100010001000b0') self.bassert(b.bar[0], '1111100010001000b0') def test_nested(self): b = parse('(123, (foo=123x0/2.))') self.bassert(b[0], 123) assert isinstance(b[1], Node), str(b) self.bassert(b[1].foo[0], 0x2301, 16) def test_named(self): #basicConfig(level=DEBUG) b = parse('A(B(1), B(2))') self.bassert(b.B[0][0], 1) self.bassert(b.B[1][0], 2) def test_repeat(self): b = parse('(1*3)') self.bassert(b[0], '010000000100000001000000x0') b = parse('(a=0x1234 * 3)') self.bassert(b.a[0], '341234123412x0') LEPL-5.1.3/src/lepl/bin/_test/matchers.py0000644000175000001440000001110011731117151020473 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.bin.matchers module. 
''' if bytes is str: print('Binary parsing unsupported in this Python version') else: #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.bin import * from lepl.support.node import Node from lepl.matchers.variables import TraceVariables # pylint: disable-msg=C0103, C0111, C0301 # (dude this is just a test) class MatcherTest(TestCase): ''' Test whether we correctly match some data. ''' def test_match(self): #basicConfig(level=DEBUG) # first, define some test data - we'll use a simple definition # language, but you could also construct this directly in Python # (Frame, Header etc are auto-generated subclasses of Node). mac = parse(''' Frame( Header( preamble = 0b10101010*7, start = 0b10101011, destn = 010203040506x0, source = 0708090a0b0cx0, ethertype = 0800x0 ), Data(1/8,2/8,3/8,4/8), CRC(234d0/4.) ) ''') # next, define a parser for the header structure # this is mainly literal values, but we make the two addresses # big-endian integers, which will be read from the data # this looks very like "normal" lepl because it is - there's # nothing in lepl that forces the data being parsed to be text. with TraceVariables(False): preamble = ~Const('0b10101010')[7] start = ~Const('0b10101011') destn = BEnd(6.0) > 'destn' source = BEnd(6.0) > 'source' ethertype = ~Const('0800x0') header = preamble & start & destn & source & ethertype > Node # so, what do the test data look like? # print(mac) # Frame # +- Header # | +- preamble BitString(b'\xaa\xaa\xaa\xaa\xaa\xaa\xaa', 56, 0) # | +- start BitString(b'\xab', 8, 0) # | +- destn BitString(b'\x01\x02\x03\x04\x05\x06', 48, 0) # | +- source BitString(b'\x07\x08\t\n\x0b\x0c', 48, 0) # | `- ethertype BitString(b'\x08\x00', 16, 0) # +- Data # | +- BitString(b'\x01', 8, 0) # | +- BitString(b'\x02', 8, 0) # | +- BitString(b'\x03', 8, 0) # | `- BitString(b'\x04', 8, 0) # `- CRC # `- BitString(b'\x00\x00\x00\xea', 32, 0) # we can serialize that to a BitString b = simple_serialiser(mac, dispatch_table()) assert str(b) == 'aaaaaaaaaaaaaaab123456789abc801234000eax0/240' # and then we can parse it header.config.no_full_first_match() p = header.parse(b)[0] # print(p) # Node # +- destn Int(1108152157446,48) # `- source Int(7731092785932,48) # the destination address assert hex(p.destn[0]) == '0x10203040506' # the source address assert hex(p.source[0]) == '0x708090a0b0c' LEPL-5.1.3/src/lepl/bin/_test/encode.py0000644000175000001440000000473411731117151020141 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.bin.encode module. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: from unittest import TestCase from lepl.bin import * # pylint: disable-msg=C0103, C0111, C0301 # (dude this is just a test) class EncodeTest(TestCase): ''' Test whether we correctly encode ''' def test_encode(self): mac = parse(''' Frame( Header( preamble = 0b10101010*7, start = 0b10101011, destn = 010203040506x0, source = 0708090a0b0cx0, ethertype = 0800x0 ), Data(1/8,2/8,3/8,4/8), CRC(234d0/4.) ) ''') serial = simple_serialiser(mac, dispatch_table()) bs = serial.bytes() for _index in range(7): b = next(bs) assert b == BitString.from_int('0b10101010').to_int(), b b = next(bs) assert b == BitString.from_int('0b10101011').to_int(), b LEPL-5.1.3/src/lepl/bin/_test/bits.py0000644000175000001440000002327311740076766017664 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.bin.bits module. 
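These tests also pin down the length convention used throughout lepl.bin - an int length counts bits, while a float counts bytes.bits. For example:

    unpack_length(7)    # 7 bits
    unpack_length(1.)   # 8 bits (one byte)
    unpack_length(1.7)  # 15 bits (one byte plus seven bits)
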
''' if bytes is str: print('Binary parsing unsupported in this Python version') else: #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.bin import * # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324 # (dude this is just a test) class IntTest(TestCase): def test_int(self): one = Int(1, 1) assert type(one) == Int assert 1 == one assert len(one) == 1 assert str(one) == '1' assert repr(one) == 'Int(1,1)' assert 3 * one == 3 class BitStringTest(TestCase): def test_lengths(self): assert 0 == unpack_length(0), unpack_length(0) assert 1 == unpack_length(1), unpack_length(1) assert 7 == unpack_length(7), unpack_length(7) assert 8 == unpack_length(8), unpack_length(8) assert 9 == unpack_length(9), unpack_length(9) assert 0 == unpack_length(0.), unpack_length(0.) assert 1 == unpack_length(0.1), unpack_length(0.1) assert 7 == unpack_length(0.7), unpack_length(0.7) assert 8 == unpack_length(1.), unpack_length(1.) assert 8 == unpack_length(1.0), unpack_length(1.0) assert 9 == unpack_length(1.1), unpack_length(1.1) assert 15 == unpack_length(1.7), unpack_length(1.7) assert 16 == unpack_length(2.), unpack_length(2.) self.assert_error(lambda: unpack_length(0.8)) def assert_error(self, thunk): try: thunk() assert False, 'expected error' except: pass def assert_length_value(self, length, value, b): assert len(b) == length, (len(b), length) assert b.to_bytes() == value, (b.to_bytes(), value, b) def test_from_byte(self): self.assert_error(lambda: BitString.from_byte(-1)) self.assert_length_value(8, b'\x00', BitString.from_byte(0)) self.assert_length_value(8, b'\x01', BitString.from_byte(1)) self.assert_length_value(8, b'\xff', BitString.from_byte(255)) self.assert_error(lambda: BitString.from_byte(256)) def test_from_bytearray(self): self.assert_length_value(8, b'\x00', BitString.from_bytearray(b'\x00')) self.assert_length_value(16, b'ab', BitString.from_bytearray(b'ab')) self.assert_length_value(16, b'ab', BitString.from_bytearray(bytearray(b'ab'))) def test_from_int(self): self.assert_length_value(3, b'\x00', BitString.from_int('0o0')) self.assert_error(lambda: BitString.from_int('1o0')) self.assert_error(lambda: BitString.from_int('00o0')) self.assert_error(lambda: BitString.from_int('100o0')) self.assert_error(lambda: BitString.from_int('777o0')) self.assert_length_value(9, b'\x40\x00', BitString.from_int('0o100')) self.assert_length_value(9, b'\xfe\x01', BitString.from_int('0o776')) self.assert_length_value(12, b'\xff\x03', BitString.from_int('0x3ff')) self.assert_length_value(12, b'\xff\x03', BitString.from_int('0o1777')) self.assert_length_value(16, b'\x03\xff', BitString.from_int('03ffx0')) self.assert_length_value(3, b'\x04', BitString.from_int('0b100')) self.assert_length_value(1, b'\x01', BitString.from_int('1b0')) self.assert_length_value(2, b'\x02', BitString.from_int('01b0')) self.assert_length_value(9, b'\x00\x01', BitString.from_int('000000001b0')) self.assert_length_value(9, b'\x01\x01', BitString.from_int('100000001b0')) self.assert_length_value(16, b'\x0f\x33', BitString.from_int('1111000011001100b0')) def test_from_sequence(self): self.assert_length_value(8, b'\x01', BitString.from_sequence([1], BitString.from_byte)) self.assert_error(lambda: BitString.from_sequence([256], BitString.from_byte)) self.assert_length_value(16, b'\x01\x02', BitString.from_sequence([1,2], BitString.from_byte)) def test_from_int_with_length(self): self.assert_error(lambda: BitString.from_int(1, 0)) self.assert_error(lambda: BitString.from_int(0, 1)) self.assert_error(lambda: 
BitString.from_int(0, 7)) self.assert_length_value(8, b'\x00', BitString.from_int(0, 8)) self.assert_error(lambda: BitString.from_int(0, 0.1)) self.assert_length_value(8, b'\x00', BitString.from_int(0, 1.)) self.assert_length_value(1, b'\x00', BitString.from_int('0x0', 1)) self.assert_length_value(7, b'\x00', BitString.from_int('0x0', 7)) self.assert_length_value(8, b'\x00', BitString.from_int('0x0', 8)) self.assert_length_value(1, b'\x00', BitString.from_int('0x0', 0.1)) self.assert_length_value(8, b'\x00', BitString.from_int('0x0', 1.)) self.assert_length_value(16, b'\x34\x12', BitString.from_int(0x1234, 16)) self.assert_length_value(16, b'\x34\x12', BitString.from_int('0x1234', 16)) self.assert_length_value(16, b'\x12\x34', BitString.from_int('1234x0', 16)) self.assert_length_value(16, b'\x34\x12', BitString.from_int('4660', 16)) self.assert_length_value(16, b'\x34\x12', BitString.from_int('0d4660', 16)) self.assert_length_value(16, b'\x12\x34', BitString.from_int('4660d0', 16)) def test_str(self): b = BitString.from_int32(0xabcd1234) assert str(b) == '00101100 01001000 10110011 11010101b0/32', str(b) b = BitString.from_int('0b110') assert str(b) == '011b0/3', str(b) def test_invert(self): #basicConfig(level=DEBUG) self.assert_length_value(12, b'\x00\x0c', ~BitString.from_int('0x3ff')) def test_add(self): acc = BitString() for i in range(8): acc += BitString.from_int('0o' + str(i)) # >>> hex(0o76543210) # '0xfac688' self.assert_length_value(24, b'\x88\xc6\xfa', acc) acc = BitString() for i in range(7): acc += BitString.from_int('0o' + str(i)) self.assert_length_value(21, b'\x88\xc6\x1a', acc) def test_get_item(self): a = BitString.from_int('01001100011100001111b0') b = a[:] assert a == b, (a, b) b = a[0:] assert a == b, (a, b) b = a[-1::-1] assert BitString.from_int('11110000111000110010b0') == b, b b = a[0] assert BitString.from_int('0b0') == b, (b, str(b), BitString.from_int('0b0')) b = a[1] assert BitString.from_int('1b0') == b, b b = a[0:2] assert BitString.from_int('01b0') == b, b b = a[0:2] assert BitString.from_int('0b10') == b, b b = a[-5:] assert BitString.from_int('01111b0') == b, b b = a[-1:-6:-1] assert BitString.from_int('11110b0') == b, b b = a[1:-1] assert BitString.from_int('100110001110000111b0') == b, b def assert_round_trip(self, start, stop=None, length=None): if stop == None: stop = start result = BitString.from_int(start, length=length).to_int() assert result == stop, (result, stop) if length is not None: assert len(result) == length, (result, length) def test_to_int(self): self.assert_round_trip(0) self.assert_round_trip(1) self.assert_round_trip(1, length=1) self.assert_round_trip(467) self.assert_round_trip(467, length=16) self.assert_round_trip(467, length=19) class SwapTableTest(TestCase): def test_swap(self): table = swap_table() assert table[0x0f] == 0xf0, hex(table[0x0f]) assert table[0xff] == 0xff assert table[0] == 0 assert table[10] == 80, table[10] LEPL-5.1.3/src/lepl/bin/__init__.py0000644000175000001440000000412211731117151017314 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. 
# # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Parse (and construct) binary data. Only Python 3.0+ is supported. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: from lepl.bin.bits import Int, unpack_length, BitString, swap_table from lepl.bin.literal import parse from lepl.bin.encode import dispatch_table, simple_serialiser from lepl.bin.matchers import BEnd, Const __all__ = [ 'Int', 'BitString', 'unpack_length', 'swap_table', 'parse', 'dispatch_table', 'simple_serialiser', 'BEnd', 'Const' ] LEPL-5.1.3/src/lepl/bin/literal.py0000644000175000001440000002003611731117151017213 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Specify and construct binary structures. This is necessary for tests and may be useful in its own right. Note that it is also quite easy to construct `Node` instances with `BitString` data directly in Python. The construction of binary values is a two-stage process. First, we describe a Python structure. Then we encode that structure as a binary value. As is standard in LEPL, the Python construct consists of `Node` instances. 
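(For a complete, end-to-end illustration of this syntax - an Ethernet-style frame - see the worked example in lepl.bin._example.literal, later in this package.)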
The description of values has the form: Node(byte=0xff/8, 0*100, Node(...), (...)) In more detail: () is used for grouping, must exist outside the entire description, and defines a Node. If preceded by a name, then that is used to create a subclass of Node (unless it is "Node", in which case it is the default). For now, repeated names are not validated in any way for consistency. name=value/length is used for defining a value, in various ways: value anonymous value (byte or array) value/length anonymous value with specified length name=value named byte or array name=value/length named value with given length * repeats a value, so a*b repeats 'a', b number of times. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: def make_binary_parser(): ''' Create a parser for binary data. ''' # avoid import loops from lepl import Word, Letter, Digit, UnsignedInteger, \ Regexp, DfaRegexp, Drop, Separator, Delayed, Optional, Any, First, \ args, Trace, TraceVariables from lepl.bin.bits import BitString from lepl.support.node import Node classes = {} def named_class(name, *args): ''' Given a name and some args, create a sub-class of Binary and create an instance with the given content. ''' if name not in classes: classes[name] = type(name, (Node,), {}) return classes[name](*args) with TraceVariables(False): mult = lambda l, n: BitString.from_sequence([l] * int(n, 0)) # an attribute or class name name = Word(Letter(), Letter() | Digit() | '_') # lengths can be integers (bits) or floats (bytes.bits) # but if we have a float, we do not want to parse as an int # (or we will get a conversion error due to too small length) length = First(UnsignedInteger() + '.' + Optional(UnsignedInteger()), UnsignedInteger()) # a literal decimal decimal = UnsignedInteger() # a binary number (without pre/postfix) binary = Any('01')[1:] # an octal number (without pre/postfix) octal = Any('01234567')[1:] # a hex number (without pre/postfix) hex_ = Regexp('[a-fA-F0-9]')[1:] # the letters used for binary, octal and hex values #(eg the 'x' in 0xffee) # pylint: disable-msg=C0103 b, o, x, d = Any('bB'), Any('oO'), Any('xX'), Any('dD') # a decimal with optional pre/postfix dec = '0' + d + decimal | decimal + d + '0' | decimal # little-endian literals have normal prefix syntax (eg 0xffee) little = decimal | '0' + (b + binary | o + octal | x + hex_) # big-endian literals have postfix (eg ffeex0) big = (binary + b | octal + o | hex_ + x) + '0' # optional spaces - will be ignored # (use DFA here because it's multi-line, so \n will match ok) spaces = Drop(DfaRegexp('[ \t\n\r]*')) with Separator(spaces): # the grammar is recursive - expressions can contain expressions - # so we use a delayed matcher here as a placeholder, so that we can # use them before they are defined. 
expr = Delayed() # an implicit length value can be big or little-endian ivalue = big | little > args(BitString.from_int) # a value with a length can also be decimal lvalue = (big | little | dec) & Drop('/') & length \ > args(BitString.from_int) value = lvalue | ivalue repeat = value & Drop('*') & little > args(mult) # a named value is also a tuple named = name & Drop('=') & (expr | value | repeat) > tuple # an entry in the expression could be any of these entry = named | value | repeat | expr # and an expression itself consists of a comma-separated list of # one or more entries, surrounded by parentheses entries = Drop('(') & entry[1:, Drop(',')] & Drop(')') # the Binary node may be explicit or implicit and takes the list of # entries as an argument list node = Optional(Drop('Node')) & entries > Node # alternatively, we can give a name and create a named sub-class other = name & entries > args(named_class) # and finally, we "tie the knot" by giving a definition for the # delayed matcher we introduced earlier, which is either a binary # node or a subclass expr += spaces & (node | other) & spaces #expr = Trace(expr) # this changes order, making 0800x0 parse as binary expr.config.no_compile_to_regexp() # use sequence to force regexp over multiple lines return expr.get_parse_sequence() __PARSER = None def parse(spec): ''' Use the parser. ''' #from logging import basicConfig, DEBUG #basicConfig(level=DEBUG) from lepl.stream.maxdepth import FullFirstMatchException # pylint: disable-msg=W0603 # global global __PARSER if __PARSER is None: __PARSER = make_binary_parser() try: result = __PARSER(spec) except FullFirstMatchException: result = None if result: return result[0] else: raise ValueError('Cannot parse: {0!r}'.format(spec)) LEPL-5.1.3/src/lepl/bin/matchers.py0000644000175000001440000002270711731117151017374 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Matchers specifically for binary data (most LEPL matchers can be used with binary data, but additional support is needed when the matching involves a literal comparison or generation of a binary result). 
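For example (an illustrative sketch using the matchers defined in this module, not taken from the original documentation), a small fixed layout - a constant tag byte, a 16-bit little-endian integer, and four raw bytes - could be matched with: header = Byte(0x2a) & LInt16() & ByteArray(4) where Byte(0x2a) insists on that exact byte, LInt16() reads a 16-bit little-endian integer, and ByteArray(4) reads four bytes.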
''' if bytes is str: print('Binary parsing unsupported in this Python version') else: from lepl.bin.bits import unpack_length, BitString, STRICT from lepl.matchers.support import OperatorMatcher from lepl.core.parser import tagged from lepl.stream.core import s_next # pylint: disable-msg=C0103, R0901, R0904 # lepl conventions # pylint: disable-msg=R0201 # (allow over-riding in sub-classes) class _Constant(OperatorMatcher): ''' Support class for matching constant values. ''' # pylint: disable-msg=E1101 # (using _arg to set attributes dynamically) def __init__(self, value): ''' Match a given bit string. This is typically not used directly, but via the functions below (which specify a value as integer, bytes, etc). ''' super(_Constant, self).__init__() self._arg(value=value) @tagged def _match(self, stream): ''' Do the matching (return a generator that provides successive (result, stream) tuples). Need to be careful here to use only the restricted functionality provided by the stream interface. ''' (value, next_stream) = s_next(stream, count=len(self.value)) if self.value == value: yield ([self.value], next_stream) class Const(_Constant): ''' Match a given value, which is parsed as for `BitString.from_int`. ''' def __init__(self, value, length=None): if not isinstance(value, BitString): value = BitString.from_int(value, length) super(Const, self).__init__(value) class _Variable(OperatorMatcher): ''' Support class for matching a given number of bits. ''' # pylint: disable-msg=E1101 # (using _arg to set attributes dynamically) def __init__(self, length): super(_Variable, self).__init__() self._arg(length=unpack_length(length)) @tagged def _match(self, stream): ''' Do the matching (return a generator that provides successive (result, stream) tuples). Need to be careful here to use only the restricted functionality provided by the stream interface. ''' (value, next_stream) = s_next(stream, count=self.length) yield ([self._convert(value)], next_stream) def _convert(self, bits): ''' By default, just return the bits. ''' return bits class _ByteArray(_Variable): ''' Support class for matching a given number of bytes. ''' def __init__(self, length): ''' Match a given number of bytes. ''' if not isinstance(length, int): raise TypeError('Number of bytes must be an integer') super(_ByteArray, self).__init__(length) def _convert(self, bits): ''' Convert from bits to bytes. ''' return bits.to_bytes() class BEnd(_Variable): ''' Convert a given number of bits (multiple of 8) to a big-endian number. ''' def __init__(self, length): ''' Match a given number of bits, converting them to a big-endian int. ''' length = unpack_length(length) if length % 8: raise ValueError('Big endian int must have a length that is a ' 'multiple of 8.') super(BEnd, self).__init__(length) def _convert(self, bits): ''' Convert to int. ''' return bits.to_int(big_endian=True) class LEnd(_Variable): ''' Convert a given number of bits to a little-endian number. ''' def _convert(self, bits): ''' Convert to int. ''' return bits.to_int() def BitStr(value): ''' Match or read a bit string (to read a value, give the number of bits). ''' if isinstance(value, int): return _Variable(value) else: return _Constant(value) def Byte(value=None): ''' Match or read a byte (if a value is given, it must match). ''' if value is None: return BEnd(8) else: return _Constant(BitString.from_byte(value)) def ByteArray(value): ''' Match or read an array of bytes (to read a value, give the number of bytes). 
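For example (sketch): ByteArray(4) reads the next four bytes and returns them as bytes(), while ByteArray(b'abcd') matches exactly those four bytes.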
''' if isinstance(value, int): return _ByteArray(value) else: return _Constant(BitString.from_bytearray(value)) def _bint(length): ''' Factory method for big-endian values. ''' def matcher(value=None): ''' Generate the matcher, given a value. ''' if value is None: return BEnd(length) else: return _Constant(BitString.from_int(value, length=length, big_endian=True)) return matcher def _lint(length): ''' Factory method for little-endian values. ''' def matcher(value=None): ''' Generate the matcher, given a value. ''' if value is None: return LEnd(length) else: return _Constant(BitString.from_int(value, length=length, big_endian=False)) return matcher # pylint: disable-msg=W0105 BInt16 = _bint(16) ''' Match or read a 16-bit big-endian integer (if a value is given, it must match). ''' LInt16 = _lint(16) ''' Match or read a 16-bit little-endian integer (if a value is given, it must match). ''' BInt32 = _bint(32) ''' Match or read a 32-bit big-endian integer (if a value is given, it must match). ''' LInt32 = _lint(32) ''' Match or read a 32-bit little-endian integer (if a value is given, it must match). ''' BInt64 = _bint(64) ''' Match or read a 64-bit big-endian integer (if a value is given, it must match). ''' LInt64 = _lint(64) ''' Match or read a 64-bit little-endian integer (if a value is given, it must match). ''' class _String(_ByteArray): ''' Support class for reading a string. ''' # pylint: disable-msg=E1101 # (using _arg to set attributes dynamically) def __init__(self, length, encoding=None, errors=STRICT): super(_String, self).__init__(length) self._karg(encoding=encoding) self._karg(errors=errors) def _convert(self, bits): ''' Convert to string. ''' return bits.to_str(encoding=self.encoding, errors=self.errors) def String(value, encoding=None, errors=STRICT): ''' Match or read a string (to read a value, give the number of bytes). ''' if isinstance(value, int): return _String(value, encoding=encoding, errors=errors) else: return _Constant(BitString.from_str(value, encoding=encoding, errors=errors)) LEPL-5.1.3/src/lepl/bin/encode.py0000644000175000001440000001135711731117151017022 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. 
If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Convert structured Python data to a binary stream. Writing a good API for binary encoding of arbitrary objects does not seem to be easy. In addition, this is my first attempt. My apologies in advance. This is a very basic library - the hope is that something like ASN.1 can then be built on this (if someone buys me a copy of the spec...!) The most obvious solution might be to require everything that must be encoded implement some method. Given Python's dynamic nature, ABCs, etc, this might be possible, but it does seem that it could require some rather ugly hacks in some cases, when using existing types. The next simplest approach seems to be to use some kind of separate dispatch (rather than the classes themselves) to convert things to a standard intermediate fmt. That is what I do here. The intermediate fmt is the pair (type, BitString), where "type" can be any value (but will be the type of the value in all implementations here - value could be used, but we're trying to give some impression of a layered approach). Encoding a structure then requires three steps: 1. Defining a serialisation of composite structures. Only acyclic structures are considered (I am more interested in network protocols than pickling, which already has a Python solution) 2. Converting individual values in the serial stream to the intermediate representation. 3. Encoding the intermediate representation into a final BitString. Support for each of these steps is provided by LEPL. Stage 1 comes from the graph and node modules; 2 is provided below (leveraging BitString's class methods); 3 is only supported in a simple way below, with the expectation that future modules might extend both encoding and matching to, for example, ASN.1. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: from functools import reduce as reduce_ from operator import add from lepl.bin.bits import BitString, STRICT from lepl.support.graph import leaves from lepl.support.node import Node def dispatch_table(big_endian=True, encoding=None, errors=STRICT): ''' Convert types appropriately. ''' # pylint: disable-msg=W0108 # consistency return {int: lambda n: BitString.from_int(n, ordered=big_endian), str: lambda s: BitString.from_str(s, encoding, errors), bytes: lambda b: BitString.from_bytearray(b), bytearray: lambda b: BitString.from_bytearray(b), BitString: lambda x: x} def make_converter(table): ''' Given a table, create the converter. ''' def converter(value): ''' The converter. ''' type_ = type(value) if type_ in table: return (type_, table[type_](value)) for key in table: if isinstance(value, key): return (type_, table[key](value)) raise TypeError('Cannot convert {0!r}:{1!r}'.format(value, type_)) return converter def simple_serialiser(node, table): ''' Serialize using the given table. ''' stream = leaves(node, Node) converter = make_converter(table) return reduce_(add, [converter(value)[1] for value in stream]) LEPL-5.1.3/src/lepl/bin/bits.py0000644000175000001440000006035511731117151016530 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. 
You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Storage of binary values of arbitrary length. Endianness is an issue here because we want to naturally "do the right thing" and unfortunately this varies, depending on context. Most target hardware (x86) is little-endian, but network protocols are typically big-endian. I personally prefer big-endian for long hex strings - it seems obvious that 0x123456 should be encoded as [0x12, 0x34, 0x56]. On the other hand, it also seems reasonable that the integer 1193046 (=0x123456) should be stored little-endian as [0x56, 0x34, 0x12, 0x00] because that is how it is stored in memory. Unfortunately we cannot implement both because integer values do not contain any flag to say how the user specified them (hex or decimal). A very similar issue - that integers do not carry any information to say how many leading zeroes were entered by the user - suggests a solution to this problem. To solve the leading zeroes issue we accept integers as strings and do the conversion ourselves. Since we are dealing with strings we can invent an entirely new encoding to specify endianness. We will use little-endian for ints and the "usual" notation since this reflects the hardware (it appeals to the idea that we are simply taking the chunk of memory in which the integer existed and using it directly). For big endian, we will use a trailing type flag (ie change "ends") in strings. So 1193046, "1193046", 0x123456, "0x123456" all encode to [0x56, 0x34, 0x12] (modulo some questions about implicit/explicit lengths). But "123456x0" encodes to [0x12, 0x34, 0x56]. This does have a slight wrinkle - 100b0 looks like a hex value (but is not, as it does not start with 0x). Note: No attempt is made to handle sign (complements etc). ''' # pylint: disable-msg=R0903 # using __ methods if bytes is str: print('Binary parsing unsupported in this Python version') else: STRICT = 'strict' class Int(int): ''' An integer with a length (the number of bits). This extends Python's type system so that we can distinguish between different integer types, which may have different encodings. 
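For example (taken from the tests in this package): one = Int(1, 1) compares equal to 1, while len(one) == 1 and repr(one) == 'Int(1,1)'.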
''' def __new__(cls, value, length): return super(Int, cls).__new__(cls, str(value), 0) def __init__(self, value, length): super(Int, self).__init__() self.__length = length def __len__(self): return self.__length def __repr__(self): return 'Int({0},{1})'.format(super(Int, self).__str__(), self.__length) def swap_table(): ''' Table of reversed bit patterns for 8 bits. ''' # pylint: disable-msg=C0103 table = [0] * 256 power = [1 << n for n in range(8)] for n in range(8): table[1 << n] = 1 << (7 - n) for i in range(256): if not table[i]: for p in power: if i & p: table[i] |= table[p] table[table[i]] = i return table class BitString(object): ''' A sequence of bits, of arbitrary length. Has similar semantics to strings, in that a single index is itself a BitString (of unit length). This is intended as a standard format for arbitrary binary data, to help with conversion between other types. In other words, convert to and from this, and then chain conversions. Bits are stored as a contiguous sequence in an array of bytes. Both bits and bytes are "little endian" - this allows arbitrary lengths of bits, at arbitrary offsets, to be given values without worrying about alignment. The bit sequence starts at bit 'offset' in the first byte and there are a total of 'length' bits. The number of bytes stored is the minimum implied by those two values, with zero padding. ''' __swap = swap_table() def __init__(self, value=None, length=0, offset=0): ''' value is a bytes() instance that contains the data. length is the number of valid bits. If given as a float it is the number of bytes (bits = int(float) * 8 + decimal(float) * 10) offset is the index of the first valid bit in the value. ''' if value is None: value = bytes() if not isinstance(value, bytes): raise TypeError('BitString wraps bytes: {0!r}'.format(value)) if length < 0: raise ValueError('Negative length: {0!r}'.format(length)) if not 0 <= offset < 8: raise ValueError('Non-byte offset: {0!r}'.format(offset)) self.__bytes = value self.__length = unpack_length(length) self.__offset = offset if len(value) != bytes_for_bits(self.__length, self.__offset): raise ValueError('Inconsistent length: {0!r}/{1!r}' .format(value, length)) def bytes(self, offset=0): ''' Return a series of bytes values, which encode the data for len(self) bits when offset=0 (with final padding in the last byte if necessary). It is the caller's responsibility to discard any trailing bits. When 0 < offset < 8 then the data are zero-padded by offset bits first. ''' # if self.__offset and offset == 0: # # normalize our own value # self.__bytes = \ # bytes(ByteIterator(self.__bytes, self.__length, # self.__offset, offset)) # self.__offset = 0 return ByteIterator(self.__bytes, self.__length, self.__offset, offset) def bits(self): ''' Return a series of bits (encoded as booleans) that contain the contents. ''' return BitIterator(self.__bytes, 0, self.__length, 1, self.__offset) def __str__(self): ''' For 64 bits or less, show bits grouped by byte (octet), with bytes and bits running from left to right. This is a "picture" of the bits. For more than 64 bits, give a hex encoding of bytes (right padded with zeros), shown in big-endian format. In both cases, the length in bits is given after a trailing slash. Whatever the internal offset, values are displayed with no initial padding. 
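For example (from the tests): str(BitString.from_int('0b110')) is '011b0/3' - three bits pictured left to right, with the bit length after the slash.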
''' if self.__length > 64: hex_ = ''.join('{0:02x}'.format(x) for x in self.bytes()) return '{0}x0/{1}'.format(hex_, self.__length) else: chars = [] byte = [] count = 0 for bit in self.bits(): if not count % 8: chars.extend(byte) byte = [] if count: chars.append(' ') if bit.zero(): byte.append('0') else: byte.append('1') count += 1 chars.extend(byte) return '{0}b0/{1}'.format(''.join(chars), self.__length) def __repr__(self): ''' An explicit display of internal state, including padding and offset. ''' return 'BitString({0!r}, {1!r}, {2!r})' \ .format(self.__bytes, self.__length, self.__offset) def __len__(self): return self.__length def zero(self): ''' Are all bits zero? ''' for byte in self.__bytes: if byte != 0: return False return True def offset(self): ''' The internal offset. This is not useful as an external API, but helps with debugging. ''' return self.__offset def __iter__(self): return self.bits() def __add__(self, other): ''' Combine two sequences, appending them together. ''' bbs = bytearray(self.to_bytes()) matching_offset = self.__length % 8 for byte in other.bytes(matching_offset): if matching_offset: bbs[-1] |= byte matching_offset = False else: bbs.append(byte) return BitString(bytes(bbs), self.__length + len(other)) def to_bytes(self, offset=0): ''' Return a bytes() object, right-padded with zero bits if necessary. ''' if self.__offset == offset: return self.__bytes else: return bytes(self.bytes(offset)) def to_int(self, big_endian=False): ''' Convert the entire bit sequence (of any size) to an integer. Big endian conversion is only possible if the bits form a whole number of bytes. ''' if big_endian and self.__length % 8: raise ValueError('Length is not a multiple of 8 bits, so big ' 'endian integer poorly defined: {0}' .format(self.__length)) bbs = self.bytes() if not big_endian: bbs = reversed(list(bbs)) value = 0 for byte in bbs: value = (value << 8) + byte return Int(value, self.__length) def to_str(self, encoding=None, errors='strict'): ''' Convert to string. ''' # do we really need to do this in two separate calls? if encoding: return bytes(self.bytes()).decode(encoding=encoding, errors=errors) else: return bytes(self.bytes()).decode(errors=errors) def __int__(self): return self.to_int() def __index__(self): return self.to_int() def __invert__(self): inv = bytearray([0xff ^ b for b in self.bytes()]) if self.__length % 8: inv[-1] &= 0xff >> 8 - self.__length % 8 return BitString(bytes(inv), self.__length) def __getitem__(self, index): if not isinstance(index, slice): index = slice(index, index+1, None) (start, stop, step) = index.indices(self.__length) if step == 1: start += self.__offset stop += self.__offset bbs = bytearray(self.__bytes[start // 8:bytes_for_bits(stop)]) if start % 8: bbs[0] &= 0xff << start % 8 if stop % 8: bbs[-1] &= 0xff >> 8 - stop % 8 return BitString(bytes(bbs), stop - start, start % 8) else: acc = BitString() for byte in BitIterator(self.__bytes, start, stop, step, self.__offset): acc += byte return acc def __eq__(self, other): # pylint: disable-msg=W0212 # (we check the type) if not isinstance(other, BitString) \ or self.__length != other.__length: return False for (bb1, bb2) in zip(self.bytes(), other.bytes()): if bb1 != bb2: return False return True def __hash__(self): return hash(self.__bytes) ^ self.__length @staticmethod def from_byte(value): ''' Create a BitString from a byte. ''' return BitString.from_int(value, 8) @staticmethod def from_int32(value, big_endian=None): ''' Create a BitString from a 32 bit integer. 
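(A convenience wrapper; equivalent to from_int(value, 32, big_endian).)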
''' return BitString.from_int(value, 32, big_endian) @staticmethod def from_int64(value, big_endian=None): ''' Create a BitString from a 64 bit integer. ''' return BitString.from_int(value, 64, big_endian) @staticmethod def from_int(value, length=None, big_endian=None): ''' Value can be an int, or a string with a leading or trailing tag. A plain int, or no tag, or leading tag, is byte little-endian by default. Length and big-endianness are inferred from the format for values given as strings, but explicit parameters override these. If no length is given, and none can be inferred, 32 bits is assumed (bit length cannot be inferred for decimal values, even as strings). The interpretation of big-endian values depends on the base and is either very intuitive and useful, or completely stupid. Use at your own risk. Big-endian hex values must specify an exact number of bytes (even number of hex digits). Each separate byte is assigned a value according to big-endian semantics, but within a byte small-endian order is used. This is consistent with the standard conventions for network data. So, for example, 1234x0 gives two bytes. The first contains the value 0x12, the second the value 0x34. Big-endian binary values are taken to be a "picture" of the bits, with the array reading from left to right. So 0011b0 specifies four bits, starting with two zeroes. Big-endian decimal and octal values are treated as hex values. ''' # order is very important below - edit with extreme care bits = None if isinstance(value, str): value = value.strip() # move postfix to prefix, saving endian hint if value.endswith('0') and len(value) > 1 and \ not value[-2].isdigit() \ and not (len(value) == 3 and value.startswith('0')): value = '0' + value[-2] + value[0:-2] if big_endian is None: big_endian = True # drop 0d for decimal if value.startswith('0d') or value.startswith('0D'): value = value[2:] # infer implicit length if len(value) > 1 and not value[1].isdigit() and length is None: bits = {'b':1, 'o':3, 'x':4}.get(value[1].lower(), None) if not bits: raise ValueError('Unexpected base: {0!r}'.format(value)) length = bits * (len(value) - 2) if big_endian and bits == 1: # binary value is backwards! value = value[0:2] + value[-1:1:-1] value = int(value, 0) if length is None: try: # support round-tripping of sized integers length = len(value) except TypeError: # assume 32 bits if nothing else defined length = 32 length = unpack_length(length) if length % 8 and big_endian and bits != 1: raise ValueError('A big-endian int with a length that ' 'is not an integer number of bytes cannot be ' 'encoded as a stream of bits: {0!r}/{1!r}' .format(value, length)) bbs, val = bytearray(), value for _index in range(bytes_for_bits(length)): bbs.append(val & 0xff) val >>= 8 if val > 0: raise ValueError('Value contains more bits than length: %r/%r' % (value, length)) # binary was swapped earlier if big_endian and bits != 1: bbs = reversed(bbs) return BitString(bytes(bbs), length) @staticmethod def from_sequence(value, unpack=lambda x: x): ''' Unpack is called for each item in turn (so should be, say, from_byte). ''' accumulator = BitString() for item in value: accumulator += unpack(item) return accumulator @staticmethod def from_bytearray(value): ''' Create a BitString from a bytearray. ''' if not isinstance(value, bytes): value = bytes(value) return BitString(value, len(value) * 8) @staticmethod def from_str(value, encoding=None, errors=STRICT): ''' Create a BitString from a string. 
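For example (sketch): BitString.from_str('ab') wraps the two bytes b'ab' as a 16-bit BitString; encoding and errors are passed through to str.encode().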
''' if encoding: return BitString.from_bytearray(value.encode(encoding=encoding, errors=errors)) else: return BitString.from_bytearray(value.encode(errors=errors)) def unpack_length(length): ''' Length is in bits, unless a decimal is specified, in which case it has the structure bytes.bits. Obviously this is ambiguous with float values (eg 3.1 or 3.10), but since we only care about bits 0-7 we can avoid any issues by requiring that range. ''' if isinstance(length, str): try: length = int(length, 0) except ValueError: length = float(length) if isinstance(length, int): return length if isinstance(length, float): nbytes = int(length) bits = int(10 * (length - nbytes) + 0.5) if bits < 0 or bits > 7: raise ValueError('BitStr specification must be between 0 and 7') return nbytes * 8 + bits raise TypeError('Cannot infer length from %r' % length) def bytes_for_bits(bits, offset=0): ''' The number of bytes required to specify the given number of bits. ''' return (bits + 7 + offset) // 8 class BitIterator(object): ''' A sequence of bits (used by BitString). ''' def __init__(self, value, start, stop, step, offset): assert 0 <= offset < 8 self.__bytes = value self.__start = start self.__stop = stop self.__step = step self.__offset = offset self.__index = start def __iter__(self): return self def __next__(self): if (self.__step > 0 and self.__index < self.__stop) \ or (self.__step < 0 and self.__index > self.__stop): index = self.__index + self.__offset byte = self.__bytes[index // 8] >> index % 8 self.__index += self.__step return ONE if byte & 0x1 else ZERO else: raise StopIteration() class ByteIterator(object): ''' A sequence of bytes (used by BitString). ''' def __init__(self, value, length, existing, required): assert 0 <= required < 8 assert 0 <= existing < 8 self.__bytes = value self.__length = length self.__required = required self.__existing = existing if self.__required > self.__existing: self.__index = -1 else: self.__index = 0 self.__total = 0 def __iter__(self): return self def __next__(self): if self.__required == self.__existing: return self.__byte_aligned() elif self.__required > self.__existing: return self.__add_offset() else: return self.__correct_offset() def __byte_aligned(self): ''' Already aligned, so return next byte. ''' if self.__index < len(self.__bytes): byte = self.__bytes[self.__index] self.__index += 1 return byte else: raise StopIteration() def __add_offset(self): ''' No longer understand this. Replace with BitStream project? ''' if self.__index < 0: if self.__total < self.__length: # initial offset chunk byte = 0xff & (self.__bytes[0] << (self.__required - self.__existing)) self.__index = 0 self.__total = 8 - self.__required return byte else: raise StopIteration() else: if self.__total < self.__length: byte = 0xff & (self.__bytes[self.__index] >> (8 - self.__required + self.__existing)) self.__index += 1 self.__total += self.__required else: raise StopIteration() if self.__total < self.__length: byte |= 0xff & (self.__bytes[self.__index] << (self.__required - self.__existing)) self.__total += 8 - self.__required return byte def __correct_offset(self): ''' No longer understand this. Replace with BitStream project? 
''' if self.__total < self.__length: byte = 0xff & (self.__bytes[self.__index] >> (self.__existing - self.__required)) self.__index += 1 self.__total += 8 - self.__existing + self.__required else: raise StopIteration() if self.__total < self.__length: byte |= 0xff & (self.__bytes[self.__index] << (8 - self.__existing + self.__required)) self.__total += self.__existing - self.__required return byte ONE = BitString(b'\x01', 1) ZERO = BitString(b'\x00', 1) LEPL-5.1.3/src/lepl/bin/_example/0000755000175000001440000000000011764776700017017 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/bin/_example/__init__.py0000644000175000001440000000316111731117151021110 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Examples for the lepl.bin package. ''' # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.bin._example.literal LEPL-5.1.3/src/lepl/bin/_example/literal.py0000644000175000001440000000717211731117151021013 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' A detailed worked example using the lepl.bin package. ''' if bytes is str: print('Binary parsing unsupported in this Python version') else: from lepl.bin import * from lepl._example.support import Example # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, R0201 # (dude this is just a test) class ParseExample(Example): def test_parse(self): ''' An 802.3 MAC frame - see http://en.wikipedia.org/wiki/Ethernet ''' _b = parse(''' Frame( Header( preamble = 0b10101010*7, start = 0b10101011, destn = 123456x0, source = 890abcx0, ethertype = 0800x0 ), Data(1/8,2/8,3/8,4/8), CRC(234d0/4.) ) ''') #print(_b) class RepresentationExample(Example): def test_representation(self): #@PydevCodeAnalysisIgnore # doesn't know base literals self._assert(0b101100, '00110100 00000000 00000000 00000000') self._assert('0b101100', '001101') self._assert('001101b0', '001101') self._assert(0o073, '11011100 00000000 00000000 00000000') self._assert('0o073', '11011100 0') self._assert('073o0', None) self._assert('0o01234567', '11101110 10011100 10100000') #! self._assert('01234567o0', '10100000 10011100 11101110') self._assert(1980, '00111101 11100000 00000000 00000000') self._assert('0d1980', '00111101 11100000 00000000 00000000') self._assert('1980', '00111101 11100000 00000000 00000000') self._assert('1980d0', '00000000 00000000 11100000 00111101') self._assert(0xfe01, '10000000 01111111 00000000 00000000') self._assert('0xfe01', '10000000 01111111') self._assert('fe01x0', '01111111 10000000') def _assert(self, repr_, value): try: b = BitString.from_int(repr_) assert str(b) == value + 'b0/' + str(len(b)), str(b) except ValueError: assert value is None LEPL-5.1.3/src/lepl/matchers/0000755000175000001440000000000011764776700016263 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/matchers/_test/0000755000175000001440000000000011764776700017401 5ustar andrewusers00000000000000LEPL-5.1.3/src/lepl/matchers/_test/combine.py0000644000175000001440000000754611731166640021367 0ustar andrewusers00000000000000# The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. 
# # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the combining matchers. ''' #from logging import basicConfig, DEBUG from unittest import TestCase from lepl.matchers.combine import DepthFirst, BreadthFirst, Difference, Limit from lepl.matchers.core import Any from lepl.matchers.derived import Integer, Real from lepl._test.base import BaseTest class DirectionTest1(TestCase): def matcher(self): return Any() def test_depth(self): #basicConfig(level=DEBUG) matcher = DepthFirst(self.matcher(), 1, 2) matcher.config.no_full_first_match() matcher = matcher.get_match() results = list(map(''.join, map(lambda x: x[0], matcher('123')))) assert results == ['12', '1'], results def test_breadth(self): matcher = BreadthFirst(self.matcher(), 1, 2) matcher.config.no_full_first_match() matcher = matcher.get_match() results = list(map(''.join, map(lambda x: x[0], matcher('123')))) assert results == ['1', '12'], results class DirectionTest2(TestCase): def matcher(self): return ~Any()[:] & Any() def test_depth(self): matcher = DepthFirst(self.matcher(), 1, 2) matcher.config.no_full_first_match() matcher = matcher.get_match() results = list(map(''.join, map(lambda x: x[0], matcher('123')))) assert results == ['3', '23', '2', '13', '12', '1'], results def test_breadth(self): matcher = BreadthFirst(self.matcher(), 1, 2) matcher.config.no_full_first_match() matcher = matcher.get_match() results = list(map(''.join, map(lambda x: x[0], matcher('123')))) assert results == ['3', '2', '1', '23', '13', '12'], results class DifferenceTest(BaseTest): def test_difference(self): #basicConfig(level=DEBUG) matcher = Difference(Real(), Integer()) self.assert_direct('12.3', matcher, [['12.3'], ['12.']]) def test_count(self): #basicConfig(level=DEBUG) matcher = Difference(Real(), Integer(), count=1) self.assert_direct('12.3', matcher, [['12.3'], ['12.'], ['1']]) class LimitTest(BaseTest): def test_limit(self): self.assert_direct('1.2', Real(), [['1.2'], ['1.'], ['1']]) self.assert_direct('1.2', Limit(Real(), 1), [['1.2']]) self.assert_direct('1.2', Limit(Real(), 2), [['1.2'], ['1.']]) self.assert_direct('1.2', Limit(Real(), 0), []) LEPL-5.1.3/src/lepl/matchers/_test/__init__.py0000644000175000001440000000375711731117151021505 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. 
# # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for the lepl.matchers package. ''' from sys import version # we need to import all files used in the automated self-test # pylint: disable-msg=E0611 #@PydevCodeAnalysisIgnore import lepl.matchers._test.combine import lepl.matchers._test.core import lepl.matchers._test.derived import lepl.matchers._test.error import lepl.matchers._test.float_bug import lepl.matchers._test.memo import lepl.matchers._test.operators import lepl.matchers._test.separators import lepl.matchers._test.support import lepl.matchers._test.transform import lepl.matchers._test.variables LEPL-5.1.3/src/lepl/matchers/_test/float_bug.py0000644000175000001440000001144511731117151021701 0ustar andrewusers00000000000000 # The contents of this file are subject to the Mozilla Public License # (MPL) Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at http://www.mozilla.org/MPL/ # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and # limitations under the License. # # The Original Code is LEPL (http://www.acooke.org/lepl) # The Initial Developer of the Original Code is Andrew Cooke. # Portions created by the Initial Developer are Copyright (C) 2009-2010 # Andrew Cooke (andrew@acooke.org). All Rights Reserved. # # Alternatively, the contents of this file may be used under the terms # of the LGPL license (the GNU Lesser General Public License, # http://www.gnu.org/licenses/lgpl.html), in which case the provisions # of the LGPL License are applicable instead of those above. # # If you wish to allow use of your version of this file only under the # terms of the LGPL License and not to allow others to use your version # of this file under the MPL, indicate your decision by deleting the # provisions above and replace them with the notice and other provisions # required by the LGPL License. If you do not delete the provisions # above, a recipient may use your version of this file under either the # MPL or the LGPL License. ''' Tests for a weird bug when writing the float (rational at the time) matcher. This came down to optional entries being mapped in NFA as empty transitions from a single node. If multiple choices were from the same node the empty transition order was incorrect (it's ordered by node number, and the node for the empty transition was created after other nodes, intead of before). The fix used was to change the node creation order. Other nodes appear to be created correctly. However, it would be better in the longer term, I suspect, to use an ordered dict or similar to store the empty transitions so that the numbering is not needed for order. 
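Concretely (paraphrasing the tests below): matching '1.e3' with Or(Join(UnsignedFloat(), Any('eE'), SignedInteger()), UnsignedFloat()) should yield ['1.e3'] and then ['1.']; before the fix, compiling to an NFA returned these matches in the wrong order.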
''' #from logging import basicConfig, DEBUG from lepl.matchers.derived import UnsignedFloat, SignedInteger, Join from lepl.matchers.combine import Or from lepl.matchers.core import Any from lepl._test.base import BaseTest # pylint: disable-msg=C0103, C0111, C0301, W0702, C0324, C0102, C0321, W0141, R0201, R0913, R0901, R0904 # (dude this is just a test) #basicConfig(level=DEBUG) class FloatTest(BaseTest): def test_first(self): self.assert_direct('1.e3', Join(UnsignedFloat(), Any('eE'), SignedInteger()), [['1.e3']]) def test_second(self): self.assert_direct('1.e3', UnsignedFloat(), [['1.']]) def test_all(self): first = Join(UnsignedFloat(), Any('eE'), SignedInteger()) second = UnsignedFloat() all = Or(first, second) all.config.default() # wrong order #all.config.compile_to_dfa() # gives 1.e3 only #all.config.compile_to_nfa() # wrong order #all.config.no_compile_to_regexp() # ok #all.config.clear() # ok self.assert_direct('1.e3', all, [['1.e3'], ['1.']]) def test_nfa(self): first = Join(UnsignedFloat(), Any('eE'), SignedInteger()) second = UnsignedFloat() all = Or(first, second) all.config.clear().compile_to_nfa() m = all.get_parse() #print(m.matcher) #print(m.matcher.regexp) # (?: # (?: # (?:[0-9] # (?:[0-9])* # )? # \.[0-9](?:[0-9])* # | # [0-9](?:[0-9])*(?:\.)? # ) # [Ee](?:[\+\-])?[0-9](?:[0-9])* # | # (?: # (?: # [0-9](?:[0-9])* # )?\.[0-9](?:[0-9])* # | # [0-9](?:[0-9])*\. # ) # ) #DEBUG:lepl.regexp.core.Compiler:compiling to nfa: (?P