SimpleParse-2.2.0/SimpleParse.egg-info/PKG-INFO

Metadata-Version: 1.1
Name: SimpleParse
Version: 2.2.0
Summary: A Parser Generator for Python (w/mxTextTools derivative)
Home-page: http://simpleparse.sourceforge.net/
Author: Mike C. Fletcher
Author-email: mcfletch@users.sourceforge.net
License: UNKNOWN
Description: A Parser Generator for Python (w/mxTextTools derivative)

        Provides a moderately fast parser generator for use with Python,
        includes a forked version of the mxTextTools text-processing library
        modified to eliminate recursive operation and fix a number of
        undesirable behaviours.

        Converts EBNF grammars directly to single-pass parsers for many
        largely deterministic grammars.
Keywords: parse,parser,parsing,text,ebnf,grammar,generator
Platform: Any
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers

SimpleParse-2.2.0/SimpleParse.egg-info/SOURCES.txt

MANIFEST.in
license.txt
setup.py
tox.ini
/home/mcfletch/OpenGL-dev/simpleparse/simpleparse/stt/TextTools/mxTextTools/mxTextTools.c
/home/mcfletch/OpenGL-dev/simpleparse/simpleparse/stt/TextTools/mxTextTools/mxbmse.c
/home/mcfletch/OpenGL-dev/simpleparse/simpleparse/stt/TextTools/mxTextTools/mxte.c
SimpleParse.egg-info/PKG-INFO
SimpleParse.egg-info/SOURCES.txt
SimpleParse.egg-info/dependency_links.txt
SimpleParse.egg-info/top_level.txt
doc/common_problems.html
doc/index.html
doc/mxLicense.html
doc/processing_result_trees.html
doc/scanning_with_simpleparse.html
doc/simpleparse_grammars.html
doc/sitestyle.css
simpleparse/__init__.py
simpleparse/baseparser.py
simpleparse/dispatchprocessor.py
simpleparse/error.py
simpleparse/generator.py
simpleparse/objectgenerator.py
simpleparse/parser.py
simpleparse/printers.py
simpleparse/processor.py
simpleparse/simpleparsegrammar.py
simpleparse/common/__init__.py
simpleparse/common/calendar_names.py
simpleparse/common/chartypes.py
simpleparse/common/comments.py
simpleparse/common/iso_date.py
simpleparse/common/iso_date_loose.py
simpleparse/common/numbers.py
simpleparse/common/phonetics.py
simpleparse/common/strings.py
simpleparse/common/timezone_names.py
simpleparse/stt/COPYRIGHT
simpleparse/stt/LICENSE
simpleparse/stt/__init__.py
simpleparse/stt/mxLicense.html
simpleparse/stt/Doc/eGenix-mx-Extensions.html
simpleparse/stt/Doc/mxLicense.html
simpleparse/stt/Doc/mxTextTools.html
simpleparse/stt/TextTools/COPYRIGHT
simpleparse/stt/TextTools/LICENSE
simpleparse/stt/TextTools/Makefile.pkg
simpleparse/stt/TextTools/README
simpleparse/stt/TextTools/TextTools.py
simpleparse/stt/TextTools/__init__.py
simpleparse/stt/TextTools/Constants/Sets.py
simpleparse/stt/TextTools/Constants/TagTables.py
simpleparse/stt/TextTools/Constants/__init__.py
simpleparse/stt/TextTools/mxTextTools/Makefile.pre.in
simpleparse/stt/TextTools/mxTextTools/__init__.py
simpleparse/stt/TextTools/mxTextTools/highcommands.h
simpleparse/stt/TextTools/mxTextTools/lowlevelcommands.h
simpleparse/stt/TextTools/mxTextTools/mx.h
simpleparse/stt/TextTools/mxTextTools/mxTextTools.c
simpleparse/stt/TextTools/mxTextTools/mxTextTools.c.~1~
simpleparse/stt/TextTools/mxTextTools/mxTextTools.def
simpleparse/stt/TextTools/mxTextTools/mxTextTools.h
simpleparse/stt/TextTools/mxTextTools/mxbmse.c
simpleparse/stt/TextTools/mxTextTools/mxbmse.h
simpleparse/stt/TextTools/mxTextTools/mxh.h
simpleparse/stt/TextTools/mxTextTools/mxpyapi.h
simpleparse/stt/TextTools/mxTextTools/mxstdlib.h
simpleparse/stt/TextTools/mxTextTools/mxte.c
simpleparse/stt/TextTools/mxTextTools/mxte_impl.h
simpleparse/stt/TextTools/mxTextTools/recursecommands.h
simpleparse/stt/TextTools/mxTextTools/speccommands.h
simpleparse/xmlparser/__init__.py
simpleparse/xmlparser/xml_parser.py
tests/__init__.py
tests/genericvalues.py
tests/mx_flag.py
tests/mx_high.py
tests/mx_low.py
tests/mx_recursive.py
tests/mx_special.py
tests/test_backup_on_subtable_failure.py
tests/test_common_chartypes.py
tests/test_common_comments.py
tests/test_common_iso_date.py
tests/test_common_numbers.py
tests/test_common_strings.py
tests/test_deep_nesting.py
tests/test_erroronfail.py
tests/test_grammarparser.py
tests/test_objectgenerator.py
tests/test_optimisation.py
tests/test_printers.py
tests/test_simpleparsegrammar.py
tests/test_xml.py

SimpleParse-2.2.0/SimpleParse.egg-info/top_level.txt

simpleparse

SimpleParse-2.2.0/doc/common_problems.html
Describes common errors, anti-patterns and known bugs with the SimpleParse 2.0 engine.
Expressing repetition as recursion is extremely inefficient: it generates 4 new Python objects and a number of new object pointers for every match (figure > 100 bytes for each match), on top of the engine overhead in tracking the recursion. If you have a 1-million character match that's “matching” for every character, you'll have hundreds of megabytes of memory used.
In addition, if you are not using the non-recursive rewrite of mx.TextTools, you can actually blow up the C stack with the recursive calls to tag(). Symptoms of this are a memory access error when attempting to parse.
a := 'b', a? # bad!
a := 'b'+ # good!
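The difference can be sketched in plain Python without the SimpleParse engine. The two matchers below (match_recursive, match_iterative and count_nodes are hypothetical names for illustration, not part of SimpleParse) mimic the two productions above and build result nodes in the (name, start, stop, children) shape; the engine's real bookkeeping differs, so treat this only as a rough model of why the recursive form costs one node and one nested call per character.

```python
def match_recursive(text, pos=0):
    """Mimics a := 'b', a? : one nested call and one result node per 'b'."""
    if pos >= len(text) or text[pos] != "b":
        return None
    child = match_recursive(text, pos + 1)
    stop = child[2] if child else pos + 1
    return ("a", pos, stop, [child] if child else None)


def match_iterative(text, pos=0):
    """Mimics a := 'b'+ : one loop and a single result node."""
    stop = pos
    while stop < len(text) and text[stop] == "b":
        stop += 1
    return ("a", pos, stop, None) if stop > pos else None


def count_nodes(node):
    """Count a result node plus all of its descendants."""
    if node is None:
        return 0
    return 1 + sum(count_nodes(child) for child in (node[3] or []))


text = "b" * 200
recursive_nodes = count_nodes(match_recursive(text))
iterative_nodes = count_nodes(match_iterative(text))
print(recursive_nodes, iterative_nodes)
```

Both matchers cover the same span of text, but the recursive one nests as deeply as the input is long, which is the pattern that exhausts memory (or the C stack) on large inputs.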
At present, there's no way for the engine to know whether a child has been satisfied (matched) because it is optional (or all of its children are optional), or because it actually matched. The problem with the obvious solution of just checking whether we've moved forward in the text is that many classes of match object may match depending on external (non-text-based) conditions, so if we do the check, all of those mechanisms suddenly fail. For now, make sure:
No child of a repeating FirstOfGroup (x/y/z)+ or (x/y/z)* can match a Null-string
At least one child of a repeating SequentialGroup (x,y,z)+ or (x,y,z)* must not match the Null-string
You can recognize this situation by the process going into an endless loop with little or no memory being consumed. To fix this one, I'd likely need to add another return value type to the mxTextTools engine.
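The endless loop can be modelled in a few lines of plain Python. This is a sketch under stated assumptions: repeat, match_x and match_optional_x are hypothetical stand-ins, not SimpleParse or mxTextTools functions, and the max_iterations cap exists only so the example terminates where the real engine would spin forever.

```python
def match_x(text, pos):
    """Mimics 'x' : returns the new position, or None on failure."""
    return pos + 1 if text.startswith("x", pos) else None


def match_optional_x(text, pos):
    """Mimics 'x'? : always succeeds, possibly without consuming any text."""
    if text.startswith("x", pos):
        return pos + 1
    return pos  # null match: reported as success, but no progress was made


def repeat(child, text, pos, max_iterations=1000):
    """Mimics (child)* ; the cap stands in for 'loops forever'."""
    iterations = 0
    while iterations < max_iterations:
        new_pos = child(text, pos)
        if new_pos is None:  # child failed, so the repetition ends normally
            return pos, iterations
        pos = new_pos
        iterations += 1
    raise RuntimeError("no progress: child keeps matching the null string")


well_behaved = repeat(match_x, "xxy", 0)
try:
    repeat(match_optional_x, "xxy", 0)
    looped = False
except RuntimeError:
    looped = True
print(well_behaved, looped)
```

The repetition only stops when its child fails, so a child that can succeed without consuming text never lets it stop.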
The TextTools engine does not support backtracking as seen in RE engines and many parsers, so productions like this can never match:
a := (b/c)*, c
This fails because the FirstOfGroup will already have consumed every 'c', so the final 'c' can never match. This is a fundamental limit of the current back-end, so unless a new back-end is created, the problem will not go away. You will need to design your grammars accordingly.
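As a rough illustration, the sketch below contrasts a hand-written matcher that never backtracks with Python's backtracking re module on the same pattern. The helpers first_of and greedy_match are hypothetical names written for this example; they only imitate the behaviour described above and are not the TextTools implementation.

```python
import re


def first_of(text, pos, literals):
    """Return the position after the first literal that matches, else None."""
    for literal in literals:
        if text.startswith(literal, pos):
            return pos + len(literal)
    return None


def greedy_match(text):
    """Mimics a := (b/c)*, c with no backtracking; returns end position or None."""
    pos = 0
    while True:
        new_pos = first_of(text, pos, ["b", "c"])
        if new_pos is None:
            break
        pos = new_pos
    # Every 'c' has already been consumed by the loop above.
    return first_of(text, pos, ["c"])


greedy_result = greedy_match("bcc")
regex_result = re.fullmatch(r"(?:b|c)*c", "bcc")
print(greedy_result, regex_result is not None)
```

The regular expression succeeds because the regex engine gives back the last 'c' after the repetition overshoots; the non-backtracking matcher has no way to do that.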
The production c := (a/b) produces a FirstOfGroup, that is, a group which
matches the first child to match. Many parsers and regex engines
use an algorithm that matches all children and chooses the longest successful
match. It would be possible to define a new TextTools tagging command to
support the longest-of semantics for Table/SubTable matches, but I haven't
felt the need to do so. If such a command is created, it will likely be
spelled '|' rather than '/' in the SimpleParse grammar.
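The two semantics can be compared with a small plain-Python sketch. Both functions are hypothetical illustrations (SimpleParse only provides the first-of behaviour, and no longest-of command exists in the engine as described above); they work on literal strings rather than real productions.

```python
def first_of(text, pos, literals):
    """FirstOfGroup semantics: the first alternative that matches wins."""
    for literal in literals:
        if text.startswith(literal, pos):
            return literal
    return None


def longest_of(text, pos, literals):
    """Longest-of semantics: try every alternative, keep the longest match."""
    matches = [literal for literal in literals if text.startswith(literal, pos)]
    return max(matches, key=len) if matches else None


alternatives = ["in", "int", "integer"]
first_choice = first_of("integer", 0, alternatives)
longest_choice = longest_of("integer", 0, alternatives)
print(first_choice, longest_choice)
```

With first-of semantics, one practical consequence is that longer alternatives generally need to be listed before their own prefixes.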
Although not particularly likely, users of SimpleParse 1.0 may have relied
on the (extremely non-intuitive) grouping mechanism for element tokens in
their grammars. With that mechanism, the group:
a,b,c/d,e
was interpreted as:
a,b,(c/(d,e))
The new rule is simply that alternation binds closer than sequences, so
the same grammar becomes:
a,b,(c/d),e
which, though no more (or less) intuitive than:
(a,b,c)/(d,e) ### it doesn't work this way!!!
is certainly better than the original mechanism.
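The binding rule can be made concrete with a toy grouping function. This is a minimal sketch, assuming a flat expression with no parentheses, quoting or repetition markers; group_tokens is a hypothetical helper and has nothing to do with SimpleParse's real grammar parser.

```python
def group_tokens(expression):
    """Split on ',' first, then on '/', so '/' binds tighter than ','."""
    sequence = []
    for item in expression.split(","):
        alternatives = [token.strip() for token in item.split("/")]
        if len(alternatives) == 1:
            sequence.append(alternatives[0])
        else:
            sequence.append(tuple(alternatives))  # a FirstOfGroup
    return sequence


grouped = group_tokens("a,b,c/d,e")
print(grouped)
```

Here a tuple stands for an alternation, so the output reads as the sequence a, b, (c/d), e.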
You will, if possible, want to use the non-recursive rewrite of the 2.1.0
mxTextTools engine (2.1.0nr). At the time of writing, the mainline 2.1.0b3
has some errors (which I'm told are fixed for 2.1.0final), while the non-recursive
rewrite passes all tests. The bugs in the (recursive) engine(s) that are
known (and not likely to be fixed in the case of 2.1.0 final) are:
The mx Extensions Series packages are brought to you by the eGenix.com Software, Skills and Services GmbH, Langenfeld, Germany. We are licensing our products under the following two different licenses:
The Public License is very similar to the Python 2.0 license and covers the open source software made available by eGenix.com which is free of charge even for commercial use.
The Commercial License is intended for covering commercial eGenix.com software, notably the mxODBC package. Only private and non-commercial use is free of charge.
If you have questions regarding these licenses, please contact Licenses@eGenix.com. If you would like to bundle the software with your commercial product, please write to Sales@eGenix.com for more information about the redistribution conditions and terms.
The eGenix.com Public License is similar to the Python 2.0 license and is considered an Open Source license (in the sense defined by the Open Source Initiative (OSI)) by eGenix.com.
The license should also be compatible with the GNU Public License in case that matters. The only part which is known to have caused some problems with Richard Stallman in the past is the choice of law clause.
EGENIX.COM PUBLIC LICENSE AGREEMENT VERSION 1.0.0
1. Introduction
This "License Agreement" is between eGenix.com Software, Skills and Services GmbH ("eGenix.com"), having an office at Pastor-Loeh-Str. 48, D-40764 Langenfeld, Germany, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software").
2. License
Subject to the terms and conditions of this eGenix.com Public License Agreement, eGenix.com hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the eGenix.com Public License Agreement is retained in the Software, or in any derivative version of the Software prepared by Licensee.
3. NO WARRANTY
eGenix.com is making the Software available to Licensee on an "AS IS" basis. SUBJECT TO ANY STATUTORY WARRANTIES WHICH CAN NOT BE EXCLUDED, EGENIX.COM MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, EGENIX.COM MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
4. LIMITATION OF LIABILITY
EGENIX.COM SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY LOSS) AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THE ABOVE EXCLUSION OR LIMITATION MAY NOT APPLY TO LICENSEE.
5. Termination
This License Agreement will automatically terminate upon a material breach of its terms and conditions.
6. General
Nothing in this License Agreement affects any statutory rights of consumers that cannot be waived or limited by contract. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between eGenix.com and Licensee. If any provision of this License Agreement shall be unlawful, void, or for any reason unenforceable, such provision shall be modified to the extent necessary to render it enforceable without losing its intent, or, if no such modification is possible, be severed from this License Agreement and shall not affect the validity and enforceability of the remaining provisions of this License Agreement. This License Agreement shall be governed by and interpreted in all respects by the law of Germany, excluding conflict of law provisions. It shall not be governed by the United Nations Convention on Contracts for International Sale of Goods. This License Agreement does not grant permission to use eGenix.com trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. The controlling language of this License Agreement is English. If Licensee has received a translation into another language, it has been provided for Licensee's convenience only.
14. Agreement
By downloading, copying, installing or otherwise using the Software, Licensee agrees to be bound by the terms and conditions of this License Agreement.
The eGenix.com Commercial License covers commercial eGenix.com software, notably the mxODBC package. Only private and non-commercial use is free of charge. Usage of the software in commercial settings, such as for implementing in-house applications in/for companies or consulting work where the software is used as a tool, requires a "Proof of Authorization" which can be bought from eGenix.com.
EGENIX.COM COMMERCIAL LICENSE AGREEMENT VERSION 1.0.0
1. Introduction
This "License Agreement" is between eGenix.com Software, Skills and Services GmbH ("eGenix.com"), having an office at Pastor-Loeh-Str. 48, D-40764 Langenfeld, Germany, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software").
2. Terms and Definitions
The "Software" covered under this License Agreement includes without limitation, all object code, source code, help files, publications, documentation and other programs, products or tools that are included in the official "Software Distribution" available from eGenix.com.
The "Proof of Authorization" for the Software is a written and signed notice from eGenix.com providing evidence of the extent of authorizations the Licensee has acquired to use the Software and of Licensee's eligibility for future upgrade program prices (if announced) and potential special or promotional opportunities. As such, the Proof of Authorization becomes part of this License Agreement.
Installation of the Software ("Installation") refers to the process of unpacking or copying the files included in the Software Distribution to an Installation Target.
"Installation Target" refers to the target of an installation operation. Targets are defined as follows:
1) "CPU" refers to a central processing unit which is able to store and/or execute the Software (a server, personal computer, or other computer-like device) using at most two (2) processors,
2) "Site" refers to at most one hundred fifty (150) CPUs installed at a single site of a company,
3) "Corporate" refers to at most one thousand (1000) CPUs installed at an unlimited number of sites of the company,
4) "Developer CPU" refers to a single CPU used by at most one (1) developer.
When installing the Software on a server CPU for use by other CPUs in a network, Licensee must obtain a License for the server CPU and for all client CPUs attached to the network which will make use of the Software by copying the Software in binary or source form from the server into their CPU memory. If a CPU makes use of more than two (2) processors, Licensee must obtain additional CPU licenses to cover the total number of installed processors. Likewise, if a Developer CPU is used by more than one developer, Licensee must obtain additional Developer CPU licenses to cover the total number of developers using the CPU.
"Commercial Environment" refers to any application environment which is aimed at producing profit. This includes, without limitation, for-profit organizations, work as independent contractor, consultant and other profit generating relationships with organizations or individuals.
"Non-Commercial Environments" are all those application environments which do not directly or indirectly generate profit. Educational and other officially acknowledged non-profit organizations are regarded as being a Non-Commercial Environment in the above sense.
3. License Grant
Subject to the terms and conditions of this License Agreement, eGenix.com hereby grants Licensee a non-exclusive, world-wide license to 1) use the Software to the extent of authorizations Licensee has acquired and 2) distribute, make and install copies to support the level of use authorized, providing Licensee reproduces this License Agreement and any other legends of ownership on each copy, or partial copy, of the Software. If Licensee acquires this Software as a program upgrade, Licensee's authorization to use the Software from which Licensee upgraded is terminated. Licensee will ensure that anyone who uses the Software does so only in compliance with the terms of this License Agreement. Licensee may not 1) use, copy, install, compile, modify, or distribute the Software except as provided in this License Agreement; 2) reverse assemble, reverse engineer, reverse compile, or otherwise translate the Software except as specifically permitted by law without the possibility of contractual waiver; or 3) rent, sublicense or lease the Software.
4. Authorizations
The extent of authorization depends on the ownership of a Proof of Authorization for the Software. Usage of the Software for any other purpose not explicitly covered by this License Agreement or granted by the Proof of Authorization is not permitted and requires the written prior permission from eGenix.com.
4.1. Non-Commercial Environments
This section applies to all uses of the Software without a Proof of Authorization for the Software in a Non-Commercial Environment. Licensee may copy, install, compile, modify and use the Software under the terms of this License Agreement FOR NON-COMMERCIAL PURPOSES ONLY. Use of the Software in a Commercial Environment or for any other purpose, such as redistribution, IS NOT PERMITTED BY THIS LICENSE and requires a Proof of Authorization from eGenix.com.
4.2. Evaluation Period for Commercial Environments
This section applies to all uses of the Software without a Proof of Authorization for the Software in a Commercial Environment. Licensee may copy, install, compile, modify and use the Software under the terms of this License Agreement FOR EVALUATION AND TESTING PURPOSES and DURING A LIMITED EVALUATION PERIOD OF AT MOST THIRTY (30) DAYS AFTER INITIAL INSTALLATION ONLY. For use of the Software after the evaluation period or for any other purpose, such as redistribution, Licensee must obtain a Proof of Authorization from eGenix.com. If Licensee decides not to obtain a Proof of Authorization after the evaluation period, Licensee agrees to cease using and to remove all installed copies of the Software.
4.3. Usage under Proof of Authorization
This section applies to all uses of the Software provided that Licensee owns a Proof of Authorization for the Software. Licensee may copy, install, compile, modify, use and distribute the Software to the extent of authorization acquired by the Proof of Authorization and under the terms and conditions of this License Agreement.
5. Transfer of Rights and Obligations
Licensee may transfer all license rights and obligations under a Proof of Authorization for the Software to another party by transferring the Proof of Authorization and a copy of this License Agreement and all documentation. The transfer of Licensee's license rights and obligations terminates Licensee's authorization to use the Software under the Proof of Authorization.
6. Modifications
Software modifications may only be distributed in form of patches to the original files contained in the Software Distribution. The patches must be accompanied by a legend of origin and ownership and a visible message stating that the patches are not original Software delivered by eGenix.com, nor that eGenix.com can be held liable for possible damages related directly or indirectly to the patches if they are applied to the Software.
7. Experimental Code or Features
The Software may include components containing experimental code or features which may be modified substantially before becoming generally available. These experimental components or features may not be at the level of performance or compatibility of generally available eGenix.com products. eGenix.com does not guarantee that any of the experimental components or features contained in the eGenix.com will ever be made generally available.
8. Expiration and License Control Devices
Components of the Software may contain disabling or license control devices that will prevent them from being used after the expiration of a period of time or on Installation Targets for which no license was obtained. Licensee will not tamper with these disabling devices or the components. Licensee will take precautions to avoid any loss of data that might result when the components can no longer be used.
9. NO WARRANTY
eGenix.com is making the Software available to Licensee on an "AS IS" basis. SUBJECT TO ANY STATUTORY WARRANTIES WHICH CAN NOT BE EXCLUDED, EGENIX.COM MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, EGENIX.COM MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
10. LIMITATION OF LIABILITY
TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL EGENIX.COM BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR (I) ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY LOSS) AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF; OR (II) ANY AMOUNTS IN EXCESS OF THE AGGREGATE AMOUNTS PAID TO EGENIX.COM UNDER THIS LICENSE AGREEMENT DURING THE TWELVE (12) MONTH PERIOD PRECEDING THE DATE THE CAUSE OF ACTION AROSE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THE ABOVE EXCLUSION OR LIMITATION MAY NOT APPLY TO LICENSEE.
11. Termination
This License Agreement will automatically terminate upon a material breach of its terms and conditions if not cured within thirty (30) days of written notice by eGenix.com. Upon termination, Licensee shall discontinue use and remove all installed copies of the Software.
12. Indemnification
Licensee hereby agrees to indemnify eGenix.com against and hold harmless eGenix.com from any claims, lawsuits or other losses that arise out of Licensee's breach of any provision of this License Agreement.
13. Third Party Rights
Any software or documentation in source or binary form provided along with the Software that is associated with a separate license agreement is licensed to Licensee under the terms of that license agreement. This License Agreement does not apply to those portions of the Software. Copies of the third party licenses are included in the Software Distribution.
14. High Risk Activities
The Software is not fault-tolerant and is not designed, manufactured or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapons systems, in which the failure of the Software, or any software, tool, process, or service that was developed using the Software, could lead directly to death, personal injury, or severe physical or environmental damage ("High Risk Activities"). Accordingly, eGenix.com specifically disclaims any express or implied warranty of fitness for High Risk Activities. Licensee agrees that eGenix.com will not be liable for any claims or damages arising from the use of the Software, or any software, tool, process, or service that was developed using the Software, in such applications.
15. General
Nothing in this License Agreement affects any statutory rights of consumers that cannot be waived or limited by contract. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between eGenix.com and Licensee. If any provision of this License Agreement shall be unlawful, void, or for any reason unenforceable, such provision shall be modified to the extent necessary to render it enforceable without losing its intent, or, if no such modification is possible, be severed from this License Agreement and shall not affect the validity and enforceability of the remaining provisions of this License Agreement. This License Agreement shall be governed by and interpreted in all respects by the law of Germany, excluding conflict of law provisions. It shall not be governed by the United Nations Convention on Contracts for International Sale of Goods. This License Agreement does not grant permission to use eGenix.com trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. The controlling language of this License Agreement is English. If Licensee has received a translation into another language, it has been provided for Licensee's convenience only.
16. Agreement
By downloading, copying, installing or otherwise using the Software, Licensee agrees to be bound by the terms and conditions of this License Agreement.
For questions regarding this license agreement, please write to:
eGenix.com Software, Skills and Services GmbH
Pastor-Loeh-Str. 48
D-40764 Langenfeld
Germany
The following two sections give examples of the "Proof of Authorization" for a commercial use license of a product under this license.
When you buy such a license, you will receive a signed "Proof of Authorization" by postal mail within a week or two. We will also send you the Proof of Authorization Key by e-mail to acknowledge acceptance of the payment.
EGENIX.COM PROOF OF AUTHORIZATION (Example: CPU License)
1. License Grant
eGenix.com Software, Skills and Services GmbH ("eGenix.com"), having an office at Pastor-Loeh-Str. 48, D-40764 Langenfeld, Germany, hereby grants the Individual or Organization ("Licensee") a non-exclusive, world-wide license to use the software listed below in source or binary form and its associated documentation ("the Software") under the terms and conditions of the eGenix.com Commercial License Agreement Version 1.0.0 and to the extent authorized by this Proof of Authorization.
2. Covered Software
Software Name: mxODBC Python ODBC Interface
Software Version: Version 2.0.0
Software Distribution: mxODBC-2.0.0.zip
Software Distribution MD5 Hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Operating System: any compatible operating system
3. Authorizations
eGenix.com hereby authorizes Licensee to copy, install, compile, modify and use the Software on the following Installation Targets.
Installation Targets: one (1) CPU
Redistribution of the Software is not allowed under this Proof of Authorization.
4. Proof
This Proof of Authorization was issued by
__________________________________
Langenfeld, ______________________
Proof of Authorization Key: xxxx-xxxx-xxxx-xxxx-xxxx-xxxx
The next section gives an example of a "Developer CPU License", which allows you to redistribute software built around the Software or integrating it. Please contact sales@eGenix.com for questions about the redistribution conditions.
EGENIX.COM PROOF OF AUTHORIZATION (Example: Developer License)
1. License Grant
eGenix.com Software, Skills and Services GmbH ("eGenix.com"), having an office at Pastor-Loeh-Str. 48, D-40764 Langenfeld, Germany, hereby grants the Individual or Organization ("Licensee") a non-exclusive, world-wide license to use and distribute the software listed below in source or binary form and its associated documentation ("the Software") under the terms and conditions of the eGenix.com Commercial License Agreement Version 1.0.0 and to the extent authorized by this Proof of Authorization.
2. Covered Software
Software Name: mxODBC Python ODBC Interface
Software Version: Version 2.0.0
Software Distribution: mxODBC-2.0.0.zip
Software Distribution MD5 Hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Operating System: any compatible operating system
3. Authorizations
3.1. Application Development
eGenix.com hereby authorizes Licensee to copy, install, compile, modify and use the Software on the following Developer Installation Targets for the purpose of developing products using the Software as integral part.
Developer Installation Targets: one (1) CPU
3.2. Redistribution
eGenix.com hereby authorizes Licensee to redistribute the Software bundled with products developed by Licensee on the Developer Installation Targets ("the Product") subject to the terms and conditions of the eGenix.com Commercial License Agreement for installation and use in combination with the Product on the following Redistribution Installation Targets, provided that:
1) Licensee shall not and shall not permit or assist any third party to sell or distribute the Software as a separate product;
2) Licensee shall not and shall not permit any third party to (i) market, sell or distribute the Software to any end user except subject to the eGenix Commercial License Agreement, (ii) rent, sell, lease or otherwise transfer the Software or any part thereof or use it for the benefit of any third party, (iii) use the Software outside the Product or for any other purpose not expressly licensed hereunder;
3) the Product does not provide functions or capabilities similar to those of the Software itself, i.e. the Product does not introduce commercial competition for the Software as sold by eGenix.com.
Redistribution Installation Targets: any number of CPUs capable of running the Product and the Software
4. Proof
This Proof of Authorization was issued by
__________________________________
Langenfeld, ______________________
Proof of Authorization Key: xxxx-xxxx-xxxx-xxxx-xxxx-xxxx
SimpleParse parsers generate tree structures describing the structure of your parsed content. This document briefly describes the structures, a simple mechanism for processing the structures, and ways to alter the structures as they are generated to accomplish specific goals.
Prerequisites:
SimpleParse uses the same result format as is used for the underlying mx.TextTools engine. The engine returns a three-item tuple from the parsing of the top-level (root) production like so:
success, resultTrees, nextCharacter = myParser.parse( someText, processor=None)
Success is a Boolean value indicating whether the production (by default
the root production) matched (was satisfied) at all. If success is
true, nextCharacter is an integer value indicating the next character
to be parsed in the text (i.e. someText[ startCharacter:nextCharacter
] was parsed).
[New in 2.0.0b2] Note: If success is false, then nextCharacter is
set to the (very ill-defined) "error position", which is the position reached
by the last TextTools command in the top-level production before the entire
table failed. This is a lower-level value than is usefully predictable within
SimpleParse (for instance, negative results which cause a failure will actually
report the position after the positive version of the element token succeeds).
You might, I suppose, use it as a hint to your users of where the error
occurred, but using error-on-fail SyntaxErrors is by far the preferred
method. Basically, if success is false, consider nextCharacter to contain
garbage data.
When the processor argument to parse is false (or a non-callable object), the system does not attempt to use the default processing mechanism, and returns the result trees directly. The standard format for result-tree nodes is as follows:
(production_name, start, stop, children_trees)
Where start and stop represent indexes in the source text such that sourcetext
[ start: stop] is the text which matched this production. children_trees
is either a list of the result-trees for the child productions matched
within the production, or None (Note: that last is important, you can't
automatically do a "for" over the children_trees).
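Because children_trees may be None, any recursive walk over a result tree has to guard against it. The following is a minimal sketch that runs on a hand-written tree rather than on real parser output; the production names, the sample text and the walk helper are all assumptions made for illustration.

```python
def walk(tree, source, depth=0):
    """Yield (depth, production_name, matched_text) for a node and its descendants."""
    name, start, stop, children = tree
    yield depth, name, source[start:stop]
    # children may be None rather than an empty list, so guard before iterating
    for child in children or []:
        for item in walk(child, source, depth + 1):
            yield item


source = "key=42\n"
# Hand-built tree in the (production_name, start, stop, children_trees) format
tree = ("equality", 0, 7, [
    ("identifier", 0, 3, None),
    ("number", 4, 6, []),
])

for depth, name, text in walk(tree, source):
    print("  " * depth + name + " " + repr(text))
```

The `children or []` idiom treats None and an empty list the same way, which is usually what a tree walker wants.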
Expanded productions, as well as unreported productions (and the children of unreported productions), will not appear in the result trees, neither will the root production. See Understanding SimpleParse Grammars for details. However, LookAhead productions where the non-lookahead value would normally return results, will return their results in the position where the LookAhead is included in the grammar.
If the processor argument to parse is true and callable, the processor
object will be called with (success, resultTrees, nextCharacter) on completion
of parsing. The processor can then take whatever processing steps desired,
the return value from calling the processor with the results is returned
directly to the caller of parse.
SimpleParse 2.0 provides a simple mechanism for processing result trees, a recursive series of calls to attributes of a “Processor” object with functions to automate the call-by-name dispatching. This processor implementation is available for examination in the simpleparse.dispatchprocessor module. The main functions are:
def dispatch( source, tag, buffer ):
    """Dispatch on source for tag with buffer

    Find the attribute or key "tag-object" (tag[0]) of source,
    then call it with (tag, buffer)
    """
def dispatchList( source, taglist, buffer ):
    """Dispatch on source for each tag in taglist with buffer"""
def multiMap( taglist, source=None, buffer=None ):
    """Convert a taglist to a mapping from tag-object:[list-of-tags]

    For instance, if you have items of 3 different types, in any order,
    you can retrieve them all sorted by type with multiMap( childlist )
    then access them by tagobject key.
    If source and buffer are specified, call dispatch on all items.
    """
def singleMap( taglist, source=None, buffer=None ):
    """Convert a taglist to a mapping from tag-object:tag,
    overwriting early with late tags. If source and buffer
    are specified, call dispatch on all items."""
def getString( info, buffer ):
    """Return the string value of the tag passed

    info is the (tag, left, right, sublist) result tuple
    """
def lines( start=None, end=None, buffer=None ):
    """Return number of lines in buffer[start:end]"""
The module also provides a class, DispatchProcessor, whose __call__
implementation triggers dispatching for both the "called as root processor"
and the "called to process an individual result element" cases.
You define a DispatchProcessor sub-class with methods named for each production
that will be processed by the processor, with signatures of:
from simpleparse.dispatchprocessor import *
class MyProcessorClass( DispatchProcessor ):
    def production_name( self, info, buffer ):
        """Process the given production and its children"""
        # info is the standard result tuple for this production
        (tag, start, stop, subtags) = info
Within those production-handling methods, you can call the dispatch functions
to process the sub-tags of the current production (keep in mind that the
sub-tags "list" may be a None object). You can see examples of this processing
methodology in simpleparse.simpleparsegrammar, simpleparse.common.iso_date
and simpleparse.common.strings (among others).
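To make the call-by-name idea concrete without requiring SimpleParse to be installed, here is a small stand-alone sketch. MiniDispatcher, IniValues and the sample tree are hypothetical stand-ins written for this illustration; they are not the library's DispatchProcessor and only mimic the dispatching pattern described above.

```python
class MiniDispatcher(object):
    """Hypothetical stand-in for call-by-name dispatching (not the library class)."""

    def dispatch(self, tag, buffer):
        # tag[0] is the production name; look up a method with the same name
        handler = getattr(self, tag[0])
        return handler(tag, buffer)

    def dispatch_list(self, taglist, buffer):
        # The sub-tag "list" may be None, so treat that as empty
        return [self.dispatch(tag, buffer) for tag in (taglist or [])]


class IniValues(MiniDispatcher):
    def equality(self, info, buffer):
        (tag, start, stop, subtags) = info
        name, value = self.dispatch_list(subtags, buffer)
        return name, value

    def identifier(self, info, buffer):
        (tag, start, stop, subtags) = info
        return buffer[start:stop]

    def number(self, info, buffer):
        (tag, start, stop, subtags) = info
        return int(buffer[start:stop])


buffer = "port=8080\n"
tree = ("equality", 0, 10, [("identifier", 0, 4, None), ("number", 5, 9, None)])
result = IniValues().dispatch(tree, buffer)
print(result)
```

Each handler receives the whole result tuple plus the buffer, so it can either slice the matched text directly or recurse into its sub-tags.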
For real-world Parsers, where you normally use the same processing class
for all runs of the parser, you can define a default Processor class like
so:
class MyParser( Parser ):
    def buildProcessor( self ):
        return MyProcessorClass()
so that if no processor is explicitly specified in the parse call, your
"MyProcessorClass" instance will be used for processing the results.
SimpleParse 2.0 introduced features which expose certain of the mx.TextTools library's features for producing non-standard result trees. Although not generally recommended for use in “normal” parsers, these features are useful for certain types of text processing, and their exposure was requested. Each flag has a different effect on the result tree; the particular effects are discussed below.
The exposure is through the Processor (or more precisely, a super-class
of Processor called “MethodSource”) object. To specify the use of one
of the flags, you set an attribute in your MethodSource object (your
Processor object) with the name _m_productionname (for the “method” to
use, which is either an actual callable object for use with CallTag,
or one of the other mx.TextTools flag constants above). In the case
of AppendTagobj , you will likely want to specify a particular tagobj
object to be appended, you do that by setting an attribute named _o_productionname
in your MethodSource. For AppendToTagobj, you must specify an _o_productionname object with an “append” method.
Note: you can use MethodSource as your direct ancestor
if you want to define a non-standard result tree, but don't want to do any
processing of the results (this is the reason for having separate classes).
MethodSource does not define a __call__ method.
_m_productionname = callableObject(
taglist,
text,
left,
right,
subtags
)
The given object/method is called on a successful match with the values shown. The text argument is the entire text buffer being parsed, the rest of the values are what you're accustomed to seeing in result tuples.
Notes:
_m_productionname = AppendToTagobj
_o_productionname = objectWithAppendMethod
On a successful match, the system will call the _o_productionname.append((None,l,r,subtags)) method. For some processing tasks, it's conceivable you might want to use this method to pull out all instances of a production from a larger (already-written) grammar where going through the whole results tree to find the deeply nested productions is considered too involved.
Notes:
_m_productionname = AppendMatch
On a successful match, the system will append the matched text to the result tree, rather than a tuple of results. In situations where you just want to extract the text, this can be useful. The downside is that your results tree has a non-standard format that you need to explicitly watch out for while processing the results.
_m_productionname = AppendTagobj
_o_productionname = any object
# object is optional, if omitted, the production name string is used
On a successful match, the system will append the tagobject to the result tree, rather than a tuple of results. In situations where you just want notification that the production has matched (and it doesn't matter what it matched), this can be useful. The downside, again, is that your results tree has a non-standard format that you need to explicitly watch out for while processing the results.
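Code that consumes such a tree has to tell standard result tuples apart from appended strings or tag objects. A minimal sketch of one way to do that, using a hand-built list of nodes (the describe helper, the production name and the sample data are assumptions, not library output):

```python
def describe(node, source):
    """Return a readable description of a standard or non-standard node."""
    if isinstance(node, tuple) and len(node) == 4:
        name, start, stop, children = node
        return "%s: %r" % (name, source[start:stop])
    # Anything else is assumed to be an appended match string or tag object
    return "raw: %r" % (node,)


source = "a=1;b=2"
mixed = [
    ("assignment", 0, 3, None),  # standard result tuple
    ";",                         # e.g. matched text appended directly
    ("assignment", 4, 7, None),
]
descriptions = [describe(node, source) for node in mixed]
print(descriptions)
```

Note that a tag object which itself happens to be a four-item tuple would be misread by this check, which is one reason the non-standard formats need explicit care.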
SimpleParse 2.0 provides a parser generator which converts an EBNF grammar into a run-time parser for use in scanning/marking up texts. This document describes the process of developing and using an EBNF grammar to perform the text-scanning process.
Prerequisites:
The primary function of SimpleParse is to convert an EBNF grammar into an in-memory object which can do the work of scanning (and potentially processing) data which conforms to that grammar. Therefore, to use the system effectively, we need to be able to create grammars.
For our first experiment, we'll define a simple grammar for use in parsing an INI-file-like format. Users of SimpleParse 1.0 will recognise the format from the original documentation. This version uses somewhat more features (and is shorter as a result) than was easily accomplished with SimpleParse 1.0.
Here's the grammar definition:
____ simpleexample2_1.py ____
from simpleparse.common import numbers, strings, comments
declaration = r'''# note use of raw string when embedding in python code...
file := [ \t\n]*, section+
section := '[',identifier!,']'!, ts,'\n', body
body := statement*
statement := (ts,semicolon_comment)/equality/nullline
nullline := ts,'\n'
equality := ts, identifier,ts,'=',ts,identified,ts,'\n'
identifier := [a-zA-Z], [a-zA-Z0-9_]*
identified := string/number/identifier
ts := [ \t]*
'''
The first line incorporates a new feature of SimpleParse 2.0,
namely the
ability to automatically include (and build your own, incidentally)
libraries of commonly used productions (rules/patterns/grammars). By
importing these three modules, I've made the productions “string”,
“number” and “semicolon_comment” (among others) available to all the
Parser instances I create for the rest of this session.
New Feature Note: The identifier!
and ']'!
element tokens in the "section" production tell the parser generator to
report
a ParserSyntaxError if we attempt to parse these element tokens and
fail.
We could also have spelled this particular segment of the grammar:
section := '[',!,identifier,']', ts,'\n', body
which spelling is often easier to use in complex grammars.
If you are not familiar with EBNF grammars, or would like a reference to the various features of the SimpleParse grammar, please see: SimpleParse Grammars . We will assume that you understand the grammars being presented.
SimpleParse does not have a separate compilation step, but it's useful as you're writing your grammar to set up tests both for whether the grammar itself is syntactically correct, and for whether the productions match the values you expect them to (and don't match those you don't want them to).
To check that a grammar is syntactically correct, the easiest approach is to attempt to create a Parser with the grammar. The Parser will complain if your grammar is syntactically incorrect, generating a ValueError which reports the last line of the declaration which parsed correctly, and the remainder of the declaration.
from simpleparse.parser import Parser
parser = Parser( declaration)
If, for example, you had left out a comma in the “section” production between the literal ']' and ts, you would get an error like so:
S:\sp\simpleparse\examples>bad_declaration.py
Traceback (most recent call last):
File "S:\sp\simpleparse\examples\bad_declaration.py", line 21, in ?
parser = Parser( declaration, "file" ) # will raise ValueError
File "S:\sp\simpleparse\parser.py", line 34, in __init__
definitionSources = definitionSources,
File "S:\sp\simpleparse\simpleparsegrammar.py", line 380, in __init__
raise ValueError(
ValueError: Unable to complete parsing of the EBNF, stopped at line 3 (134 chars
of 467)
Unparsed:
ts,'\n', body
body := statement*
statement := (ts,semicolon_comment)/equality/nulll...
You can see this for yourself by running examples/bad_declaration.py .
If your grammar is correct, Parser( declaration) will simply create the underlying generator objects which can produce a parser for your grammar. If you want to check that a particular production has all of its required sub-productions, you can call myparser.buildTagger( productionname ), but I normally leave that test to be caught during the “production checking” phase below.
Now that we have our Parser object, and know that the grammar is syntactically correct, we can test that our productions match/don't match the values we expect. Depending on your particular philosophy, this may be done using the unittest module, or merely as informal tests during development.
In our grammar above, let's try checking that the equality production really does match some values we expect it to match:
testEquality = [
"s=3\n",
"s = 3\n",
''' s="three\\nthere"\n''',
''' s=three\n''',
]
production = "equality"
for testData in testEquality:
    success, children, nextcharacter = parser.parse( testData, production=production)
    assert success and nextcharacter==len(testData), """Wasn't able to parse %s as a %s (%s chars parsed of %s), returned value was %s"""%( repr(testData), production, nextcharacter, len(testData), (success, children, nextcharacter))
You should be prepared to have those tests fail a few times. It's easy to miss the effect of a particular feature of your grammar (such as the inclusion of “newline” in the equality production above). It took 3 tries before I got the tests above properly defined. Setting up your tests within an automated framework such as unittest is probably a good idea. It's also a good idea to set up tests that check that values which shouldn't match don't.
Note: You may receive an error message from the parser.parse( ) call saying that a particular production name isn't defined within the grammar. You'll need to figure out why that name isn't there (did you include the common module you were planning to use, or did you mis-type a name somewhere?) and correct the problem before the tests will run. This error serves as a check that the production has all required sub-productions (as noted in the previous section).
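The shape of a combined positive/negative check can be sketched as follows. So that the sketch runs without SimpleParse installed, parse here is a regular-expression stand-in that only imitates the (success, children, nextcharacter) return value for a rough approximation of the equality production; it is an assumption of this example, and in real use you would call parser.parse( text, production=production ) instead.

```python
import re

# Stand-in for parser.parse(text, production="equality"); an assumption made
# so this sketch is self-contained. It only approximates the production.
_EQUALITY = re.compile(r'[ \t]*[a-zA-Z][a-zA-Z0-9_]*[ \t]*=[ \t]*\S+[ \t]*\n')


def parse(text, production="equality"):
    match = _EQUALITY.match(text)
    if match is None:
        return 0, None, 0
    return 1, [], match.end()


def fully_matches(text, production):
    """True only if the production matched and consumed the whole text."""
    success, children, nextcharacter = parse(text, production=production)
    return bool(success) and nextcharacter == len(text)


should_match = ["s=3\n", "s = 3\n"]
should_not_match = ["s=3", "=3\n", "3s = 3\n"]

for text in should_match:
    assert fully_matches(text, "equality"), repr(text)
for text in should_not_match:
    assert not fully_matches(text, "equality"), repr(text)
```

Keeping the "should match" and "should not match" lists side by side makes it easy to add a regression case whenever the grammar changes.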
You saw the basic approach to parsing in the section on testing above, but there are a few differences when you're creating a “real world” parser. The first is that you will likely want to define a default root production for the parser. In the examples above, the “root” was specified explicitly during the call to parse to allow us to test any of the productions in the grammar. In normal use, you don't want users of your parser to need to know what production is used for parsing a buffer, so you provide a default in the Parser's initialiser:
parser = Parser( declaration, "file" )
parser.parse( testData)
Note: the root is treated differently than all other productions, as it doesn't return a result-tuple in the results tree, but instead governs the overall operation of the parser, determining whether it “succeeds” or “fails” as a whole. The children of the root production produce the top-level results of the parsing pass.
You can see the result tree returned from the parse method by running examples/simpleexample2_3.py . You can read about how to process the results tree in “Processing Result Trees”.
SimpleParse is a BSD-licensed Python package providing a simple and fast parser generator using a modified version of the mxTextTools text-tagging engine. SimpleParse allows you to generate parsers directly from your EBNF grammar.
Unlike most parser generators, SimpleParse generates single-pass parsers (there is no distinct tokenization stage), an approach taken from the predecessor project (mcf.pars) which attempted to create "autonomously parsing regex objects". The resulting parsers are not as generalized as those created by, for instance, the Earley algorithm, but they do tend to be useful for the parsing of computer file formats and the like (as distinct from natural language and similar "hard" parsing problems).
As of version 2.1.0 the SimpleParse project includes a patched copy
of the mxTextTools tagging library with the non-recursive rewrite of
the core parsing loop. This means that you will need to build the
extension module to use SimpleParse, but the effect is to provide a
uniform parsing platform where all of the features of a given
SimpleParse version are always available.
For those interested in working on the project, I'm actively interested in welcoming and supporting both new developers and new users. Feel free to contact me.
You will need a copy of Python 2.7, 3.3 or above. If you are compiling the package you'll also need a C compiler compatible with your Python.
To install the base SimpleParse engine:
$ pip install SimpleParse
New in 3.0.0:
New in 2.1.1:
New in 2.1.1a2:
New in 2.1.1a1:
New in 2.1.0a1:
New in 2.0.1:
diff -w -r1.4 error.py
32c32
< return '%s: %s'%( self.__class__.__name__, self.messageFormat(message) )
---
> return '%s: %s'%( self.__class__.__name__, self.messageFormat(self.message) )
New in 2.0:
General
Our (current) parsers are top-down, in that they work from the top
of the parsing graph (the root production). They are not, however,
tokenising parsers, so there is no appropriate LL(x) designation as far
as I can see (and there is an arbitrary lookahead mechanism that could
theoretically parse the entire rest of the file just to see if a
particular character matches). I would hazard a guess that they
are theoretically closest to a deterministic recursive-descent parser.
There are no backtracking facilities, so any ambiguity is handled by
choosing the first successful match of a grammar (not the longest, as
in most top-down parsers, mostly because without tokenisation, it would
be expensive to do checks for each possible match's length). As a
result of this, the parsers are entirely deterministic.
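The practical consequence of first-match selection is that the order of alternatives matters. The following stand-alone sketch contrasts the two strategies on plain string literals; first_of and longest_of are hypothetical helpers written for this illustration and are not part of SimpleParse.

```python
def first_of(alternatives, text, position=0):
    """Return the first literal that matches at position, or None."""
    for literal in alternatives:
        if text.startswith(literal, position):
            return literal
    return None


def longest_of(alternatives, text, position=0):
    """What a longest-match strategy would pick instead (for comparison only)."""
    matches = [lit for lit in alternatives if text.startswith(lit, position)]
    return max(matches, key=len) if matches else None


text = "+= 1"
print(first_of(["+", "+="], text))    # first match wins: '+'
print(longest_of(["+", "+="], text))  # longest match would be '+='
print(first_of(["+=", "+"], text))    # reordering the alternatives gives '+='
```

When one alternative is a prefix of another, list the longer one first so that it gets the chance to match.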
The time/memory characteristics are such that, in general, the time
to parse an input text varies with the amount of text to parse. There
are two major factors, the time to do the actual parsing (which, for
simple deterministic grammars should be close to linear with the length
of the text, though a pathological grammar might have radically
different operating characteristics) and the time to build the results
tree (which depends on the memory architecture of the machine, the
currently free memory, and the phase of the moon). As a rule,
SimpleParse parsers will be faster (for suitably limited grammars) than
anything you can code directly in Python. They will not generally
outperform grammar-specific parsers written in C.
mxTextTools Rewrite Enhancements
Alternate C Back-end?
NOTE: This section only applies to SimpleParse versions before 2.1.0, SimpleParse 2.1.0 and above include a patched version of mxTextTools already!
You will want an mxBase 2.1.0 distribution to run SimpleParse, preferably with the non-recursive rewrite. If you want to use the non-recursive implementation, you will need to get the source archive for mxTextTools. It is possible to use mxBase 2.0.3 with SimpleParse, but not to use it for building the non-recursive TextTools engine (2.0.3 also lacks a lot of features and bug-fixes found in the 2.1.0 versions).
Note: without the non-recursive rewrite of 2.1.0 (i.e. with the recursive version), the test suite will not pass all tests. I'm not sure why they fail with the recursive version, but it does argue for using the non-recursive rewrite.
To build the non-recursive TextTools engine, you'll need to
get the source distribution for the non-recursive implementation from
the SimpleParse
file repository. Note,
there are incompatibilities in the mxBase 2.1 versions that make it
necessary to use the versions specified below to build the
non-recursive versions.
This archive is intended to be expanded over the mxBase source archive from the top-level directory, replacing one file and adding four others.
cd egenix-mx-base-2.1.0
gunzip non-recursive-1.0.0b1.tar.gz
tar -xvf non-recursive-1.0.0b1.tar
(Or use WinZip on Windows). When you have completed that, run:
setup.py build --force install
in the top directory of the eGenix-mx-base source tree.
The 2.1.0 and greater releases include the eGenix mxTextTools extension:
Licensed under
the eGenix.com Public License see the mxLicense.html
file for details on
licensing terms for the original library, the eGenix extensions are:
Copyright (c) 1997-2000, Marc-Andre Lemburg
Copyright (c) 2000-2001, eGenix.com Software GmbH
Extensions to the eGenix extensions (most significantly the rewrite of the core loop) are copyright Mike Fletcher and released under the SimpleParse License below:
Copyright © 2003-2006, Mike Fletcher
SimpleParse License:
Copyright © 1998-2006, Copyright by
Mike C. Fletcher; All Rights Reserved.
mailto: mcfletch@users.sourceforge.net
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee or royalty is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation or portions thereof, including modifications, that you make.
THE AUTHOR MIKE C. FLETCHER DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE!
SimpleParse uses a particular EBNF grammar which reflects the current set of features in the system. Though the system is modular enough that you could replace that grammar, most users will simply want to use the provided grammar. This document provides a quick reference for the various features of the grammar with examples of use and descriptions of their effects.
Prerequisites:
Here is an example of a basic SimpleParse grammar:
declaration = r'''# note use of raw string when embedding in python code...
file := [ \t\n]*, section+
section := '[',identifier!,']'!, ts,'\n', body
body := statement*
statement := (ts,semicolon_comment)/equality/nullline
nullline := ts,'\n'
comment := -'\n'*
equality := ts, identifier,ts,'=',ts,identified,ts,'\n'
identifier := [a-zA-Z], [a-zA-Z0-9_]*
identified := string/number/identifier
ts := [ \t]*
'''
You can see that the format allows for comments in Python style,
and fairly
free-form treatment of whitespace around the various items (i.e. “s:=x”
and
“s := x” are equivalent). The grammar is actually written such that you
can break productions (rules) across multiple lines if that will make
your
grammar more readable. The grammar also allows both ':=' and
'::=' for the
"defined as" symbol.
Element tokens are the basic operational unit of the grammar. The concrete implementation of the various tokens is in the module simpleparse.objectgenerator; their syntax is defined in the module simpleparse.simpleparsegrammar. You can read a formal definition of the grammar used to define them at the end of this document.
Element Token | Examples | Effect |
---|---|---|
Character Range | [ \t\n] | Matches any 1 character in the given range |
String Literal | "[" | Match the sequence of characters as given, allowing for special, octal and hexadecimal escape characters |
Case-insensitive String Literal | c"this" | Match the sequence of characters without regard to the case of the target text, allowing for special, octal and hexadecimal escape characters. Note: Case-insensitive literals are far slower than regular literals! |
Name Reference | statement | Match the production whose name is specified. With 2.0, those productions may have been included from a library module or created by you and passed to the Parser object's initialiser. |
Sequential Groups | (a,b,c,d) | Match a sequence of element token children. Sequential groups have a lower precedence than FirstOf groups (below), so the group (a,b/c,d) is equivalent to (a,(b/c),d). |
FirstOf Groups | (a/b/c/d) | Match the first child which matches. Note that this is very different from systems which parse all children and choose the longest/most successful child-match. Sequential groups have a lower precedence than FirstOf groups, so the group (a,b/c,d) is equivalent to (a,(b/c),d). |
Error On Fail (Cut) | ! | Used as a "token", the ErrorOnFail modifier (also called "cut" after Prolog's cut directive) declares that all subsequent items in the enclosing sequential group should be marked ErrorOnFail, as if the given ErrorOnFail modifier were applied to each one individually. Note: can only be used as a member of a Sequential group, cannot be a member of a FirstOf group. See the section Modifiers/Operators below for more details of the semantics surrounding this token. |
Both character classes and strings in simpleparse may use octal escaping (of 1 to 3 octal digits), hexadecimal escaping (2 digits), or Unicode escaping (4 or 8 digits) or standard Python character escapes (\a\b\f\n\r\t\v)
Strings may be either single or double quoted (but not triple quoted).
To include a "]" character in a character class, make it the first character of the class. Similarly, a literal "-" character must be either the first (after the optional "]" character) or the last character. The grammar definition for a character class is as follows:
'[', CHARBRACE?,CHARDASH?, (CHARRANGE/CHARNOBRACE)*, CHARDASH?, ']'
It is a common error to have declared something like [+-*] as a
character range (every character including and between + and *)
intending to specify [-+*] or [+*-] (three distinct characters).
Symptoms
include not matching '-' or matching characters that were not expected.
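The difference between these spellings can be seen by expanding each class by hand. The sketch below follows the rule quoted above (a leading "]" and a leading or trailing "-" are literal, X-Y is a range) but ignores escapes; expand_class is a hypothetical helper, and treating a reversed range as empty is an assumption of this sketch rather than a statement about what the engine does.

```python
def expand_class(spec):
    """Expand the inside of a character class (without the [ ]) into a set.

    Simplified model: no escape handling; a reversed range expands to nothing.
    """
    chars = set()
    i = 0
    if spec.startswith("]"):          # CHARBRACE: literal "]" must come first
        chars.add("]")
        i = 1
    if spec[i:i + 1] == "-":          # leading CHARDASH: literal "-"
        chars.add("-")
        i += 1
    end = len(spec)
    if end > i and spec[end - 1] == "-":   # trailing CHARDASH: literal "-"
        chars.add("-")
        end -= 1
    while i < end:
        if i + 2 < end and spec[i + 1] == "-":
            # CHARRANGE: every character from spec[i] to spec[i + 2] inclusive
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            chars.update(chr(code) for code in range(lo, hi + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


print(sorted(expand_class("+-*")))   # read as a range from '+' to '*'
print(sorted(expand_class("-+*")))   # three distinct characters
print(sorted(expand_class("*-/")))   # a range that pulls in extra characters
```

In this model "+-*" contains no "-" at all, while "*-/" silently includes "+", ",", "-" and ".", which matches the two symptoms described above.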
Each element token can have a prefix and/or a postfix modifier applied to it to alter how the engine treats a match of the “base” element token.
Modifier | Example | Meaning |
---|---|---|
- | -"this" | Match a single character at the current position if the entire base element token doesn't match. If repeating, match any number of characters until the base element token matches. |
? (postfix) | "this"? | Match the base element token, or if the base element token cannot match, match nothing. |
? (prefix) | ?"this" | Match the base element token, then return to the previous position (this is called "LookAhead" in the mx.TextTools documentation). The - modifier is applied "after" the lookahead, so that a lookahead on a negative match equates to "is not followed by", while lookahead on positive matches equates to "is followed by". |
* | "this"* | Match the base element token from 0 to an infinite number of times. |
+ | "this"+ | Match the base element token from 1 to an infinite number of times. |
! | "this"! | Consider a failure to match a SyntaxError (stop parsing, and raise an exception). If the optional string-literal is included, it specifies the message (template) to be used for the SyntaxError. You can use %(varname)s formats to have the following variables substituted: Note: the error_on_failure flag is ignored for optional items (since they can never fail), and only raises an error if a repeating non-optional production fails completely. |
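The negative, lookahead and repetition modifiers can be modelled with a few small matcher functions. This is only a conceptual sketch: literal, negative, lookahead and zero_or_more are hypothetical helpers, each matcher returns the new position or None on failure, and none of this reflects how the underlying engine is actually implemented.

```python
def literal(value):
    """Return a matcher: matcher(text, pos) -> new position, or None on failure."""
    def match(text, pos):
        return pos + len(value) if text.startswith(value, pos) else None
    return match


def negative(base):
    """-base: consume one character if base does not match here."""
    def match(text, pos):
        if pos < len(text) and base(text, pos) is None:
            return pos + 1
        return None
    return match


def lookahead(base):
    """?base: succeed if base matches, but do not advance."""
    def match(text, pos):
        return pos if base(text, pos) is not None else None
    return match


def zero_or_more(base):
    """base*: apply base repeatedly; always succeeds."""
    def match(text, pos):
        while True:
            new_pos = base(text, pos)
            if new_pos is None or new_pos == pos:
                return pos
            pos = new_pos
    return match


text = "abc this"
until_this = zero_or_more(negative(literal("this")))   # models -"this"*
print(until_this(text, 0))                   # stops where "this" begins
print(lookahead(literal("abc"))(text, 0))    # matches without advancing
```

Combining the two, lookahead(negative(base)) would model "is not followed by" for a single character position.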
Using the ErrorOnFail operator can be somewhat tricky. It is
often easier to use the "stand-alone" element-token version of
cut. Here's an example of use:
top := a/b/bp
a := 'a', !, ws, '=', ws, something
b := 'b', ws, '=', !, ws, something
bp := 'b', ws, '+=', !, ws, something
The production top can match an 'a =', a 'b =', or a 'b +=', but if
it encounters an 'a' without an '=' following, it will raise a syntax
error. For the two "b" productions, we don't want to raise a
Syntax error if the 'b' is not followed by the '=' or '+=' because the
grammar might match the other production, so we only cut off
back-tracking after the operator is found. Consider this
alternative:
top := a/b/bp
a := 'a'!, ws, '=', ws, something # BAD DON'T DO THIS!
b := 'b', ws, '=', !, ws, something
bp := 'b', ws, '+=', !, ws, something
This grammar does something very different (and somewhat
useless). When the "top" production goes to match, it tries to
match the "a" production, which tries to match the 'a' literal.
If the literal isn't there, for instance, for the text 'b =', then the 'a'
literal will raise a SyntaxError. The result is that the "b" and
"bp" productions can never match with this grammar.
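The effect of where the cut is placed can be imitated in a few lines of ordinary Python. In this sketch CutError is a stand-in for the syntax error raised after a cut, and statement and top are hypothetical helpers; only the name-then-operator part of each production is modelled, with the trailing ws and something left out.

```python
class CutError(Exception):
    """Stand-in for the syntax error raised when a match fails after a cut."""


def statement(name, operator, text, cut_after_name):
    """Match name, optional spaces, then operator.

    Returns True on a match and False on an ordinary failure. If the name
    matched, the operator did not, and a cut follows the name, raise CutError.
    """
    if not text.startswith(name):
        return False
    rest = text[len(name):].lstrip(" ")
    if rest.startswith(operator):
        return True
    if cut_after_name:
        raise CutError("expected %r after %r" % (operator, name))
    return False


def top(text):
    # top := a/b/bp, tried in order; only "a" has its cut right after the name
    alternatives = [("a", "=", True), ("b", "=", False), ("b", "+=", False)]
    for name, operator, cut in alternatives:
        if statement(name, operator, text, cut):
            return name + " " + operator
    return None


print(top("a = 1"))
print(top("b += 1"))
try:
    top("a + 1")
except CutError as err:
    print("syntax error: %s" % err)
```

Because the "b" alternative fails without raising, the "bp" alternative still gets its turn; an 'a' without '=' aborts the whole parse instead.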
Each simpleparsegrammar is a series of declarations which define a production (rule) and bind it to a name which can be referenced by any production in the declaration set. Defining a production generally causes a result tuple to be created in the results tree (see below for what else can happen).
Here are some examples showing sample productions and the result trees they would generate.
s := "this" | ('s', start, stop, [] ) # no children |
s := them, those? | ('s', start, stop, [ ("them", start, stop, [...]), ("those", start, stop, [...]) ] ) |
 | ('s', start, stop, [ ("them", start, stop, [...]) ] ) # optional value not there |
s := them*, those | ('s', start, stop, [ ("them", start, stop, [...]), ("them", start, stop, [...]), ("those", start, stop, [...]) ] ) |
 | ('s', start, stop, [ ("those", start, stop, [...]) ] ) # optional repeating value not present |
As a general rule, when a production matches, a match tuple is added to the result tree. The first item of this tuple is normally the name of the production (as a string), the second is the starting position of the match, the third is the stopping position of the match, and the fourth is a list of any child-production's result trees.
Using these features allows you to trim unwanted entries out of your results tree (which is good for efficiency, as the system doesn't need to store the result-trees). Using expanded productions can allow you to reduce the complexity of your grammars by factoring out common patterns and allowing them to be included in multiple productions without generating extraneous result-tuples in the results tree. Both of these methods still produce standard results trees so no special work is required to process the results tree. (There are methods described in Processing Results Trees which can generate non-standard result trees for special purposes).
Report Type | Examples | Return Value |
---|---|---|
Normal | a := (b,c) | ('a', l, r, [ |
Unreported | <a> := (b,c) | ('a', l, r, [ |
Expanded | >a< := (b,c) | ('a', l, r, [ |
There are situations where the base parsing library simply isn't capable of accomplishing a particular matching task, or where it would be much simpler to define a method for matching a particular class of value than to define it with an EBNF grammar. In other instances, a particularly common pattern, such as floating point numbers or strings with standard (Python) escapes, is wanted, and has been provided in a parsing library.
SimpleParse allows you to pass a set of “pre-built” element tokens to the Parser during initialization. These pre-built parsers can either be instances of simpleparse.objectgenerator.ElementToken, or raw mx.TextTools tag-tables. To use them, pass the Parser's initializer a list of two-tuples of (name, parserObject):
parser = Parser( declaration, "v", prebuilts = [
    ("word", parserObject1 ),
    ("white", parserObject2 ),
    ]
)
You can see a working example (which uses Python's re module to create a prebuilt parser) in examples/prebuilt_call.py .
SimpleParse 2.0 has introduced the ability to create libraries of common parsers for inclusion in other parsers. At present, the common package includes numbers, basic strings, the ISO date format, some character types, and some comment types. New contributions to the library are welcome.
In general, importing a particular module from the common package makes the production names from the module available in any subsequent grammar defined. Refer to the documentation for a particular module to see what production names are exported.
from simpleparse.common import strings, comments, numbers
Many of the standard common parser modules also include “Interpreter” objects which can be used to process the results tree generated by the mini-grammar into a Python-friendly form. See the documentation for the individual modules.
class MyParser( Parser ):
    string = strings.StringInterpreter()
This is the formal definition of the SimpleParse 2.0 grammar. Although the grammar is functional (should parse any proper grammar), the grammar used during parser generation is a manually generated version found in the simpleparse.simpleparsegrammar module.
declaration = r"""declarationset := declaration+
declaration := ts, (unreportedname/expandedname/name) ,ts,':',':'?,'=',seq_group
element_token := lookahead_indicator?, ts, negpos_indicator?,ts, (literal/range/group/name),ts, occurence_indicator?, ts, error_on_fail?
negpos_indicator := [-+]
lookahead_indicator := "?"
occurence_indicator := [+*?]
error_on_fail := "!", (ts,literal)?
>group< := '(',seq_group, ')'
seq_group := ts,(error_on_fail/fo_group/element_token),
(ts, seq_indicator, ts,
(error_on_fail/fo_group/element_token)
)*, ts
fo_group := element_token, (ts, fo_indicator, ts, element_token)+
# following two are likely something peoples might want to
# replace in many instances...
<fo_indicator> := "/"
<seq_indicator> := ','
unreportedname := '<', name, '>'
expandedname := '>', name, '<'
name := [a-zA-Z_],[a-zA-Z0-9_]*
<ts> := ( [ \011-\015]+ / comment )*
comment := '#',-'\n'*,'\n'
literal := literalDecorator?,("'",(CHARNOSNGLQUOTE/ESCAPEDCHAR)*,"'") / ('"',(CHARNODBLQUOTE/ESCAPEDCHAR)*,'"')
literalDecorator := [c]
range := '[',CHARBRACE?,CHARDASH?, (CHARRANGE/CHARNOBRACE)*, CHARDASH?,']'
CHARBRACE := ']'
CHARDASH := '-'
CHARRANGE := CHARNOBRACE, '-', CHARNOBRACE
CHARNOBRACE := ESCAPEDCHAR/CHAR
CHAR := -[]]
ESCAPEDCHAR := '\\',( SPECIALESCAPEDCHAR / ('x',HEXESCAPEDCHAR) / ("u",UNICODEESCAPEDCHAR_16) /("U",UNICODEESCAPEDCHAR_32)/OCTALESCAPEDCHAR )
SPECIALESCAPEDCHAR := [\\abfnrtv"']
OCTALESCAPEDCHAR := [0-7],[0-7]?,[0-7]?
HEXESCAPEDCHAR := [0-9a-fA-F],[0-9a-fA-F]
CHARNODBLQUOTE := -[\\"]+
CHARNOSNGLQUOTE := -[\\']+
UNICODEESCAPEDCHAR_16 := [0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F]
UNICODEESCAPEDCHAR_32 := [0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F],[0-9a-fA-F]
"""