- Added new examples nested.py and withAttribute.py to demonstrate
the new features.
- Added performance speedup to grammars using operatorPrecedence,
instigated by Stefan Reichör - thanks for the feedback, Stefan!
- Fixed bug/typo when deleting an element from a ParseResults by
using the element's results name.
- Fixed whitespace-skipping bug in wrapper classes (such as Group,
Suppress, Combine, etc.) and when using setDebug(), reported by
new pyparsing user dazzawazza on SourceForge, nice job!
- Added restriction to prevent defining Word or CharsNotIn expressions
with minimum length of 0 (should use Optional if this is desired),
and enhanced docstrings to reflect this limitation. Issue was
raised by Joey Tallieu, who submitted a patch with a slightly
different solution. Thanks for taking the initiative, Joey, and
please keep submitting your ideas!
- Fixed bug in makeHTMLTags that did not detect HTML tag attributes
with no '= value' portion (such as "<td nowrap>"), reported by
hamidh on the pyparsing wiki - thanks!
- Fixed minor bug in makeHTMLTags and makeXMLTags, which did not
accept whitespace in closing tags.
Version 1.4.7 - July, 2007
--------------------------
- NEW NOTATION SHORTCUT: ParserElement now accepts results names using
a notational shortcut, following the expression with the results name
in parentheses. So this:
stats = "AVE:" + realNum.setResultsName("average") + \
"MIN:" + realNum.setResultsName("min") + \
"MAX:" + realNum.setResultsName("max")
can now be written as this:
stats = "AVE:" + realNum("average") + \
"MIN:" + realNum("min") + \
"MAX:" + realNum("max")
The intent behind this change is to make it simpler to define results
names for significant fields within the expression, while keeping
the grammar syntax clean and uncluttered.
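A minimal runnable sketch of the shortcut (realNum here is a hypothetical stand-in built from a Regex, since the original definition isn't shown; modern "import pyparsing as pp" style):

```python
import pyparsing as pp

# Hypothetical stand-in for the realNum expression referenced above
realNum = pp.Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))

# expr("name") is shorthand for expr.setResultsName("name")
stats = ("AVE:" + realNum("average")
         + "MIN:" + realNum("min")
         + "MAX:" + realNum("max"))

result = stats.parseString("AVE: 2.5 MIN: 1.0 MAX: 4.0")
```

The named fields are then accessible as result["average"], result["min"], and result["max"].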
- Fixed bug when packrat parsing is enabled, with cached ParseResults
being updated by subsequent parsing. Reported on the pyparsing
wiki by Kambiz, thanks!
- Fixed bug in operatorPrecedence for unary operators with left
associativity, if multiple operators were given for the same term.
- Fixed bug in example simpleBool.py, corrected precedence of "and" vs.
"or" operations.
- Fixed bug in Dict class, in which keys were converted to strings
whether they needed to be or not. Have narrowed this logic to
convert keys to strings only if the keys are ints (which would
confuse __getitem__ behavior for list indexing vs. key lookup).
- Added ParserElement method setBreak(), which will invoke the pdb
module's set_trace() function when this expression is about to be
parsed.
- Fixed bug in StringEnd in which reading off the end of the input
string raised an exception - it should simply match. Resolved while
answering a question for Shawn on the pyparsing wiki.
Version 1.4.6 - April, 2007
---------------------------
- Simplified constructor for ParseFatalException, to support common
exception construction idiom:
raise ParseFatalException, "unexpected text: 'Spanish Inquisition'"
- Added method getTokensEndLoc(), to be called from within a parse action,
for those parse actions that need both the starting *and* ending
location of the parsed tokens within the input text.
- Enhanced behavior of keepOriginalText so that named parse fields are
preserved, even though tokens are replaced with the original input
text matched by the current expression. Also, cleaned up the stack
traversal to be more robust. Suggested by Tim Arnold - thanks, Tim!
- Fixed subtle bug in which countedArray (and similar dynamic
expressions configured in parse actions) failed to match within Or,
Each, FollowedBy, or NotAny. Reported by Ralf Vosseler, thanks for
your patience, Ralf!
- Fixed Unicode bug in upcaseTokens and downcaseTokens parse actions,
scanString, and default debugging actions; reported (and patch submitted)
by Nikolai Zamkovoi, spasibo!
- Fixed bug when saving a tuple as a named result. The returned
token list gave the proper tuple value, but accessing the result by
name only gave the first element of the tuple. Reported by
Poromenos, nice catch!
- Fixed bug in makeHTMLTags/makeXMLTags, which failed to match tag
attributes with namespaces.
- Fixed bug in SkipTo when setting include=True, to have the skipped-to
tokens correctly included in the returned data. Reported by gunars on
the pyparsing wiki, thanks!
- Fixed typo bug in OnlyOnce.reset method, which omitted the self
argument. Submitted by eike welk, thanks for the lint-picking!
- Added performance enhancement to Forward class, suggested by
akkartik on the pyparsing Wiki discussion, nice work!
- Added optional asKeyword to Word constructor, to indicate that the
given word pattern should be matched only as a keyword, that is, it
should only match if it is within word boundaries.
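A short sketch of the new asKeyword option (modern import style):

```python
import pyparsing as pp

num = pp.Word(pp.nums)
numKeyword = pp.Word(pp.nums, asKeyword=True)

# Without asKeyword, the leading digits of "123abc" match
plain = num.parseString("123abc")[0]

# With asKeyword, the digits must end at a word boundary, so this fails
try:
    numKeyword.parseString("123abc")
    matched = True
except pp.ParseException:
    matched = False
```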
- Added S-expression parser to examples directory.
- Added macro substitution example to examples directory.
- Added holaMundo.py example, excerpted from Marco Alfonso's blog -
muchas gracias, Marco!
- Modified internal cyclic references in ParseResults to use weakrefs;
this should help reduce the memory footprint of large parsing
programs, at some cost to performance (3-5%). Suggested by bca48150 on
the pyparsing wiki, thanks!
- Enhanced the documentation describing the vagaries and idiosyncrasies
of parsing strings with embedded tabs, and the impact on:
. parse actions
. scanString
. col and line helper functions
(Suggested by eike welk in response to some unexplained inconsistencies
between parsed location and offsets in the input string.)
- Cleaned up internal decorators to preserve function names,
docstrings, etc.
Version 1.4.5 - December, 2006
------------------------------
- Removed debugging print statement from QuotedString class. Sorry
for not stripping this out before the 1.4.4 release!
- A significant performance improvement, the first one in a while!
For my Verilog parser, this version of pyparsing is about double the
speed - YMMV.
- Added support for pickling of ParseResults objects. (Reported by
Jeff Poole, thanks Jeff!)
- Fixed minor bug in makeHTMLTags that did not recognize tag attributes
with embedded '-' or '_' characters. Also, added support for
passing expressions to makeHTMLTags and makeXMLTags, and used this
feature to define the globals anyOpenTag and anyCloseTag.
- Fixed error in alphas8bit, I had omitted the y-with-umlaut character.
- Added punc8bit string to complement alphas8bit - it contains all the
non-alphabetic, non-blank 8-bit characters.
- Added commonHTMLEntity expression, to match common HTML "ampersand"
codes, such as "&lt;", "&gt;", "&amp;", "&nbsp;", and "&quot;". This
expression also defines a results name 'entity', which can be used
to extract the entity field (that is, "lt", "gt", etc.). Also added
built-in parse action replaceHTMLEntity, which can be attached to
commonHTMLEntity to translate "&lt;", "&gt;", "&amp;", "&nbsp;", and
"&quot;" to "<", ">", "&", " ", and '"'.
- Added example, htmlStripper.py, that strips HTML tags and scripts
from HTML pages. It also translates common HTML entities to their
respective characters.
Version 1.4.4 - October, 2006
-------------------------------
- Fixed traceParseAction decorator to also trap and record exception
returns from parse actions, and to handle parse actions with 0,
1, 2, or 3 arguments.
- Enhanced parse action normalization to support using classes as
parse actions; that is, the class constructor is called at parse
time and the __init__ function is called with 0, 1, 2, or 3
arguments. If passing a class as a parse action, the __init__
method must use one of the valid parse action parameter list
formats. (This technique is useful when using pyparsing to compile
parsed text into a series of application objects - see the new
example simpleBool.py.)
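A small sketch of the class-as-parse-action technique described above (the Integer class is an illustrative example, not from the release):

```python
import pyparsing as pp

# The class constructor is called at parse time with the matched tokens;
# the resulting instance replaces the tokens in the parse results
class Integer:
    def __init__(self, tokens):
        self.value = int(tokens[0])

intExpr = pp.Word(pp.nums).setParseAction(Integer)
result = intExpr.parseString("42")
```

After parsing, result[0] is an Integer instance rather than the matched string.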
- Fixed bug in ParseResults when setting an item using an integer
index. (Reported by Christopher Lambacher, thanks!)
- Fixed whitespace-skipping bug, patch submitted by Paolo Losi -
grazie, Paolo!
- Fixed bug when a Combine contained an embedded Forward expression,
reported by cie on the pyparsing wiki - good catch!
- Fixed listAllMatches bug, when a listAllMatches result was
nested within another result. (Reported by don pasquale on
comp.lang.python, well done!)
- Fixed bug in ParseResults items() method, when returning an item
marked as listAllMatches=True
- Fixed bug in definition of cppStyleComment (and javaStyleComment)
in which '//' line comments were not continued to the next line
if the line ends with a '\'. (Reported by eagle-eyed Ralph
Corderoy!)
- Optimized re's for cppStyleComment and quotedString for better
re performance - also provided by Ralph Corderoy, thanks!
- Added new example, indentedGrammarExample.py, showing how to
define a grammar using indentation to show grouping (as Python
does for defining statement nesting). Instigated by an e-mail
discussion with Andrew Dalke, thanks Andrew!
- Added new helper operatorPrecedence (based on e-mail list discussion
with Ralph Corderoy and Paolo Losi), to facilitate definition of
grammars for expressions with unary and binary operators. For
instance, this grammar defines a 6-function arithmetic expression
grammar, with unary plus and minus, proper operator precedence, and
right- and left-associativity:
expr = operatorPrecedence( operand,
[("!", 1, opAssoc.LEFT),
("^", 2, opAssoc.RIGHT),
(oneOf("+ -"), 1, opAssoc.RIGHT),
(oneOf("* /"), 2, opAssoc.LEFT),
(oneOf("+ -"), 2, opAssoc.LEFT),]
)
Also added example simpleArith.py and simpleBool.py to provide
more detailed code samples using this new helper method.
- Added new helpers matchPreviousLiteral and matchPreviousExpr, for
creating adaptive parsing expressions that match the same content
as was parsed in a previous parse expression. For instance:
first = Word(nums)
matchExpr = first + ":" + matchPreviousLiteral(first)
will match "1:1", but not "1:2". Since this matches at the literal
level, this will also match the leading "1:1" in "1:10".
In contrast:
first = Word(nums)
matchExpr = first + ":" + matchPreviousExpr(first)
will *not* match the leading "1:1" in "1:10"; the expressions are
evaluated first, and then compared, so "1" is compared with "10".
- Added keepOriginalText parse action. Sometimes pyparsing's
whitespace-skipping leaves out too much whitespace. Adding this
parse action will restore any internal whitespace for a parse
expression. This is especially useful when defining expressions
for scanString or transformString applications.
- Added __add__ method for ParseResults class, to better support
using Python sum built-in for summing ParseResults objects returned
from scanString.
- Added reset method for the new OnlyOnce class wrapper for parse
actions (to allow a grammar to be used multiple times).
- Added optional maxMatches argument to scanString and searchString,
to short-circuit scanning after 'n' expression matches are found.
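A sketch of the maxMatches short-circuit (passed positionally here; modern import style):

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
# Stop scanning after the first two matches
hits = word.searchString("one two three four", 2)
```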
Version 1.4.3 - July, 2006
------------------------------
- Fixed implementation of multiple parse actions for an expression
(added in 1.4.2).
. setParseAction() reverts to its previous behavior, setting
one (or more) actions for an expression, overwriting any
action or actions previously defined
. new method addParseAction() appends one or more parse actions
to the list of parse actions attached to an expression
Now it is harder to accidentally append parse actions to an
expression, when what you wanted to do was overwrite whatever had
been defined before. (Thanks, Jean-Paul Calderone!)
- Simplified interface to parse actions that do not require all 3
parse action arguments. Very rarely do parse actions require more
than just the parsed tokens, yet parse actions still require all
3 arguments including the string being parsed and the location
within the string where the parse expression was matched. With this
release, parse actions may now be defined to be called as:
. fn(string,locn,tokens) (the current form)
. fn(locn,tokens)
. fn(tokens)
. fn()
The setParseAction and addParseAction methods will internally decorate
the provided parse actions with compatible wrappers to conform to
the full (string,locn,tokens) argument sequence.
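The simplified forms can be mixed freely, as in this sketch (modern import style):

```python
import pyparsing as pp

calls = []

# Abbreviated form: only the tokens
def tokensOnly(tokens):
    calls.append(("tokens", list(tokens)))

# Full form: string, location, and tokens
def fullForm(s, loc, tokens):
    calls.append(("full", loc, list(tokens)))

expr = pp.Word(pp.alphas).setParseAction(tokensOnly).addParseAction(fullForm)
expr.parseString("hello")
```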
- REMOVED SUPPORT FOR RETURNING PARSE LOCATION FROM A PARSE ACTION.
I announced this in March, 2004, and gave a final warning in the last
release. Now you can return a tuple from a parse action, and it will
be treated like any other return value (i.e., the tuple will be
substituted for the incoming tokens passed to the parse action,
which is useful when trying to parse strings into tuples).
- Added setFailAction method, taking a callable function fn that
takes the arguments fn(s,loc,expr,err) where:
. s - string being parsed
. loc - location where expression match was attempted and failed
. expr - the parse expression that failed
. err - the exception thrown
The function returns no values. It may throw ParseFatalException
if it is desired to stop parsing immediately.
(Suggested by peter21081944 on wikispaces.com)
- Added class OnlyOnce as helper wrapper for parse actions. OnlyOnce
only permits a parse action to be called one time, after which
all subsequent calls throw a ParseException.
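A sketch of the OnlyOnce wrapper (modern import style):

```python
import pyparsing as pp

def report(tokens):
    return tokens

onceAction = pp.OnlyOnce(report)
expr = pp.Word(pp.nums).setParseAction(onceAction)

expr.parseString("1")          # first use succeeds
try:
    expr.parseString("2")      # second use raises ParseException
    second_ok = True
except pp.ParseException:
    second_ok = False
```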
- Added traceParseAction decorator to help debug parse actions.
Simply insert "@traceParseAction" ahead of the definition of your
parse action, and each invocation will be displayed, along with
incoming arguments, and returned value.
- Fixed bug when copying ParserElements using copy() or
setResultsName(). (Reported by Dan Thill, great catch!)
- Fixed bug in asXML() where token text contains <, >, and &
characters - generated XML now escapes these as &lt;, &gt; and
&amp;. (Reported by Jacek Sieka, thanks!)
- Fixed bug in SkipTo() when searching for a StringEnd(). (Reported
by Pete McEvoy, thanks Pete!)
- Fixed "except Exception" statements, the most critical added as part
of the packrat parsing enhancement. (Thanks, Erick Tryzelaar!)
- Fixed end-of-string infinite looping on LineEnd and StringEnd
expressions. (Thanks again to Erick Tryzelaar.)
- Modified setWhitespaceChars to return self, to be consistent with
other ParserElement modifiers. (Suggested by Erick Tryzelaar.)
- Fixed bug/typo in new ParseResults.dump() method.
- Fixed bug in searchString() method, in which only the first token of
an expression was returned. searchString() now returns a
ParseResults collection of all search matches.
- Added example program removeLineBreaks.py, a string transformer that
converts text files with hard line-breaks into one with line breaks
only between paragraphs.
- Added example program listAllMatches.py, to illustrate using the
listAllMatches option when specifying results names (also shows new
support for passing lists to oneOf).
- Added example program linenoExample.py, to illustrate using the
helper methods lineno, line, and col, and returning objects from a
parse action.
- Added example program parseListString.py, which can parse the
string representation of a Python list back into a true list. Taken
mostly from my PyCon presentation examples, but now with support
for tuple elements, too!
Version 1.4.2 - April 1, 2006 (No foolin'!)
-------------------------------------------
- Significant speedup from memoizing nested expressions (a technique
known as "packrat parsing"), thanks to Chris Lesniewski-Laas! Your
mileage may vary, but my Verilog parser almost doubled in speed to
over 600 lines/sec!
This speedup may break existing programs that use parse actions that
have side-effects. For this reason, packrat parsing is disabled when
you first import pyparsing. To activate the packrat feature, your
program must call the class method ParserElement.enablePackrat(). If
your program uses psyco to "compile as you go", you must call
enablePackrat before calling psyco.full(). If you do not do this,
Python will crash. For best results, call enablePackrat() immediately
after importing pyparsing.
- Added new helper method countedArray(expr), for defining patterns that
start with a leading integer to indicate the number of array elements,
followed by that many elements, matching the given expr parse
expression. For instance, this two-liner:
wordArray = countedArray(Word(alphas))
print wordArray.parseString("3 Practicality beats purity")[0]
returns the parsed array of words:
['Practicality', 'beats', 'purity']
The leading token '3' is suppressed, although it is easily obtained
from the length of the returned array.
(Inspired by e-mail discussion with Ralf Vosseler.)
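The two-liner above, rewritten as a self-contained Python 3 sketch (depending on the version, the array elements may come back grouped or flat):

```python
import pyparsing as pp

wordArray = pp.countedArray(pp.Word(pp.alphas))
# The leading count "3" is consumed and suppressed from the results
result = wordArray.parseString("3 Practicality beats purity")
```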
- Added support for attaching multiple parse actions to a single
ParserElement. (Suggested by Dan "Dang" Griffith - nice idea, Dan!)
- Added support for asymmetric quoting characters in the recently-added
QuotedString class. Now you can define your own quoted string syntax
like "<<text in double angle brackets>>". To define
this custom form of QuotedString, your code would define:
dblAngleQuotedString = QuotedString('<<',endQuoteChar='>>')
QuotedString also supports escaped quotes, escape character other
than '\', and multiline.
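A runnable sketch of the asymmetric-quote form (modern import style; by default the quote characters are stripped from the result):

```python
import pyparsing as pp

dblAngleQuotedString = pp.QuotedString('<<', endQuoteChar='>>')
inner = dblAngleQuotedString.parseString("<<He said, 'whoa!'>>")[0]
```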
- Changed the default value returned internally by Optional, so that
None can be used as a default value. (Suggested by Steven Bethard -
I finally saw the light!)
- Added dump() method to ParseResults, to make it easier to list out
and diagnose values returned from calling parseString.
- A new example, a search query string parser, submitted by Steven
Mooij and Rudolph Froger - a very interesting application, thanks!
- Added an example that parses the BNF in Python's Grammar file, in
support of generating Python grammar documentation. (Suggested by
J H Stovall.)
- A new example, submitted by Tim Cera, of a flexible parser module,
using a simple config variable to adjust parsing for input formats
that have slight variations - thanks, Tim!
- Added an example for parsing Roman numerals, showing the capability
of parse actions to "compile" Roman numerals into their integer
values during parsing.
- Added a new docs directory, for additional documentation or help.
Currently, this includes the text and examples from my recent
presentation at PyCon.
- Fixed another typo in CaselessKeyword, thanks Stefan Behnel.
- Expanded oneOf to also accept tuples, not just lists. This really
should be sufficient...
- Added deprecation warnings when tuple is returned from a parse action.
Looking back, I see that I originally deprecated this feature in March,
2004, so I'm guessing people really shouldn't have been using this
feature - I'll drop it altogether in the next release, which will
allow users to return a tuple from a parse action (which is really
handy when trying to reconstruct tuples from a tuple string
representation!).
Version 1.4.1 - February, 2006
------------------------------
- Converted generator expression in QuotedString class to list
comprehension, to retain compatibility with Python 2.3. (Thanks, Titus
Brown for the heads-up!)
- Added searchString() method to ParserElement, as an alternative to
using "scanString(instring).next()[0][0]" to search through a string
looking for a substring matching a given parse expression. (Inspired by
e-mail conversation with Dave Feustel.)
- Modified oneOf to accept lists of strings as well as a single string
of space-delimited literals. (Suggested by Jacek Sieka - thanks!)
- Removed deprecated use of Upcase in pyparsing test code. (Also caught by
Titus Brown.)
- Removed lstrip() call from Literal - too aggressive in stripping
whitespace which may be valid for some grammars. (Point raised by Jacek
Sieka). Also, made Literal more robust in the event of passing an empty
string.
- Fixed bug in replaceWith when returning None.
- Added cautionary documentation for Forward class when assigning a
MatchFirst expression, as in:
fwdExpr << a | b | c
Precedence of operators causes this to be evaluated as:
(fwdExpr << a) | b | c
thereby leaving b and c out as parseable alternatives. Users must
explicitly group the values inserted into the Forward:
fwdExpr << (a | b | c)
(Suggested by Scot Wilcoxon - thanks, Scot!)
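The grouped form, as a runnable sketch (modern import style):

```python
import pyparsing as pp

a, b, c = pp.Literal("a"), pp.Literal("b"), pp.Literal("c")

fwdExpr = pp.Forward()
# The alternatives must be grouped explicitly; without the parentheses,
# '<<' binds tighter than '|' and b and c would be lost
fwdExpr << (a | b | c)
```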
Version 1.4 - January 18, 2006
------------------------------
- Added Regex class, to permit definition of complex embedded expressions
using regular expressions. (Enhancement provided by John Beisley, great
job!)
- Converted implementations of Word, oneOf, quoted string, and comment
helpers to utilize regular expression matching. Performance improvements
in the 20-40% range.
- Added QuotedString class, to support definition of non-standard quoted
strings (Suggested by Guillaume Proulx, thanks!)
- Added CaselessKeyword class, to streamline grammars with, well, caseless
keywords (Proposed by Stefan Behnel, thanks!)
- Fixed bug in SkipTo, when using an ignoreable expression. (Patch provided
by Anonymous, thanks, whoever-you-are!)
- Fixed typo in NoMatch class. (Good catch, Stefan Behnel!)
- Fixed minor bug in _makeTags(), using string.printables instead of
pyparsing.printables.
- Cleaned up some of the expressions created by makeXXXTags helpers, to
suppress extraneous <> characters.
- Added some grammar definition-time checking to verify that a grammar is
being built using proper ParserElements.
- Added examples:
. LAparser.py - linear algebra C preprocessor (submitted by Mike Ellis,
thanks Mike!)
. wordsToNum.py - converts word description of a number back to
the original number (such as 'one hundred and twenty three' -> 123)
. updated fourFn.py to support unary minus, added BNF comments
Version 1.3.3 - September 12, 2005
----------------------------------
- Improved support for Unicode strings that would be returned using
srange. Added greetingInKorean.py example, for a Korean version of
"Hello, World!" using Unicode. (Thanks, June Kim!)
- Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining
hexadecimal value expressions.
- NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE===
Modified tag and results definitions returned by makeHTMLTags(),
to better support the looseness of HTML parsing. Tags to be
parsed are now caseless, and keys generated for tag attributes are
now converted to lower case.
Formerly, makeXMLTags("XYZ") would return a tag with results
name of "startXYZ", this has been changed to "startXyz". If this
tag is matched against '', the
matched keys formerly would be "Abc", "DEF", and "ghi"; keys are
now converted to lower case, giving keys of "abc", "def", and
"ghi". These changes were made to try to address the lax
case sensitivity agreement between start and end tags in many
HTML pages.
No changes were made to makeXMLTags(), which assumes more rigorous
parsing rules.
Also, cleaned up case-sensitivity bugs in closing tags, and
switched to using Keyword instead of Literal class for tags.
(Thanks, Steve Young, for getting me to look at these in more
detail!)
- Added two helper parse actions, upcaseTokens and downcaseTokens,
which will convert matched text to all uppercase or lowercase,
respectively.
- Deprecated Upcase class, to be replaced by upcaseTokens parse
action.
- Converted messages sent to stderr to use warnings module, such as
when constructing a Literal with an empty string, one should use
the Empty() class or the empty helper instead.
- Added ' ' (space) as an escapable character within a quoted
string.
- Added helper expressions for common comment types, in addition
to the existing cStyleComment (/*...*/) and htmlStyleComment
(<!-- ... -->)
. dblSlashComment = // ... (to end of line)
. cppStyleComment = cStyleComment or dblSlashComment
. javaStyleComment = cppStyleComment
. pythonStyleComment = # ... (to end of line)
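A short sketch of one of these helpers, used with ignore() so comments are skipped like whitespace (modern import style):

```python
import pyparsing as pp

expr = pp.OneOrMore(pp.Word(pp.alphas))
# Skip over '//' comments wherever they appear in the input
expr.ignore(pp.dblSlashComment)
tokens = expr.parseString("alpha // ignored to end of line\nbeta")
```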
Version 1.3.2 - July 24, 2005
-----------------------------
- Added Each class as an enhanced version of And. 'Each' requires
that all given expressions be present, but may occur in any order.
Special handling is provided to group ZeroOrMore and OneOrMore
elements that occur out-of-order in the input string. You can also
construct 'Each' objects by joining expressions with the '&'
operator. When using the Each class, results names are strongly
recommended for accessing the matched tokens. (Suggested by Pradam
Amini - thanks, Pradam!)
- Stricter interpretation of 'max' qualifier on Word elements. If the
'max' attribute is specified, matching will fail if an input field
contains more than 'max' consecutive body characters. For example,
previously, Word(nums,max=3) would match the first three characters
of '0123456', returning '012' and continuing parsing at '3'. Now,
when constructed using the max attribute, Word will raise an
exception with this string.
- Cleaner handling of nested dictionaries returned by Dict. No
longer necessary to dereference sub-dictionaries as element [0] of
their parents.
=== NOTE: THIS CHANGE MAY BREAK SOME EXISTING CODE, BUT ONLY IF
PARSING NESTED DICTIONARIES USING THE LITTLE-USED DICT CLASS ===
(Prompted by discussion thread on the Python Tutor list, with
contributions from Danny Yoo, Kent Johnson, and original post by
Liam Clarke - thanks all!)
Version 1.3.1 - June, 2005
----------------------------------
- Added markInputline() method to ParseException, to display the input
text line location of the parsing exception. (Thanks, Stefan Behnel!)
- Added setDefaultKeywordChars(), so that Keyword definitions using a
custom keyword character set do not all need to add the keywordChars
constructor argument (similar to setDefaultWhitespaceChars()).
(suggested by rzhanka on the SourceForge pyparsing forum.)
- Simplified passing debug actions to setDebugAction(). You can now
pass 'None' for a debug action if you want to take the default
debug behavior. To suppress a particular debug action, you can pass
the pyparsing method nullDebugAction.
- Refactored parse exception classes, moved all behavior to
ParseBaseException, and the former ParseException is now a subclass of
ParseBaseException. Added a second subclass, ParseFatalException, as
a subclass of ParseBaseException. User-defined parse actions can raise
ParseFatalException if a data inconsistency is detected (such as a
begin-tag/end-tag mismatch), and this will stop all parsing immediately.
(Inspired by e-mail thread with Michele Petrazzo - thanks, Michele!)
- Added helper methods makeXMLTags and makeHTMLTags, that simplify the
definition of XML or HTML tag parse expressions for a given tagname.
Both functions return a pair of parse expressions, one for the opening
tag (that is, '<tagname>') and one for the closing tag ('</tagname>').
The opening tag expression also recognizes any attribute definitions that
have been included in the opening tag, as well as an empty tag (one with
a trailing '/', as in '<tagname/>', which is equivalent to
'<tagname></tagname>').
makeXMLTags uses stricter XML syntax for attributes, requiring that they
be enclosed in double quote characters - makeHTMLTags is more lenient,
and accepts single-quoted strings or any contiguous string of characters
up to the next whitespace character or '>' character. Attributes can
be retrieved as dictionary or attribute values of the returned results
from the opening tag.
- Added example minimath2.py, a refinement on fourFn.py that adds
an interactive session and support for variables. (Thanks, Steven Siew!)
- Added performance improvement, up to 20% reduction! (Found while working
with Wolfgang Borgert on performance tuning of his TTCN3 parser.)
- And another performance improvement, up to 25%, when using scanString!
(Found while working with Henrik Westlund on his C header file scanner.)
- Updated UML diagrams to reflect latest class/method changes.
Version 1.3 - March, 2005
----------------------------------
- Added new Keyword class, as a special form of Literal. Keywords
must be followed by whitespace or other non-keyword characters, to
distinguish them from variables or other identifiers that just
happen to start with the same characters as a keyword. For instance,
the input string containing "ifOnlyIfOnly" will match a Literal("if")
at the beginning and in the middle, but will fail to match a
Keyword("if"). Keyword("if") will match only strings such as "if only"
or "if(only)". (Proposed by Wolfgang Borgert, and Berteun Damman
separately requested this on comp.lang.python - great idea!)
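The "ifOnlyIfOnly" example above, as a runnable sketch (modern import style):

```python
import pyparsing as pp

# Literal happily matches the leading "if" of a longer identifier...
lit = pp.Literal("if").parseString("ifOnlyIfOnly")[0]

# ...but Keyword requires a non-keyword character to follow, so it fails
try:
    pp.Keyword("if").parseString("ifOnlyIfOnly")
    kw_matched = True
except pp.ParseException:
    kw_matched = False

# "if(only)" is fine: '(' is not a keyword character
kw = pp.Keyword("if").parseString("if(only)")[0]
```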
- Added setWhitespaceChars() method to override the characters to be
skipped as whitespace before matching a particular ParseElement. Also
added the class-level method setDefaultWhitespaceChars(), to allow
users to override the default set of whitespace characters (space,
tab, newline, and return) for all subsequently defined ParseElements.
(Inspired by Klaas Hofstra's inquiry on the Sourceforge pyparsing
forum.)
- Added helper parse actions to support some very common parse
action use cases:
. replaceWith(replStr) - replaces the matching tokens with the
provided replStr replacement string; especially useful with
transformString()
. removeQuotes - removes first and last character from string enclosed
in quotes (note - NOT the same as the string strip() method, as only
a single character is removed at each end)
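Both helpers in a short sketch (modern import style):

```python
import pyparsing as pp

# replaceWith: rewrite every matched "and" as "&" via transformString
andToken = pp.Keyword("and").setParseAction(pp.replaceWith("&"))
rewritten = andToken.transformString("cats and dogs")

# removeQuotes: strip only the single enclosing quote characters
quoted = pp.dblQuotedString.setParseAction(pp.removeQuotes)
inner = quoted.parseString('"hello, world"')[0]
```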
- Added copy() method to ParseElement, to make it easier to define
different parse actions for the same basic parse expression. (Note, copy
is implicitly called when using setResultsName().)
(The following changes were posted to CVS as Version 1.2.3 -
October-December, 2004)
- Added support for Unicode strings in creating grammar definitions.
(Big thanks to Gavin Panella!)
- Added constant alphas8bit to include the following 8-bit characters:
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
- Added srange() function to simplify definition of Word elements, using
regexp-like '[A-Za-z0-9]' syntax. This also simplifies referencing
common 8-bit characters.
- Fixed bug in Dict when a single element Dict was embedded within another
Dict. (Thanks Andy Yates for catching this one!)
- Added 'formatted' argument to ParseResults.asXML(). If set to False,
suppresses insertion of whitespace for pretty-print formatting. Default
equals True for backward compatibility.
- Added setDebugActions() function to ParserElement, to allow user-defined
debugging actions.
- Added support for escaped quotes (either in \', \", or doubled quote
form) to the predefined expressions for quoted strings. (Thanks, Ero
Carrera!)
- Minor performance improvement (~5%) converting "char in string" tests
to "char in dict". (Suggested by Gavin Panella, cool idea!)
Version 1.2.2 - September 27, 2004
----------------------------------
- Modified delimitedList to accept an expression as the delimiter, instead
of only accepting strings.
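A sketch of an expression delimiter (modern import style; delimiters are suppressed from the results by default):

```python
import pyparsing as pp

# The delimiter may now be any expression - here, either ',' or ';'
items = pp.delimitedList(pp.Word(pp.alphas), delim=pp.oneOf(", ;"))
tokens = items.parseString("alpha, beta; gamma")
```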
- Modified ParseResults, to convert integer field keys to strings (to
avoid confusion with list access).
- Modified Combine, to convert all embedded tokens to strings before
combining.
- Fixed bug in MatchFirst in which parse actions would be called for
expressions that only partially match. (Thanks, John Hunter!)
- Fixed bug in fourFn.py example, correcting the right-associativity of
the ^ operator. (Thanks, Andrea Griffini!)
- Added class FollowedBy(expression), to look ahead in the input string
without consuming tokens.
- Added class NoMatch that never matches any input. Can be useful in
debugging, and in very specialized grammars.
- Added example pgn.py, for parsing chess game files stored in Portable
Game Notation. (Thanks, Alberto Santini!)
Version 1.2.1 - August 19, 2004
-------------------------------
- Added SkipTo(expression) token type, simplifying grammars that only
want to specify delimiting expressions, and want to match any characters
between them.
- Added helper method dictOf(key,value), making it easier to work with
the Dict class. (Inspired by Pavel Volkovitskiy, thanks!).
- Added optional argument listAllMatches (default=False) to
setResultsName(). Setting listAllMatches to True overrides the default
modal setting of tokens to results names; instead, the results name
acts as an accumulator for all matching tokens within the local
repetition group. (Suggested by Amaury Le Leyzour - thanks!)
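A sketch of the accumulator behavior (listAllMatches passed positionally here; modern import style):

```python
import pyparsing as pp

# Each repetition appends to "values" instead of overwriting it
value = pp.Word(pp.nums).setResultsName("values", True)
expr = pp.OneOrMore(pp.Word(pp.alphas) + value)
result = expr.parseString("a 1 b 2 c 3")
```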
- Fixed bug in ParseResults, throwing exception when trying to extract
slice, or make a copy using [:]. (Thanks, Wilson Fowlie!)
- Fixed bug in transformString() when the input string contains <TAB>s
(Thanks, Rick Walia!).
- Fixed bug in returning tokens from un-Grouped And's, Or's and
MatchFirst's, where too many tokens would be included in the results,
confounding parse actions and returned results.
- Fixed bug in naming ParseResults returned by And's, Or's, and
MatchFirst's.
- Fixed bug in LineEnd() - matching this token now correctly consumes
and returns the end of line "\n".
- Added a beautiful example for parsing Mozilla calendar files (Thanks,
Petri Savolainen!).
- Added support for dynamically modifying Forward expressions during
parsing.
Version 1.2 - 20 June 2004
--------------------------
- Added definition for htmlComment to help support HTML scanning and
parsing.
- Fixed bug in generating XML for Dict classes, in which trailing item was
duplicated in the output XML.
- Fixed release bug in which scanExamples.py was omitted from release
files.
- Fixed bug in transformString() when parse actions are not defined on the
outermost parser element.
- Added example urlExtractor.py, as another example of using scanString
and parse actions.
Version 1.2beta3 - 4 June 2004
------------------------------
- Added White() token type, analogous to Word, to match on whitespace
characters. Use White in parsers with significant whitespace (such as
configuration file parsers that use indentation to indicate grouping).
Construct White with a string containing the whitespace characters to be
matched. Similar to Word, White also takes optional min, max, and exact
parameters.
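A sketch of using White to match indentation explicitly (current pyparsing; the four-space convention is illustrative):

```python
import pyparsing as pp

# match indentation explicitly, instead of having it skipped
indent = pp.White(" ", exact=4)      # exactly four spaces
config_line = indent + pp.Word(pp.alphas)
print(config_line.parseString("    name"))  # ['    ', 'name']
```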
- As part of supporting whitespace-significant parsing, added parseWithTabs()
method to ParserElement, to override the default behavior in parseString
of automatically expanding tabs to spaces. To retain tabs during
parsing, call parseWithTabs() before calling parseString(), parseFile() or
scanString(). (Thanks, Jean-Guillaume Paradis for catching this, and for
your suggestions on whitespace-significant parsing.)
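A sketch of the difference (current pyparsing):

```python
import pyparsing as pp

# parseString() normally expands tabs to spaces; parseWithTabs() keeps them,
# so a literal tab can be matched with White("\t")
tab_sep = pp.Word(pp.alphas) + pp.White("\t") + pp.Word(pp.alphas)
tab_sep.parseWithTabs()
print(tab_sep.parseString("alpha\tbeta"))  # ['alpha', '\t', 'beta']
```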
- Added transformString() method to ParserElement, as a complement to
scanString(). To use transformString, define a grammar and attach a parse
action to the overall grammar that modifies the returned token list.
Invoking transformString() on a target string will then scan for matches,
and replace the matched text patterns according to the logic in the parse
action. transformString() returns the resulting transformed string.
(Note: transformString() does *not* automatically expand tabs to spaces.)
Also added scanExamples.py to the examples directory to show sample uses of
scanString() and transformString().
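A minimal sketch of transformString() usage (current pyparsing; the sample text is illustrative):

```python
import pyparsing as pp

# upper-case every standalone "cat" in the input
cat = pp.Keyword("cat")
cat.setParseAction(lambda toks: toks[0].upper())
print(cat.transformString("my cat and my neighbor's cat"))
# -> my CAT and my neighbor's CAT
```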
- Removed group() method that was introduced in beta2. This turns out NOT to
be equivalent to nesting within a Group() object, and I'd prefer not to sow
more seeds of confusion.
- Fixed behavior of asXML() where tags for groups were incorrectly duplicated.
(Thanks, Brad Clements!)
- Changed beta version message to display to stderr instead of stdout, to
make asXML() easier to use. (Thanks again, Brad.)
Version 1.2beta2 - 19 May 2004
------------------------------
- *** SIMPLIFIED API *** - Parse actions that do not modify the list of tokens
no longer need to return a value. This simplifies those parse actions that
use the list of tokens to update a counter or record or display some of the
token content; these parse actions can simply end without having to specify
'return toks'.
- *** POSSIBLE API INCOMPATIBILITY *** - Fixed CaselessLiteral bug, where the
returned token text was not the original string (as stated in the docs),
but the original string converted to upper case. (Thanks, Dang Griffith!)
**NOTE: this may break some code that relied on this erroneous behavior.
Users should scan their code for uses of CaselessLiteral.**
- *** POSSIBLE CODE INCOMPATIBILITY *** - I have renamed the internal
attributes on ParseResults from 'dict' and 'list' to '__tokdict' and
'__toklist', to avoid collisions with user-defined data fields named 'dict'
and 'list'. Any client code that accesses these attributes directly will
need to be modified. Hopefully the implementation of methods such as keys(),
items(), len(), etc. on ParseResults will make such direct attribute
accesses unnecessary.
- Added asXML() method to ParseResults. This greatly simplifies the process
of parsing an input data file and generating XML-structured data.
- Added getName() method to ParseResults. This method is helpful when
a grammar specifies ZeroOrMore or OneOrMore of a MatchFirst or Or
expression, and the parsing code needs to know which expression matched.
(Thanks, Eric van der Vlist, for this idea!)
- Added items() and values() methods to ParseResults, to better support using
ParseResults as a Dictionary.
- Added parseFile() as a convenience function to parse the contents of an
entire text file. Accepts either a file name or a file object. (Thanks
again, Dang!)
- Added group() method to And, Or, and MatchFirst, as a short-cut alternative
to enclosing a construct inside a Group object.
- Extended fourFn.py to support exponentiation, and simple built-in functions.
- Added EBNF parser to examples, including a demo where it parses its own
EBNF! (Thanks to Seo Sanghyeon!)
- Added Delphi Form parser to examples, dfmparse.py, plus a couple of
sample Delphi forms as tests. (Well done, Dang!)
- Another performance speedup, 5-10%, inspired by Dang! Plus about a 20%
speedup, by pre-constructing and caching exception objects instead of
constructing them on the fly.
- Fixed minor bug when specifying oneOf() with 'caseless=True'.
- Cleaned up and added a few more docstrings, to improve the generated docs.
Version 1.1.2 - 21 Mar 2004
---------------------------
- Fixed minor bug in scanString(), so that start location is at the start of
the matched tokens, not at the start of the whitespace before the matched
tokens.
- Inclusion of HTML documentation, generated using Epydoc. Reformatted some
doc strings to better generate readable docs. (Beautiful work, Ed Loper,
thanks for Epydoc!)
- Minor performance speedup, 5-15%
- And on a process note, I've used the unittest module to define a series of
unit tests, to help avoid the embarrassment of the version 1.1 snafu.
Version 1.1.1 - 6 Mar 2004
--------------------------
- Fixed critical bug introduced in 1.1, which broke MatchFirst(!) token
matching.
**THANK YOU, SEO SANGHYEON!!!**
- Added "from future import __generators__" to permit running under
pre-Python 2.3.
- Added example getNTPservers.py, showing how to use pyparsing to extract
a text pattern from the HTML of a web page.
Version 1.1 - 3 Mar 2004
-------------------------
- ***Changed API*** - While testing out parse actions, I found that the value
of loc passed in was not the starting location of the matched tokens, but
the location of the next token in the list. With this version, the location
passed to the parse action is now the starting location of the tokens that
matched.
A second part of this change is that the return value of parse actions no
longer needs to return a tuple containing both the location and the parsed
tokens (which may optionally be modified); parse actions only need to return
the list of tokens. Parse actions that return a tuple are deprecated; they
will still work properly for conversion/compatibility, but this behavior will
be removed in a future version.
- Added validate() method, to help diagnose infinite recursion in a grammar tree.
validate() is not 100% fool-proof, but it can help track down nasty infinite
looping due to recursively referencing the same grammar construct without some
intervening characters.
- Cleaned up default listing of some parse element types, to more closely match
ordinary BNF. Instead of the form <classname>:[contents-list], some changes
are:
. And(token1,token2,token3) is "{ token1 token2 token3 }"
. Or(token1,token2,token3) is "{ token1 ^ token2 ^ token3 }"
. MatchFirst(token1,token2,token3) is "{ token1 | token2 | token3 }"
. Optional(token) is "[ token ]"
. OneOrMore(token) is "{ token }..."
. ZeroOrMore(token) is "[ token ]..."
- Fixed an infinite loop in oneOf if the input string contains a duplicated
option. (Thanks Brad Clements)
- Fixed a bug when specifying a results name on an Optional token. (Thanks
again, Brad Clements)
- Fixed a bug introduced in 1.0.6 when I converted quotedString to use
CharsNotIn; I accidentally permitted quoted strings to span newlines. I have
fixed this in this version to go back to the original behavior, in which
quoted strings do *not* span newlines.
- Fixed minor bug in HTTP server log parser. (Thanks Jim Richardson)
Version 1.0.6 - 13 Feb 2004
----------------------------
- Added CharsNotIn class (Thanks, Lee SangYeong). This is the opposite of
Word, in that it is constructed with a set of characters *not* to be matched.
(This enhancement also allowed me to clean up and simplify some of the
definitions for quoted strings, cStyleComment, and restOfLine.)
- **MINOR API CHANGE** - Added joinString argument to the __init__ method of
Combine (Thanks, Thomas Kalka). joinString defaults to "", but some
applications might choose some other string to use instead, such as a blank
or newline. joinString was inserted as the second argument to __init__,
so if you have code that specifies an adjacent value, without using
'adjacent=', this code will break.
- Modified LineStart to recognize the start of an empty line.
- Added optional caseless flag to oneOf(), to create a list of CaselessLiteral
tokens instead of Literal tokens.
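For example (a sketch against current pyparsing; the operator names are illustrative):

```python
import pyparsing as pp

# caseless=True matches the listed alternatives regardless of case
bool_op = pp.oneOf("AND OR NOT", caseless=True)
for s in ("and", "AND", "Not"):
    print(bool_op.parseString(s))  # each matches, in any case mix
```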
- Added some enhancements to the SQL example:
. Oracle-style comments (Thanks to Harald Armin Massa)
. simple WHERE clause
- Minor performance speedup - 5-15%
Version 1.0.5 - 19 Jan 2004
----------------------------
- Added scanString() generator method to ParserElement, to support regex-like
pattern-searching
- Added items() list to ParseResults, to return named results as a
list of (key,value) pairs
- Fixed memory overflow in asList() for deeply nested ParseResults (Thanks,
Sverrir Valgeirsson)
- Minor performance speedup - 10-15%
Version 1.0.4 - 8 Jan 2004
---------------------------
- Added positional tokens StringStart, StringEnd, LineStart, and LineEnd
- Added commaSeparatedList to pre-defined global token definitions; also added
commasep.py to the examples directory, to demonstrate the differences between
parsing comma-separated data and simple line-splitting at commas
- Minor API change: delimitedList does not automatically enclose the
list elements in a Group, but makes this the responsibility of the caller;
also, if invoked using 'combine=True', the list delimiters are also included
in the returned text (good for scoped variables, such as a.b.c or a::b::c, or
for directory paths such as a/b/c)
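A sketch of the two modes, written against current pyparsing:

```python
import pyparsing as pp

ident = pp.Word(pp.alphas)
# default: delimiters are suppressed, elements returned individually
print(pp.delimitedList(ident, delim=".").parseString("a.b.c"))                # ['a', 'b', 'c']
# combine=True: delimiters are kept and the list is returned as one string
print(pp.delimitedList(ident, delim=".", combine=True).parseString("a.b.c"))  # ['a.b.c']
```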
- Performance speed-up again, 30-40%
- Added httpServerLogParser.py to examples directory, as this is
a common parsing task
Version 1.0.3 - 23 Dec 2003
---------------------------
- Performance speed-up again, 20-40%
- Added Python distutils installation setup.py, etc. (thanks, Dave Kuhlman)
Version 1.0.2 - 18 Dec 2003
---------------------------
- **NOTE: Changed API again!!!** (for the last time, I hope)
+ Renamed module from parsing to pyparsing, to better reflect Python
linkage.
- Also added dictExample.py to examples directory, to illustrate
usage of the Dict class.
Version 1.0.1 - 17 Dec 2003
---------------------------
- **NOTE: Changed API!**
+ Renamed 'len' argument on Word.__init__() to 'exact'
- Performance speed-up, 10-30%
Version 1.0.0 - 15 Dec 2003
---------------------------
- Initial public release
Version 0.1.1 thru 0.1.17 - October-November, 2003
--------------------------------------------------
- initial development iterations:
- added Dict, Group
- added helper methods oneOf, delimitedList
- added helpers quotedString (and double and single), restOfLine, cStyleComment
- added MatchFirst as an alternative to the slower Or
- added UML class diagram
- fixed various logic bugs
Contributor Covenant Code of Conduct
====================================
Our Pledge
----------
In the interest of fostering an open and welcoming environment,
we as contributors and maintainers pledge to making participation
in our project and our community a harassment-free experience for
everyone, regardless of age, body size, disability, ethnicity,
sex characteristics, gender identity and expression, level of
experience, education, socio-economic status, nationality,
personal appearance, race, religion, or sexual identity and
orientation.
Our Standards
-------------
Examples of behavior that contributes to creating a positive
environment include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual
attention or advances
- Trolling, insulting/derogatory comments, and personal or political
attacks
- Public or private harassment
- Publishing others’ private information, such as a physical or
electronic address, without explicit permission
- Other conduct which could reasonably be considered
inappropriate in a professional setting
Our Responsibilities
--------------------
Project maintainers are responsible for clarifying the standards
of acceptable behavior and are expected to take appropriate and
fair corrective action in response to any instances of
unacceptable behavior.
Project maintainers have the right and responsibility to remove,
edit, or reject comments, commits, code, wiki edits, issues, and
other contributions that are not aligned to this Code of Conduct,
or to ban temporarily or permanently any contributor for other
behaviors that they deem inappropriate, threatening, offensive,
or harmful.
Scope
-----
This Code of Conduct applies both within project spaces and in
public spaces when an individual is representing the project or
its community. Examples of representing a project or community
include using an official project e-mail address, posting via an
official social media account, or acting as an appointed
representative at an online or offline event. Representation of
a project may be further defined and clarified by project
maintainers.
Enforcement
-----------
Instances of abusive, harassing, or otherwise unacceptable
behavior may be reported by contacting the project team at
pyparsing@mail.com. All complaints will be reviewed and
investigated and will result in a response that is deemed
necessary and appropriate to the circumstances. The project team
is obligated to maintain confidentiality with regard to the
reporter of an incident. Further details of specific enforcement
policies may be posted separately.
Project maintainers who do not follow or enforce the Code of
Conduct in good faith may face temporary or permanent
repercussions as determined by other members of the project’s
leadership.
Attribution
-----------
This Code of Conduct is adapted from the `Contributor Covenant
homepage <https://www.contributor-covenant.org>`__, version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
# CONTRIBUTING
Thank you for your interest in working on pyparsing! Pyparsing has become a popular module for creating simple
text parsing and data scraping applications. It has been incorporated in several widely-used packages, and is
often used by beginners as part of their first Python project.
## Raising questions / asking for help
If you have a question on using pyparsing, there are a number of resources available online.
- [StackOverflow](https://stackoverflow.com/questions/tagged/pyparsing) - about 10 years of SO questions and answers
can be searched on StackOverflow, tagged with the `pyparsing` tag. Note that some of the older posts will refer
to features in Python 2, or to versions and coding practices for pyparsing that have been replaced by newer classes
and coding idioms.
- [pyparsing sub-reddit](https://www.reddit.com/r/pyparsing/) - still very lightly attended, but open to anyone
wishing to post questions or links related to pyparsing. An alternative channel to StackOverflow for asking
questions.
- [online docs](https://pyparsing-docs.readthedocs.io/en/latest/index.html) and a separately maintained set of class
library docs [here](https://pyparsing-doc.neocities.org/) - These docs are auto-generated from the docstrings
embedded in the pyparsing classes, so they can also be viewed in the interactive Python console's and Jupyter
Notebook's `help` commands.
- [the pyparsing Wikispaces archive](https://github.com/pyparsing/wikispaces_archive) - Before hosting on GitHub,
pyparsing had a separate wiki on the wikispaces.com website. In 2018 this page was discontinued. The discussion
content archive has been reformatted into Markdown and can be viewed by year at the GitHub repository. Just as
with some of the older questions on StackOverflow, some of these older posts may reflect out-of-date pyparsing
and Python features.
- [submit an issue](https://github.com/pyparsing/pyparsing/issues) - If you have a problem with pyparsing that looks
like an actual bug, or have an idea for a feature to add to pyparsing please submit an issue on GitHub. Some
pyparsing behavior may be counter-intuitive, so try to review some of the other resources first, or some of the
other open and closed issues. Or post your question on SO or reddit. But don't wait until you are desperate and
frustrated - just ask! :)
## Submitting examples
If you have an example you wish to submit, please follow these guidelines.
- **License - Submitted example code must be available for distribution with the rest of pyparsing under the MIT
open source license.**
- Please follow PEP8 name and coding guidelines, and use the black formatter
to auto-format code.
- Examples should import pyparsing and the common namespace classes as:
import pyparsing as pp
# if necessary
ppc = pp.pyparsing_common
ppu = pp.pyparsing_unicode
- Submitted examples *must* be Python 3.6.8 or later compatible. (It is acceptable if examples use Python
features added after 3.6)
- Where possible use operators to create composite parse expressions:
expr = expr_a + expr_b | expr_c
instead of:
expr = pp.MatchFirst([pp.And([expr_a, expr_b]), expr_c])
Exception: if using a generator to create an expression:
import keyword
python_keywords = keyword.kwlist
any_keyword = pp.MatchFirst(pp.Keyword(kw)
    for kw in python_keywords)
- Learn [Common Pitfalls When Writing Parsers](https://github.com/pyparsing/pyparsing/wiki/Common-Pitfalls-When-Writing-Parsers) and
how to avoid them when developing new examples.
- See additional notes under [Some Coding Points](#some-coding-points).
## Submitting changes
If you are considering proposing updates to pyparsing, please bear in mind the following guidelines.
Please review [_The Zen of Pyparsing_ and _The Zen of Pyparsing
Development_](https://github.com/pyparsing/pyparsing/wiki/Zen)
article on the pyparsing wiki, to get a general feel for the historical and future approaches to pyparsing's
design, and intended developer experience as an embedded DSL.
If you are using new Python features or changing usage of the Python stdlib, please check that they work as
intended on prior versions of Python (currently back to Python 3.6.8).
## Some design points
- Minimize additions to the module namespace. Over time, pyparsing's namespace has acquired a *lot* of names.
New features have been encapsulated into namespace classes to try to hold back the name flooding when importing
pyparsing.
- New operator overloads for ParserElement will need to show broad applicability, and should be related to
parser construction.
- Performance tuning should focus on parse time performance. Optimizing parser definition performance is secondary.
- New external dependencies will require substantial justification, and if included, will need to be guarded for
`ImportError`s raised if the external module is not installed.
## Some coding points
These coding styles are encouraged whether submitting code for core pyparsing or for submitting an example.
- PEP8 - pyparsing has historically been very non-compliant with many PEP8 guidelines, especially those regarding
name casing. I had just finished several years of Java and Smalltalk development, and camel case seemed to be the
future trend in coding styles. As of version 3.0.0, pyparsing is moving over to PEP8 naming, while maintaining
compatibility with existing parser code by defining synonyms using the legacy names. These names will be
retained until a future release (probably 4.0), to provide a migration path for current pyparsing-dependent
applications - DO NOT MODIFY OR REMOVE THESE NAMES.
See more information at the [PEP8 wiki page](https://github.com/pyparsing/pyparsing/wiki/PEP-8-planning).
- No backslashes for line continuations.
Continuation lines for expressions in ()'s should start with the continuing operator:
really_long_line = (something
+ some_other_long_thing
+ even_another_long_thing)
- Maximum line length is 120 characters. (Black will override this.)
- Changes to core pyparsing must be compatible back to Py3.6 without conditionalizing. Later Py3 features may be
used in examples by way of illustration.
- str.format() statements should use named format arguments (unless this proves to be a slowdown at parse time).
- List, tuple, and dict literals should include a trailing comma after the last element, which reduces changeset
clutter when another element gets added to the end.
- New features should be accompanied by updates to unitTests.py and a bullet in the CHANGES file.
- Do not modify pyparsing_archive.py. This file is kept as a reference artifact from when pyparsing was distributed
as a single source file.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
PyParsing -- A Python Parsing Module
====================================
|Version| |Build Status| |Coverage| |License| |Python Versions| |Snyk Score|
Introduction
============
The pyparsing module is an alternative approach to creating and
executing simple grammars, vs. the traditional lex/yacc approach, or the
use of regular expressions. The pyparsing module provides a library of
classes that client code uses to construct the grammar directly in
Python code.
*[Since first writing this description of pyparsing in late 2003, this
technique for developing parsers has become more widespread, under the
name Parsing Expression Grammars - PEGs. See more information on PEGs*
`here `__
*.]*
Here is a program to parse ``"Hello, World!"`` (or any greeting of the form
``"salutation, addressee!"``):
.. code:: python
from pyparsing import Word, alphas
greet = Word(alphas) + "," + Word(alphas) + "!"
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))
The program outputs the following::
Hello, World! -> ['Hello', ',', 'World', '!']
The Python representation of the grammar is quite readable, owing to the
self-explanatory class names, and the use of '+', '|' and '^' operator
definitions.
The parsed results returned from ``parseString()`` are a collection of type
``ParseResults``, which can be accessed as a
nested list, a dictionary, or an object with named attributes.
The pyparsing module handles some of the problems that are typically
vexing when writing text parsers:
- extra or missing whitespace (the above program will also handle ``"Hello,World!"``, ``"Hello , World !"``, etc.)
- quoted strings
- embedded comments
The examples directory includes a simple SQL parser, simple CORBA IDL
parser, a config file parser, a chemical formula parser, and a four-
function algebraic notation parser, among many others.
Documentation
=============
There are many examples in the online docstrings of the classes
and methods in pyparsing. You can find them compiled into `online docs `__. Additional
documentation resources and project info are listed in the online
`GitHub wiki `__. An
entire directory of examples can be found `here `__.
License
=======
MIT License. See header of the `pyparsing __init__.py `__ file.
History
=======
See `CHANGES `__ file.
.. |Build Status| image:: https://github.com/pyparsing/pyparsing/actions/workflows/ci.yml/badge.svg
:target: https://github.com/pyparsing/pyparsing/actions/workflows/ci.yml
.. |Coverage| image:: https://codecov.io/gh/pyparsing/pyparsing/branch/master/graph/badge.svg
:target: https://codecov.io/gh/pyparsing/pyparsing
.. |Version| image:: https://img.shields.io/pypi/v/pyparsing?style=flat-square
:target: https://pypi.org/project/pyparsing/
:alt: Version
.. |License| image:: https://img.shields.io/pypi/l/pyparsing.svg?style=flat-square
:target: https://pypi.org/project/pyparsing/
:alt: License
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/pyparsing.svg?style=flat-square
:target: https://pypi.org/project/pyparsing/
:alt: Python versions
.. |Snyk Score| image:: https://snyk.io//advisor/python/pyparsing/badge.svg
:target: https://snyk.io//advisor/python/pyparsing
:alt: pyparsing
.. include:: ../CODE_OF_CONDUCT.rst
==========================
Using the pyparsing module
==========================
:author: Paul McGuire
:address: ptmcg.pm+pyparsing@gmail.com
:revision: 3.1.3
:date: August, 2024
:copyright: Copyright |copy| 2003-2023 Paul McGuire.
.. |copy| unicode:: 0xA9
:abstract: This document provides how-to instructions for the
pyparsing library, an easy-to-use Python module for constructing
and executing basic text parsers. The pyparsing module is useful
for evaluating user-definable
expressions, processing custom application language commands, or
extracting data from formatted reports.
.. sectnum::
   :depth: 4

.. contents::
   :depth: 4
Note: While this content is still valid, there are more detailed
descriptions and extensive examples at the `online doc server
`_, and
in the online help for the various pyparsing classes and methods (viewable
using the Python interpreter's built-in ``help()`` function). You will also
find many example scripts in the `examples `_
directory of the pyparsing GitHub repo.
-----------
**Note**: *In pyparsing 3.0, many method and function names which were
originally written using camelCase have been converted to PEP8-compatible
snake_case. So ``parseString()`` is being renamed to ``parse_string()``,
``delimitedList`` to DelimitedList_, and so on. You may see the old
names in legacy parsers, and they will be supported for a time with
synonyms, but the synonyms will be removed in a future release.*
*If you are using this documentation, but working with a 2.4.x version of pyparsing,
you'll need to convert methods and arguments from the documented snake_case
names to the legacy camelCase names. In pyparsing 3.0.x and 3.1.x, both forms are
supported, but the legacy forms are deprecated; they will be dropped in a
future release.*
-----------
Steps to follow
===============
To parse an incoming data string, the client code must follow these steps:
1. First define the tokens and patterns to be matched, and assign
this to a program variable. Optional results names or parse
actions can also be defined at this time.
2. Call ``parse_string()``, ``scan_string()``, or ``search_string()``
on this variable, passing in the string to
be parsed. During the matching process, whitespace between
tokens is skipped by default (although this can be changed).
When token matches occur, any defined parse action methods are
called.
3. Process the parsed results, returned as a ParseResults_ object.
The ParseResults_ object can be accessed as if it were a list of
strings. Matching results may also be accessed as named attributes of
the returned results, if names are defined in the definition of
the token pattern, using ``set_results_name()``.
Hello, World!
-------------
The following complete Python program will parse the greeting ``"Hello, World!"``,
or any other greeting of the form "<salutation>, <addressee>!"::
import pyparsing as pp
greet = pp.Word(pp.alphas) + "," + pp.Word(pp.alphas) + "!"
for greeting_str in [
"Hello, World!",
"Bonjour, Monde!",
"Hola, Mundo!",
"Hallo, Welt!",
]:
greeting = greet.parse_string(greeting_str)
print(greeting)
The parsed tokens are returned in the following form::
['Hello', ',', 'World', '!']
['Bonjour', ',', 'Monde', '!']
['Hola', ',', 'Mundo', '!']
['Hallo', ',', 'Welt', '!']
Usage notes
-----------
- The pyparsing module can be used to interpret simple command
strings or algebraic expressions, or can be used to extract data
from text reports with complicated format and structure ("screen
or report scraping"). However, it is possible that your defined
matching patterns may accept invalid inputs. Use pyparsing to
extract data from strings assumed to be well-formatted.
- To keep up the readability of your code, use operators_ such as ``+``, ``|``,
``^``, and ``~`` to combine expressions. You can also combine
string literals with ``ParseExpressions`` - they will be
automatically converted to Literal_ objects. For example::
integer = Word(nums) # simple unsigned integer
variable = Char(alphas) # single letter variable, such as x, z, m, etc.
arith_op = one_of("+ - * /") # arithmetic operators
equation = variable + "=" + integer + arith_op + integer # will match "x=2+2", etc.
In the definition of ``equation``, the string ``"="`` will get added as
a ``Literal("=")``, but in a more readable way.
- The pyparsing module's default behavior is to ignore whitespace. This is the
case for 99% of all parsers ever written. This allows you to write simple, clean,
grammars, such as the above ``equation``, without having to clutter it up with
extraneous ``ws`` markers. The ``equation`` grammar will successfully parse all of the
following statements::
x=2+2
x = 2+2
a = 10 * 4
r= 1234/ 100000
Of course, it is quite simple to extend this example to support more elaborate expressions, with
nesting with parentheses, floating point numbers, scientific notation, and named constants
(such as ``e`` or ``pi``). See `fourFn.py `_,
and `simpleArith.py `_
included in the examples directory.
- To modify pyparsing's default whitespace skipping, you can use one or
more of the following methods:
- use the static method ``ParserElement.set_default_whitespace_chars``
to override the normal set of whitespace chars (``' \t\n'``). For instance
when defining a grammar in which newlines are significant, you should
call ``ParserElement.set_default_whitespace_chars(' \t')`` to remove
newline from the set of skippable whitespace characters. Calling
this method will affect all pyparsing expressions defined afterward.
- call ``leave_whitespace()`` on individual expressions, to suppress the
skipping of whitespace before trying to match the expression
- use ``Combine`` to require that successive expressions must be
adjacent in the input string. For instance, this expression::
real = Word(nums) + '.' + Word(nums)
will match "3.14159", but will also match "3 . 12". It will also
return the matched results as ['3', '.', '14159']. By changing this
expression to::
real = Combine(Word(nums) + '.' + Word(nums))
it will not match numbers with embedded spaces, and it will return a
single concatenated string '3.14159' as the parsed token.
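For example, to make newlines significant, as described above (a minimal sketch; note that ``set_default_whitespace_chars`` affects all expressions defined afterward):

```python
import pyparsing as pp

# remove newline from the set of skippable whitespace characters
pp.ParserElement.set_default_whitespace_chars(" \t")

word = pp.Word(pp.alphas)
line = word[1, ...] + pp.Suppress(pp.LineEnd())

# words are still collected across spaces, but matching stops at the newline
print(line.parse_string("alpha beta\n").as_list())  # ['alpha', 'beta']
```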
- Repetition of expressions can be indicated using ``*`` or ``[]`` notation. An
expression may be multiplied by an integer value (to indicate an exact
repetition count), or indexed with a tuple, representing min and max repetitions
(with ``...`` representing no min or no max, depending whether it is the first or
second tuple element). See the following examples, where n is used to
indicate an integer value:
- ``expr*3`` is equivalent to ``expr + expr + expr``
- ``expr[2, 3]`` is equivalent to ``expr + expr + Opt(expr)``
- ``expr[n, ...]`` or ``expr[n,]`` is equivalent
to ``expr*n + ZeroOrMore(expr)`` (read as "at least n instances of expr")
- ``expr[..., n]`` is equivalent to ``expr*(0, n)``
(read as "0 to n instances of expr")
- ``expr[...]``, ``expr[0, ...]`` and ``expr * ...`` are equivalent to ``ZeroOrMore(expr)``
- ``expr[1, ...]`` is equivalent to ``OneOrMore(expr)``
Note that ``expr[..., n]`` does not raise an exception if
more than n exprs exist in the input stream; that is,
``expr[..., n]`` does not enforce a maximum number of expr
occurrences. If this behavior is desired, then write
``expr[..., n] + ~expr``.
- ``[]`` notation will also accept a stop expression using ':' slice
notation:
- ``expr[...:end_expr]`` is equivalent to ``ZeroOrMore(expr, stop_on=end_expr)``
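The repetition forms above can be exercised directly:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

print((word * 3).parse_string("a b c").as_list())    # ['a', 'b', 'c']
print(word[1, ...].parse_string("x y z").as_list())  # ['x', 'y', 'z']
print(word[...].parse_string("").as_list())          # []
print(word[2, 3].parse_string("p q r s").as_list())  # ['p', 'q', 'r']
```

The last call stops after three matches but does not fail on the fourth word, since ``parse_string`` does not require consuming the whole input unless ``parse_all=True``.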
- MatchFirst_ expressions are matched left-to-right, taking the first
expression that matches and skipping all later alternatives, so be sure
to define less-specific patterns after more-specific patterns.
If you are not sure which expressions are most specific, use Or_
expressions (defined using the ``^`` operator) - they will always
match the longest expression, although they are more
compute-intensive.
- Or_ expressions will evaluate all of the specified subexpressions
to determine which is the "best" match, that is, which matches
the longest string in the input data. In case of a tie, the
left-most expression in the Or_ list will win.
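The difference between ``|`` and ``^`` can be seen with two overlapping literals:

```python
import pyparsing as pp

first = pp.Literal("in") | pp.Literal("int")    # MatchFirst: first match wins
longest = pp.Literal("in") ^ pp.Literal("int")  # Or: longest match wins

print(first.parse_string("int")[0])    # 'in'
print(longest.parse_string("int")[0])  # 'int'
```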
- If parsing the contents of an entire file, pass it to the
``parse_file`` method using::
expr.parse_file(source_file)
- ``ParseExceptions`` will report the location where an expected token
or expression failed to match. For example, if we tried to use our
"Hello, World!" parser to parse "Hello World!" (leaving out the separating
comma), we would get an exception, with the message::
pyparsing.ParseException: Expected ',', found 'World'  (at char 6), (line:1, col:7)
In the case of complex
expressions, the reported location may not be exactly where you
would expect. See more information under ParseException_ .
- Use the ``Group`` class to enclose logical groups of tokens within a
sublist. This will help organize your results into more
hierarchical form (the default behavior is to return matching
tokens as a flat list of matching input strings).
- Punctuation may be significant for matching, but is rarely of
much interest in the parsed results. Use the ``suppress()`` method
to keep these tokens from cluttering up your returned lists of
tokens. For example, DelimitedList_ matches a succession of
one or more expressions, separated by delimiters (commas by
default), but only returns a list of the actual expressions -
the delimiters are used for parsing, but are suppressed from the
returned output.
- Parse actions can be used to convert values from strings to
other data types (ints, floats, booleans, etc.).
- Results names are recommended for retrieving tokens from complex
expressions. It is much easier to access a token using its field
name than using a positional index, especially if the expression
contains optional elements. You can also shortcut
the ``set_results_name`` call::
stats = ("AVE:" + real_num.set_results_name("average")
+ "MIN:" + real_num.set_results_name("min")
+ "MAX:" + real_num.set_results_name("max"))
can more simply and cleanly be written as this::
stats = ("AVE:" + real_num("average")
+ "MIN:" + real_num("min")
+ "MAX:" + real_num("max"))
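Once named, the fields can be retrieved from the returned ParseResults_ by name (``real_num`` here is defined as a simple illustrative regex):

```python
import pyparsing as pp

real_num = pp.Regex(r"\d+\.\d+")
stats = ("AVE:" + real_num("average")
         + "MIN:" + real_num("min")
         + "MAX:" + real_num("max"))

result = stats.parse_string("AVE: 4.5 MIN: 1.0 MAX: 9.9")
print(result["average"], result["min"], result["max"])  # 4.5 1.0 9.9
```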
- Be careful when defining parse actions that modify global variables or
data structures (as in fourFn.py_), especially for low level tokens
or expressions that may occur within an And_ expression; an early element
of an And_ may match, but the overall expression may fail.
Classes
=======
All the pyparsing classes can be found in this
`UML class diagram <_static/pyparsingClassDiagram_3.0.9.jpg>`_.
Classes in the pyparsing module
-------------------------------
``ParserElement`` - abstract base class for all pyparsing classes;
methods for code to use are:
- ``parse_string(source_string, parse_all=False)`` - only called once, on the overall
matching pattern; returns a ParseResults_ object that makes the
matched tokens available as a list, and optionally as a dictionary,
or as an object with named attributes; if ``parse_all`` is set to True, then
``parse_string`` will raise a ParseException_ if the grammar does not process
the complete input string.
- ``parse_file(source_file)`` - a convenience function, that accepts an
input file object or filename. The file contents are passed as a
string to ``parse_string()``. ``parse_file`` also supports the ``parse_all`` argument.
- ``scan_string(source_string)`` - generator function, used to find and
extract matching text in the given source string; for each matched text,
returns a tuple of:
- matched tokens (packaged as a ParseResults_ object)
- start location of the matched text in the given source string
- end location in the given source string
``scan_string`` allows you to scan through the input source string for
random matches, instead of exhaustively defining the grammar for the entire
source text (as would be required with ``parse_string``).
- ``transform_string(source_string)`` - convenience wrapper function for
``scan_string``, to process the input source string, and replace matching
text with the tokens returned from parse actions defined in the grammar
(see set_parse_action_).
- ``search_string(source_string)`` - another convenience wrapper function for
``scan_string``, returns a list of the matching tokens returned from each
call to ``scan_string``.
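For illustration, here are ``scan_string`` and ``search_string`` used to pull numbers out of a larger string:

```python
import pyparsing as pp

number = pp.Word(pp.nums)
text = "id 123, code 45, ref 6789"

# scan_string yields (tokens, start_location, end_location) for each match
matches = [tokens[0] for tokens, start, end in number.scan_string(text)]
print(matches)  # ['123', '45', '6789']

# search_string gathers just the tokens from each match
print(number.search_string(text).as_list())  # [['123'], ['45'], ['6789']]
```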
- ``set_name(name)`` - associate a short descriptive name for this
element, useful in displaying exceptions and trace information
- ``run_tests(tests_string)`` - useful development and testing method on
expressions, to pass a multiline string of sample strings to test against
the expression. Comment lines (beginning with ``#``) can be inserted
and they will be included in the test output::
digits = Word(nums).set_name("numeric digits")
real_num = Combine(digits + '.' + digits)
real_num.run_tests("""\
# valid number
3.14159
# no integer part
.00001
# no decimal
101
# no decimal value
101.
""")
will print::
# valid number
3.14159
['3.14159']
# no integer part
.00001
^
FAIL: Expected numeric digits, found '.' (at char 0), (line:1, col:1)
# no decimal
101
^
FAIL: Expected ".", found end of text (at char 3), (line:1, col:4)
# no decimal value
101.
^
FAIL: Expected numeric digits, found end of text (at char 4), (line:1, col:5)
.. _set_results_name:
- ``set_results_name(string, list_all_matches=False)`` - name to be given
to tokens matching the element; if multiple tokens match within
a repetition group (such as ZeroOrMore_ or DelimitedList_), the
default is to return only the last matching token; if ``list_all_matches``
is set to True, a list of all the matching tokens is returned.
``expr.set_results_name("key")`` can also be written ``expr("key")``
(a results name with a trailing '*' character will be
interpreted as setting ``list_all_matches`` to ``True``).
Note:
``set_results_name`` returns a *copy* of the element so that a single
basic element can be referenced multiple times and given
different names within a complex grammar.
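The effect of ``list_all_matches`` (and its trailing-'*' shorthand) can be seen in a repetition:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
last_only = pp.OneOrMore(word("wd"))
all_matches = pp.OneOrMore(word("wd*"))  # trailing '*' sets list_all_matches=True

print(last_only.parse_string("alpha beta gamma")["wd"])  # gamma
print(all_matches.parse_string("alpha beta gamma")["wd"].as_list())
# ['alpha', 'beta', 'gamma']
```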
.. _set_parse_action:
- ``set_parse_action(*fn)`` - specify one or more functions to call after successful
matching of the element; each function is defined as ``fn(s, loc, toks)``, where:
- ``s`` is the original parse string
- ``loc`` is the location in the string where matching started
- ``toks`` is the list of the matched tokens, packaged as a ParseResults_ object
Parse actions can have any of the following signatures::
fn(s: str, loc: int, tokens: ParseResults)
fn(loc: int, tokens: ParseResults)
fn(tokens: ParseResults)
fn()
Multiple functions can be attached to a ``ParserElement`` by specifying multiple
arguments to ``set_parse_action``, or by calling ``add_parse_action``. Calls to ``set_parse_action``
will replace any previously defined parse actions. ``set_parse_action(None)`` will clear
all previously defined parse actions.
Each parse action function can return a modified ``toks`` list, to perform conversion, or
string modifications. For brevity, ``fn`` may also be a
lambda - here is an example of using a parse action to convert matched
integer tokens from strings to integers::
int_number = Word(nums).set_parse_action(lambda s, l, t: [int(t[0])])
If ``fn`` modifies the ``toks`` list in-place, it does not need to return a value,
and pyparsing will use the modified ``toks`` list.
If ``set_parse_action`` is called with an argument of ``None``, then this clears all parse actions
attached to that expression.
A nice short-cut for calling ``set_parse_action`` is to use it as a decorator::
identifier = Word(alphas, alphanums+"_")
@identifier.set_parse_action
def resolve_identifier(results: ParseResults):
return variable_values.get(results[0])
(Posted by @MisterMiyagi in this SO answer: https://stackoverflow.com/a/63031959/165216)
- ``add_parse_action`` - similar to ``set_parse_action``, but instead of replacing any
previously defined parse actions, will append the given action or actions to the
existing defined parse actions.
- ``add_condition`` - a simplified form of ``add_parse_action``, for when the purpose
of the parse action is simply to do some validation, and raise an exception
if the validation fails. Accepts a function with the same arguments as a parse
action, but one that simply returns ``True`` or ``False``. If ``False`` is returned,
an exception will be raised.
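A short sketch of ``add_condition`` used for validation:

```python
import pyparsing as pp

integer = pp.Word(pp.nums).set_parse_action(lambda t: [int(t[0])])
# accept only values below 100; a failing condition raises a ParseException
small_int = integer.copy().add_condition(
    lambda t: t[0] < 100, message="value must be < 100"
)

print(small_int.parse_string("42")[0])  # 42
# small_int.parse_string("500") would raise a ParseException
```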
- ``set_break(break_flag=True)`` - if ``break_flag`` is ``True``, calls ``pdb.set_trace()``
as this expression is about to be parsed
- ``copy()`` - returns a copy of a ``ParserElement``; can be used to use the same
parse expression in different places in a grammar, with different parse actions
attached to each; a short-form ``expr()`` is equivalent to ``expr.copy()``
- ``leave_whitespace()`` - change default behavior of skipping
whitespace before starting matching (mostly used internally to the
pyparsing module, rarely used by client code)
- ``set_whitespace_chars(chars)`` - define the set of chars to be ignored
as whitespace before trying to match a specific ``ParserElement``, in place of the
default set of whitespace characters (space, tab, and newline)
- ``set_default_whitespace_chars(chars)`` - class-level method to override
the default set of whitespace chars for all subsequently created ParserElements
(including copies); useful when defining grammars that treat one or more of the
default whitespace characters as significant (such as a line-sensitive grammar, to
omit newline from the list of ignorable whitespace)
- ``suppress()`` - convenience function to suppress the output of the
given element, instead of wrapping it with a ``Suppress`` object.
- ``ignore(expr)`` - function to specify parse expression to be
ignored while matching defined patterns; can be called
repeatedly to specify multiple expressions; useful to specify
patterns of comment syntax, for example
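For example, using the built-in ``python_style_comment`` expression to skip comments while matching:

```python
import pyparsing as pp

idents = pp.Word(pp.alphas)[1, ...]
idents.ignore(pp.python_style_comment)

print(idents.parse_string("alpha  # a comment\nbeta").as_list())  # ['alpha', 'beta']
```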
- ``set_debug(flag=True)`` - function to enable/disable tracing output
when trying to match this element
- ``validate()`` - function to verify that the defined grammar does not
contain infinitely recursive constructs (``validate()`` is deprecated, and
will be removed in a future pyparsing release. Pyparsing now supports
left-recursive parsers, which this function attempted to catch.)
.. _parse_with_tabs:
- ``parse_with_tabs()`` - function to override default behavior of converting
tabs to spaces before parsing the input string; rarely used, except when
specifying whitespace-significant grammars using the White_ class.
- ``enable_packrat()`` - a class-level static method to enable a memoizing
performance enhancement, known as "packrat parsing". Packrat parsing is
disabled by default, since it may conflict with some user programs that use
parse actions. To activate the packrat feature, your
program must call the class method ``ParserElement.enable_packrat()``. For best
results, call ``enable_packrat()`` immediately after importing pyparsing.
- ``enable_left_recursion()`` - a class-level static method to enable
pyparsing with left-recursive (LR) parsers. Similar to ``ParserElement.enable_packrat()``,
your program must call the class method ``ParserElement.enable_left_recursion()`` to
enable this feature. ``enable_left_recursion()`` uses a separate packrat cache, and so
is incompatible with ``enable_packrat()``.
Basic ParserElement subclasses
------------------------------
.. _Literal:
- ``Literal`` - construct with a string to be matched exactly
.. _CaselessLiteral:
- ``CaselessLiteral`` - construct with a string to be matched, but
without case checking; results are always returned as the
defining literal, NOT as they are found in the input string
.. _Keyword:
- ``Keyword`` - similar to Literal_, but must be immediately followed by
whitespace, punctuation, or other non-keyword characters; prevents
accidental matching of a non-keyword that happens to begin with a
defined keyword
- ``CaselessKeyword`` - similar to Keyword_, but with caseless matching
behavior as described in CaselessLiteral_.
.. _Word:
- ``Word`` - one or more contiguous characters; construct with a
string containing the set of allowed initial characters, and an
optional second string of allowed body characters; for instance,
a common ``Word`` construct is to match a code identifier - in C, a
valid identifier must start with an alphabetic character or an
underscore ('_'), followed by a body that can also include numeric
digits. That is, ``a``, ``i``, ``MAX_LENGTH``, ``_a1``, ``b_109_``, and
``plan9FromOuterSpace``
are all valid identifiers; ``9b7z``, ``$a``, ``.section``, and ``0debug``
are not. To
define an identifier using a ``Word``, use either of the following::
Word(alphas+"_", alphanums+"_")
Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))
Pyparsing also provides pre-defined strings ``identchars`` and
``identbodychars`` so that you can also write::
Word(identchars, identbodychars)
If only one
string is given, it specifies that the same character set defined
for the initial character is used for the word body; for instance, to
define an identifier that can only be composed of capital letters and
underscores, use one of::
Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")
Word(srange("[A-Z_]"))
A ``Word`` may
also be constructed with any of the following optional parameters:
- ``min`` - indicating a minimum length of matching characters
- ``max`` - indicating a maximum length of matching characters
- ``exact`` - indicating an exact length of matching characters;
if ``exact`` is specified, it will override any values for ``min`` or ``max``
- ``as_keyword`` - indicating that preceding and following characters must
be whitespace or non-keyword characters
- ``exclude_chars`` - a string of characters that should be excluded from
init_chars and body_chars
Sometimes you want to define a word using all
characters in a range except for one or two of them; you can do this
with the ``exclude_chars`` argument. This is helpful if you want to define
a word with all ``printables`` except for a single delimiter character, such
as '.'. Previously, you would have to create a custom string to pass to Word.
With this change, you can just create ``Word(printables, exclude_chars='.')``.
- ``Char`` - a convenience form of ``Word`` that will match just a single character from
a string of matching characters::
single_digit = Char(nums)
- ``CharsNotIn`` - similar to Word_, but matches characters not
in the given constructor string (accepts only one string for both
initial and body characters); also supports ``min``, ``max``, and ``exact``
optional parameters.
- ``Regex`` - a powerful construct, that accepts a regular expression
to be matched at the current parse position; accepts an optional
``flags`` parameter, corresponding to the flags parameter in the ``re.compile``
method; if the expression includes named sub-fields, they will be
represented in the returned ParseResults_.
- ``QuotedString`` - supports the definition of custom quoted string
formats, in addition to pyparsing's built-in ``dbl_quoted_string`` and
``sgl_quoted_string``. ``QuotedString`` allows you to specify the following
parameters:
- ``quote_char`` - string of one or more characters defining the quote delimiting string
- ``esc_char`` - character to escape quotes, typically backslash (default=None)
- ``esc_quote`` - special quote sequence to escape an embedded quote string (such as SQL's "" to escape an embedded ") (default=None)
- ``multiline`` - boolean indicating whether quotes can span multiple lines (default=False)
- ``unquote_results`` - boolean indicating whether the matched text should be unquoted (default=True)
- ``end_quote_char`` - string of one or more characters defining the end of the quote delimited string (default=None => same as ``quote_char``)
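For example, a SQL-style string in which an embedded quote is escaped by doubling it:

```python
import pyparsing as pp

sql_string = pp.QuotedString("'", esc_quote="''")
print(sql_string.parse_string("'it''s here'")[0])  # it's here
```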
.. _SkipTo:
- ``SkipTo`` - skips ahead in the input string, accepting any
characters up to the specified pattern; may be constructed with
the following optional parameters:
- ``include`` - if set to true, also consumes the match expression
(default is false)
- ``ignore`` - allows the user to specify patterns to not be matched,
to prevent false matches
- ``fail_on`` - if a literal string or expression is given for this argument, it defines an expression that
should cause the SkipTo_ expression to fail, and not skip over that expression
``SkipTo`` can also be written using ``...``::
LBRACE, RBRACE = map(Literal, "{}")
brace_expr = LBRACE + SkipTo(RBRACE) + RBRACE
# can also be written as
brace_expr = LBRACE + ... + RBRACE
.. _White:
- ``White`` - also similar to Word_, but matches whitespace
characters. Not usually needed, as whitespace is implicitly
ignored by pyparsing. However, some grammars are whitespace-sensitive,
such as those that use leading tabs or spaces to indicate grouping
or hierarchy. (If matching on tab characters, be sure to call
parse_with_tabs_ on the top-level parse element.)
- ``Empty`` - a null expression, requiring no characters - will always
match; useful for debugging and for specialized grammars
- ``NoMatch`` - opposite of ``Empty``, will never match; useful for debugging
and for specialized grammars
Expression subclasses
---------------------
.. _And:
- ``And`` - construct with a list of ``ParserElements``, all of which must
match for ``And`` to match; can also be created using the '+'
operator; an expression can be repeated and ``Anded`` with itself using the '*'
operator, as in::
ip_address = Word(nums) + ('.' + Word(nums)) * 3
A tuple can be used as the multiplier, indicating a min/max::
us_phone_number = Word(nums) + ('-' + Word(nums)) * (1,2)
A special form of ``And`` is created if the '-' operator is used
instead of the '+' operator. In the ``ip_address`` example above, if
no trailing '.' and ``Word(nums)`` are found after matching the initial
``Word(nums)``, then pyparsing will back up in the grammar and try other
alternatives to ``ip_address``. However, if ``ip_address`` is defined as::
strict_ip_address = Word(nums) - ('.' + Word(nums)) * 3
then no backing up is done. If the first ``Word(nums)`` of ``strict_ip_address``
is matched, then any mismatch after that will raise a ``ParseSyntaxException``,
which will halt the parsing process immediately. By careful use of the
'-' operator, grammars can provide meaningful error messages close to
the location where the incoming text does not match the specified
grammar.
.. _Or:
- ``Or`` - construct with a list of ``ParserElements``, any of which must
match for ``Or`` to match; if more than one expression matches, the
expression that makes the longest match will be used; can also
be created using the '^' operator
.. _MatchFirst:
- ``MatchFirst`` - construct with a list of ``ParserElements``, any of
which must match for ``MatchFirst`` to match; matching is done
left-to-right, taking the first expression that matches; can
also be created using the '|' operator
.. _Each:
- ``Each`` - similar to And_, in that all of the provided expressions
must match; however, ``Each`` permits matching to be done in any order;
can also be created using the '&' operator
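A small sketch of ``Each``, where two fields may appear in either order:

```python
import pyparsing as pp

color = pp.one_of("red green blue")
size = pp.one_of("small large")
spec = color("color") & size("size")

r = spec.parse_string("large red")
print(r["color"], r["size"])  # red large
```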
- ``Opt`` - construct with a ``ParserElement``, but this element is
not required to match; can be constructed with an optional ``default`` argument,
containing a default string or object to be supplied if the given optional
parse element is not found in the input string; parse action will only
be called if a match is found, or if a default is specified.
An optional element ``expr`` can also be expressed using ``expr | ""``.
(``Opt`` was formerly named ``Optional``, but since the standard Python
library module ``typing`` now defines ``Optional``, the pyparsing class has
been renamed to ``Opt``. A compatibility synonym ``Optional`` is defined,
but will be removed in a future release.)
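For example, an optional port number with a default value:

```python
import pyparsing as pp

host = pp.Word(pp.alphas + ".")
port = pp.Opt(pp.Suppress(":") + pp.Word(pp.nums), default="80")
addr = host + port

print(addr.parse_string("example.com:8080").as_list())  # ['example.com', '8080']
print(addr.parse_string("example.com").as_list())       # ['example.com', '80']
```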
.. _ZeroOrMore:
- ``ZeroOrMore`` - similar to ``Opt``, but can be repeated; ``ZeroOrMore(expr)``
can also be written as ``expr[...]``.
.. _OneOrMore:
- ``OneOrMore`` - similar to ZeroOrMore_, but at least one match must
be present; ``OneOrMore(expr)`` can also be written as ``expr[1, ...]``.
.. _DelimitedList:
- ``DelimitedList`` - used for
matching one or more occurrences of ``expr``, separated by ``delim``.
By default, the delimiters are suppressed, so the returned results contain
only the separate list elements. Can optionally specify ``combine=True``,
indicating that the expressions and delimiters should be returned as one
combined value (useful for scoped variables, such as ``"a.b.c"``, or
``"a::b::c"``, or paths such as ``"a/b/c"``). Can also optionally specify ``min`` and ``max``
restrictions on the length of the list, and
``allow_trailing_delim`` to accept a trailing delimiter at the end of the list.
.. _FollowedBy:
- ``FollowedBy`` - a lookahead expression, requires matching of the given
expressions, but does not advance the parsing position within the input string
.. _NotAny:
- ``NotAny`` - a negative lookahead expression, prevents matching of named
expressions, does not advance the parsing position within the input string;
can also be created using the unary '~' operator
.. _operators:
Expression operators
--------------------
- ``+`` - creates And_ using the expressions before and after the operator
- ``|`` - creates MatchFirst_ (first left-to-right match) using the expressions before and after the operator
- ``^`` - creates Or_ (longest match) using the expressions before and after the operator
- ``&`` - creates Each_ using the expressions before and after the operator
- ``*`` - creates And_ by multiplying the expression by the integer operand; if
expression is multiplied by a 2-tuple, creates an And_ of ``(min,max)``
expressions (similar to ``{min,max}`` form in regular expressions); if
``min`` is ``None``, interpret as ``(0,max)``; if ``max`` is ``None``, interpret as
``expr*min + ZeroOrMore(expr)``
- ``-`` - like ``+`` but with no backup and retry of alternatives
- ``~`` - creates NotAny_ using the expression after the operator
- ``==`` - matching expression to string; returns ``True`` if the string matches the given expression
- ``<<=`` - inserts the expression following the operator as the body of the
``Forward`` expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
to avoid operator precedence misinterpretation of the pyparsing expression)
- ``...`` - inserts a SkipTo_ expression leading to the next expression, as in
``Keyword("start") + ... + Keyword("end")``.
- ``[min, max]`` - specifies repetition similar to ``*`` with ``min`` and ``max`` specified
as the minimum and maximum number of repetitions. ``...`` can be used in place of ``None``.
For example ``expr[...]`` is equivalent to ``ZeroOrMore(expr)``, ``expr[1, ...]`` is
equivalent to ``OneOrMore(expr)``, and ``expr[..., 3]`` is equivalent to "up to 3 instances
of ``expr``".
Positional subclasses
---------------------
- ``StringStart`` - matches beginning of the text
- ``StringEnd`` - matches the end of the text
- ``LineStart`` - matches beginning of a line (lines delimited by ``\n`` characters)
- ``LineEnd`` - matches the end of a line
- ``WordStart`` - matches a leading word boundary
- ``WordEnd`` - matches a trailing word boundary
Converter subclasses
--------------------
- ``Combine`` - joins all matched tokens into a single string, using
specified ``join_string`` (default ``join_string=""``); expects
all matching tokens to be adjacent, with no intervening
whitespace (can be overridden by specifying ``adjacent=False`` in constructor)
- ``Suppress`` - clears matched tokens; useful to keep returned
results from being cluttered with required but uninteresting
tokens (such as list delimiters)
Special subclasses
------------------
- ``Group`` - causes the matched tokens to be enclosed in a list;
useful in repeated elements like ZeroOrMore_ and OneOrMore_ to
break up matched tokens into groups for each repeated pattern
- ``Dict`` - like ``Group``, but also constructs a dictionary, using the
``[0]``'th elements of all enclosed token lists as the keys, and
each token list as the value
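A short sketch of ``Dict`` applied to key/value tabular data:

```python
import pyparsing as pp

entry = pp.Group(pp.Word(pp.alphas) + pp.Word(pp.nums))
table = pp.Dict(entry[1, ...])

result = table.parse_string("red 7 green 3 blue 9")
print(result["green"])   # 3
print(result.as_dict())  # {'red': '7', 'green': '3', 'blue': '9'}
```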
- ``Forward`` - placeholder token used to define recursive token
patterns; when defining the actual expression later in the
program, insert it into the ``Forward`` object using the ``<<=``
operator (see fourFn.py_ for an example).
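A minimal recursive grammar using ``Forward``, here for nested parenthesized groups of integers:

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()
expr <<= pp.Word(pp.nums) | pp.Group(LPAR + expr[...] + RPAR)

print(expr.parse_string("(1 (2 3) 4)").as_list())  # [['1', ['2', '3'], '4']]
```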
- ``Tag`` - a non-parsing token that always matches, and inserts
a tag and value into the current parsed tokens; useful for adding
metadata or annotations to parsed results (see tag_example.py_).
Other classes
-------------
.. _ParseResults:
- ``ParseResults`` - class used to contain and manage the lists of tokens
created from parsing the input using the user-defined parse
expression. ``ParseResults`` can be accessed in a number of ways:
- as a list
- total number of elements can be found using ``len()``
- individual elements can be found using ``[0], [1], [-1],`` etc.,
or retrieved using slices
- elements can be deleted using ``del``
- the last element can be extracted and removed in a single operation
using ``pop()``, or any element can be extracted and removed
using ``pop(n)``
- a nested ParseResults_ can be created by using the pyparsing ``Group`` class
around elements in an expression::
Word(alphas) + Group(Word(nums)[...]) + Word(alphas)
will parse the string "abc 100 200 300 end" as::
['abc', ['100', '200', '300'], 'end']
If the ``Group`` is constructed using ``aslist=True``, the resulting tokens
will be a Python list instead of a ParseResults_. In this case, the returned value will
no longer support the extended features or methods of a ParseResults_.
- as a dictionary
- if ``set_results_name()`` is used to name elements within the
overall parse expression, then these fields can be referenced
as dictionary elements or as attributes
- the ``Dict`` class generates dictionary entries using the data of the
input text - in addition to ParseResults_ listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]``
it also acts as a dictionary with entries defined as ``{ a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] }``;
this is especially useful when processing tabular data where the first column contains a key
value for that line of data; when constructed with ``asdict=True``, will
return an actual Python ``dict`` instead of a ParseResults_. In this case, the returned value will
no longer support the extended features or methods of a ParseResults_.
- list elements that are deleted using ``del`` will still be accessible by their
dictionary keys
- supports ``get()``, ``items()`` and ``keys()`` methods, similar to a dictionary
- a keyed item can be extracted and removed using ``pop(key)``. Here
``key`` must be non-numeric (such as a string), in order to use dict
extraction instead of list extraction.
- new named elements can be added (in a parse action, for instance), using the same
syntax as adding an item to a dict (``parse_results["X"] = "new item"``);
named elements can be removed using ``del parse_results["X"]``
- as a nested list
- results returned from the Group class are encapsulated within their
own list structure, so that the tokens can be handled as a hierarchical
tree
- as an object
- named elements can be accessed as if they were attributes of an object:
if an element is referenced that does not exist, it will return ``""``.
ParseResults_ can also be converted to an ordinary list of strings
by calling ``as_list()``. Note that this will strip the results of any
field names that have been defined for any embedded parse elements.
(The ``pprint`` module is especially good at printing out the nested contents
given by ``as_list()``.)
If a ParseResults_ is built with expressions that use results names (see set_results_name_) or
using the ``Dict`` class, then those names and values can be extracted as a Python
dict using ``as_dict()``.
Finally, ParseResults_ can be viewed by calling ``dump()``. ``dump()`` will first show
the ``as_list()`` output, followed by an indented structure listing parsed tokens that
have been assigned results names.
Here is sample code illustrating some of these methods::
>>> number = Word(nums)
>>> name = Combine(Word(alphas)[...], adjacent=False, join_string=" ")
>>> parser = number("house_number") + name("street_name")
>>> result = parser.parse_string("123 Main St")
>>> print(result)
['123', 'Main St']
>>> print(type(result))
<class 'pyparsing.results.ParseResults'>
>>> print(repr(result))
(['123', 'Main St'], {'house_number': ['123'], 'street_name': ['Main St']})
>>> result.house_number
'123'
>>> result["street_name"]
'Main St'
>>> result.as_list()
['123', 'Main St']
>>> result.as_dict()
{'house_number': '123', 'street_name': 'Main St'}
>>> print(result.dump())
['123', 'Main St']
- house_number: '123'
- street_name: 'Main St'
Exception classes and Troubleshooting
-------------------------------------
.. _ParseException:
- ``ParseException`` - exception returned when a grammar parse fails;
``ParseExceptions`` have attributes ``loc``, ``msg``, ``line``, ``lineno``, and ``column``; to view the
text line and location where the reported ParseException occurs, use::
except ParseException as err:
print(err.line)
print(" " * (err.column - 1) + "^")
print(err)
``ParseExceptions`` also have an ``explain()`` method that gives this same information::
except ParseException as err:
print(err.explain())
- ``RecursiveGrammarException`` - exception returned by ``validate()`` if
the grammar contains a recursive infinite loop, such as::
bad_grammar = Forward()
good_token = Literal("A")
bad_grammar <<= Opt(good_token) + bad_grammar
- ``ParseFatalException`` - exception that parse actions can raise to stop parsing
immediately. Should be used when a semantic error is found in the input text, such
as a mismatched XML tag.
- ``ParseSyntaxException`` - subclass of ``ParseFatalException`` raised when a
syntax error is found, based on the use of the '-' operator when defining
a sequence of expressions in an And_ expression.
- You can also get insights into the parsing logic using diagnostic parse actions
  and ``set_debug()``, or test the matching of expression fragments using
  ``search_string()`` or ``scan_string()``.
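For example, a fragment can be exercised in isolation with ``scan_string`` before wiring it into a larger grammar (the key/value pattern below is a made-up example):

```python
import pyparsing as pp

# hypothetical fragment: key=value pairs with an alphabetic key and numeric value
pair = pp.Word(pp.alphas)("key") + pp.Suppress("=") + pp.Word(pp.nums)("value")

# scan_string yields (tokens, start_loc, end_loc) for each match found in the text
text = "config: width=100, height=50"
found = [(t.key, t.value) for t, start, end in pair.scan_string(text)]
```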
- Use ``with_line_numbers`` from ``pyparsing.testing`` to display the input string
  being parsed, with line and column numbers that correspond to the values reported
  in ``set_debug()`` output::
    import pyparsing as pp
    ppt = pp.testing

    data = """\
       A
          100"""

    expr = pp.Word(pp.alphanums).set_name("word").set_debug()
    print(ppt.with_line_numbers(data))
    expr[...].parse_string(data)
prints::
    .        1
      1234567890
    1:    A|
    2:       100|
    Match word at loc 3(1,4)
       A
       ^
    Matched word -> ['A']
    Match word at loc 11(2,7)
          100
          ^
    Matched word -> ['100']
``with_line_numbers`` has several options for marking end-of-lines and spaces, and for
displaying control characters as Unicode symbols - these are documented in the
function's docstring.
- Diagnostics can be enabled using ``pyparsing.enable_diag`` and passing
  one of the following enum values defined in ``pyparsing.Diagnostics``:
- ``warn_multiple_tokens_in_named_alternation`` - flag to enable warnings when a results
name is defined on a MatchFirst_ or Or_ expression with one or more And_ subexpressions
- ``warn_ungrouped_named_tokens_in_collection`` - flag to enable warnings when a results
name is defined on a containing expression with ungrouped subexpressions that also
have results names
- ``warn_name_set_on_empty_Forward`` - flag to enable warnings when a ``Forward`` is defined
with a results name, but has no contents defined
- ``warn_on_parse_using_empty_Forward`` - flag to enable warnings when a ``Forward`` is
defined in a grammar but has never had an expression attached to it
- ``warn_on_assignment_to_Forward`` - flag to enable warnings when a ``Forward`` is defined
but is overwritten by assigning using ``'='`` instead of ``'<<='`` or ``'<<'``
- ``warn_on_multiple_string_args_to_oneof`` - flag to enable warnings when ``one_of`` is
incorrectly called with multiple str arguments
- ``enable_debug_on_named_expressions`` - flag to auto-enable debug on all subsequent
calls to ``ParserElement.set_name``
All warnings can be enabled by calling ``pyparsing.enable_all_warnings()``.
Sample::
    import pyparsing as pp
    pp.enable_all_warnings()

    fwd = pp.Forward().set_results_name("recursive_expr")

    >>> UserWarning: warn_name_set_on_empty_Forward: setting results name 'recursive_expr'
    on Forward expression that has no contained expression
Warnings can also be enabled using the Python ``-W`` switch (using ``-Wd`` or
``-Wd:::pyparsing``) or setting a non-empty value to the environment variable
``PYPARSINGENABLEALLWARNINGS``. (If using ``-Wd`` for testing, but wishing to
disable pyparsing warnings, add ``-Wi:::pyparsing``.)
Miscellaneous attributes and methods
====================================
Helper methods
--------------
- ``counted_array(expr)`` - convenience function for a pattern where a list of
  instances of the given expression is preceded by an integer giving the count of
  elements in the list. Returns an expression that parses the leading integer,
  reads exactly that many expressions, and returns the array of expressions in the
  parse results - the leading integer is suppressed from the results (although it
  is easily reconstructed by using ``len`` on the returned array).
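A minimal sketch of ``counted_array`` (the input values are made up):

```python
import pyparsing as pp

# the leading integer gives the element count, and is suppressed from the results
counted = pp.counted_array(pp.Word(pp.alphas))
items = counted.parse_string("3 ant bee cat").as_list()
```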
- ``one_of(choices, caseless=False, as_keyword=False)`` - convenience function for quickly declaring an
alternative set of Literal_ expressions. ``choices`` can be passed as a list of strings
or as a single string of values separated by spaces. The values are sorted so that longer
matches are attempted first; this ensures that a short value does
not mask a longer one that starts with the same characters. If ``caseless=True``,
will create an alternative set of CaselessLiteral_ tokens. If ``as_keyword=True``,
``one_of`` will declare Keyword_ expressions instead of Literal_ expressions.
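For instance, the longest-match reordering means a two-character operator is not masked by a one-character prefix (hypothetical operator set):

```python
import pyparsing as pp

# "<" appears before "<=" in the list, but one_of tries longer alternatives first
comparison_op = pp.one_of("< > <= >= == !=")
token = comparison_op.parse_string("<= 5")[0]
```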
- ``dict_of(key, value)`` - convenience function for quickly declaring a
dictionary pattern of ``Dict(ZeroOrMore(Group(key + value)))``.
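A short sketch of ``dict_of`` with made-up attribute data:

```python
import pyparsing as pp

# each entry is "name: value"; dict_of keys the results by the name tokens
key = pp.Word(pp.alphas) + pp.Suppress(":")
value = pp.Word(pp.alphanums)
attrs = pp.dict_of(key, value)

result = attrs.parse_string("shape: SQUARE color: BLACK")
```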
- ``make_html_tags(tag_str)`` and ``make_xml_tags(tag_str)`` - convenience
functions to create definitions of opening and closing tag expressions. Returns
a pair of expressions, for the corresponding ``<tag>`` and ``</tag>`` strings. Includes
support for attributes in the opening tag, such as ``<table border="0">`` - attributes
are returned as named results in the returned ParseResults_. ``make_html_tags`` is less
restrictive than ``make_xml_tags``, especially with respect to case sensitivity.
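For example, extracting link text and the ``href`` attribute from an anchor tag (the sample HTML is invented):

```python
import pyparsing as pp

# start/end expressions for <a ...> and </a>; attributes become named results
a_start, a_end = pp.make_html_tags("a")
link = a_start + pp.SkipTo(a_end)("text") + a_end

result = link.parse_string('<a href="https://example.com">Example</a>')
```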
- ``infix_notation(base_operand, operator_list)`` -
convenience function to define a grammar for parsing infix notation
expressions with a hierarchical precedence of operators. To use the ``infix_notation``
helper:
1. Define the base "atom" operand term of the grammar.
For this simple grammar, the smallest operand is either
an integer or a variable. This will be the first argument
to the ``infix_notation`` method.
2. Define a list of tuples for each level of operator
precedence. Each tuple is of the form
``(operator_expr, num_operands, right_left_assoc, parse_action)``, where:
- ``operator_expr`` - the pyparsing expression for the operator;
may also be a string, which will be converted to a Literal_; if
``None``, indicates an empty operator, such as the implied
multiplication operation between 'm' and 'x' in "y = mx + b".
- ``num_operands`` - the number of terms for this operator (must
be 1, 2, or 3)
- ``right_left_assoc`` is the indicator whether the operator is
right or left associative, using the pyparsing-defined
constants ``OpAssoc.RIGHT`` and ``OpAssoc.LEFT``.
- ``parse_action`` is the parse action to be associated with
expressions matching this operator expression (the
``parse_action`` tuple member may be omitted)
3. Call ``infix_notation`` passing the operand expression and
the operator precedence list, and save the returned value
as the generated pyparsing expression. You can then use
this expression to parse input strings, or incorporate it
into a larger, more complex grammar.
``infix_notation`` also supports optional arguments ``lpar`` and ``rpar``, to
parse groups with symbols other than "(" and ")". They may be passed as strings
(in which case they will be converted to ``Suppress`` objects, and suppressed from
the parsed results), or passed as pyparsing expressions, in which case they will
be kept as-is, and grouped with their contents.
For instance, to use "<" and ">" for grouping symbols, you could write::
    expr = infix_notation(int_expr,
        [
            (one_of("+ -"), 2, OpAssoc.LEFT),
        ],
        lpar="<",
        rpar=">"
        )

    expr.parse_string("3 - <2 + 11>")
returning::
    [3, '-', [2, '+', 11]]
If the grouping symbols are to be retained, then pass them as pyparsing ``Literals``::
    expr = infix_notation(int_expr,
        [
            (one_of("+ -"), 2, OpAssoc.LEFT),
        ],
        lpar=Literal("<"),
        rpar=Literal(">")
        )

    expr.parse_string("3 - <2 + 11>")
returning::
    [3, '-', ['<', [2, '+', 11], '>']]
- ``match_previous_literal`` and ``match_previous_expr`` - functions to define an
  expression that matches the same content as was parsed in a previous parse
  expression. For instance::
    first = Word(nums)
    match_expr = first + ":" + match_previous_literal(first)
will match "1:1", but not "1:2". Since this matches at the literal
level, this will also match the leading "1:1" in "1:10".
In contrast::
    first = Word(nums)
    match_expr = first + ":" + match_previous_expr(first)
will *not* match the leading "1:1" in "1:10"; the expressions are
evaluated first, and then compared, so "1" is compared with "10".
- ``nested_expr(opener, closer, content=None, ignore_expr=quoted_string)`` - method for defining nested
lists enclosed in opening and closing delimiters.
- ``opener`` - opening character for a nested list (default="("); can also be a pyparsing expression
- ``closer`` - closing character for a nested list (default=")"); can also be a pyparsing expression
- ``content`` - expression for items within the nested lists (default=None)
- ``ignore_expr`` - expression for ignoring opening and closing delimiters (default=``quoted_string``)
If an expression is not provided for the content argument, the nested
expression will capture all whitespace-delimited content between delimiters
as a list of separate values.
Use the ``ignore_expr`` argument to define expressions that may contain
opening or closing characters that should not be treated as opening
or closing characters for nesting, such as ``quoted_string`` or a comment
expression. Specify multiple expressions using an Or_ or MatchFirst_.
The default is ``quoted_string``, but if no expressions are to be ignored,
then pass ``None`` for this argument.
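A minimal sketch of the default behavior, capturing whitespace-delimited items in nested parentheses:

```python
import pyparsing as pp

# default delimiters are "(" and ")"; each nesting level becomes a sub-list
nested = pp.nested_expr("(", ")")
structure = nested.parse_string("(a (b c) ((d)))").as_list()
```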
- ``IndentedBlock(statement_expr, recursive=False, grouped=True)`` -
function to define an indented block of statements, similar to
indentation-based blocking in Python source code:
- ``statement_expr`` - the expression defining a statement that
will be found in the indented block; a valid ``IndentedBlock``
must contain at least 1 matching ``statement_expr``
- ``recursive`` - flag indicating whether the IndentedBlock can
itself contain nested sub-blocks of the same type of expression
(default=False)
- ``grouped`` - flag indicating whether the tokens returned from
parsing the IndentedBlock should be grouped (default=True)
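A sketch of ``IndentedBlock`` using an assumed layout of a one-letter header line followed by an indented suite of integers (the data format here is invented for illustration):

```python
import pyparsing as pp

# a one-letter header, then an indented block of integer statements;
# with grouped=True (the default), each block comes back as a sub-list
integer = pp.Word(pp.nums)
group = pp.Group(pp.Char(pp.alphas) + pp.IndentedBlock(integer))

text = """\
a
    100
    101
b
    200
"""
parsed = group[...].parse_string(text).as_list()
```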
.. _originalTextFor:
- ``original_text_for(expr)`` - helper function to preserve the originally parsed text, regardless of any
token processing or conversion done by the contained expression. For instance, the following expression::
    full_name = Word(alphas) + Word(alphas)
will return the parse of "John Smith" as ['John', 'Smith']. In some applications, the actual name as it
was given in the input string is what is desired. To do this, use ``original_text_for``::
    full_name = original_text_for(Word(alphas) + Word(alphas))
- ``ungroup(expr)`` - function to "ungroup" returned tokens; useful
to undo the default behavior of And_ to always group the returned tokens, even
if there is only one in the list.
- ``lineno(loc, string)`` - function to give the line number of the
location within the string; the first line is line 1, newlines
start new rows
- ``col(loc, string)`` - function to give the column number of the
location within the string; the first column is column 1,
newlines reset the column number to 1
- ``line(loc, string)`` - function to retrieve the line of text
representing ``lineno(loc, string)``; useful when printing out diagnostic
messages for exceptions
- ``srange(range_spec)`` - function to define a string of characters,
given a string of the form used by regexp string ranges, such as ``"[0-9]"`` for
all numeric digits, ``"[A-Z_]"`` for uppercase characters plus underscore, and
so on (note that ``range_spec`` does not include support for generic regular
expressions, just string range specs)
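For example, ``srange`` can build the character set for a ``Word`` expression:

```python
import pyparsing as pp

# expand a regex-style character class into an explicit character string
hex_chars = pp.srange("[0-9a-fA-F]")
hex_word = pp.Word(hex_chars)
token = hex_word.parse_string("deadBEEF42")[0]
```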
- ``trace_parse_action(fn)`` - decorator function to debug parse actions. Lists
each call, called arguments, and return value or exception
Helper parse actions
--------------------
- ``remove_quotes`` - removes the first and last characters of a quoted string;
useful to remove the delimiting quotes from quoted strings
- ``replace_with(repl_string)`` - returns a parse action that simply returns the
``repl_string``; useful when using ``transform_string``, or converting HTML entities, as in::
    nbsp = Literal("&nbsp;").set_parse_action(replace_with("<BLANK>"))
- ``original_text_for`` - restores any internal whitespace or suppressed
text within the tokens for a matched parse
expression. This is especially useful when defining expressions
for ``scan_string`` or ``transform_string`` applications.
- ``with_attribute(*args, **kwargs)`` - helper to create a validating parse action to be used with start tags created
with ``make_xml_tags`` or ``make_html_tags``. Use ``with_attribute`` to qualify a starting tag
with a required attribute value, to avoid false matches on common tags such as
``<TD>`` or ``<DIV>``.
``with_attribute`` can be called with:
- keyword arguments, as in ``(align="right")``; if an attribute name is a Python
  reserved word (such as ``class``), pass it using ``**``, as in ``**{"class": "Customer"}``, or
- a list of name-value tuples, as in ``(("ns1:class", "Customer"), ("ns2:align", "right"))``
An attribute can be specified to have the special value
``with_attribute.ANY_VALUE``, which will match any value - use this to
ensure that an attribute is present but any attribute value is
acceptable.
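A sketch using a made-up ``type`` attribute to filter ``<div>`` start tags:

```python
import pyparsing as pp

div_start, div_end = pp.make_html_tags("div")
# qualify the start tag: only accept divs whose type attribute is exactly "grid"
div_start.add_parse_action(pp.with_attribute(type="grid"))

# scan_string silently skips the tags that fail the with_attribute check
html = '<div type="graph">A</div> <div type="grid">B</div>'
matches = [toks for toks, s, e in div_start.scan_string(html)]
```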
- ``match_only_at_col(column_number)`` - a parse action that verifies that
an expression was matched at a particular column, raising a
``ParseException`` if matching at a different column number; useful when parsing
tabular data
- ``common.convert_to_integer()`` - converts all matched tokens to int
- ``common.convert_to_float()`` - converts all matched tokens to float
- ``common.convert_to_date()`` - converts matched token to a datetime.date
- ``common.convert_to_datetime()`` - converts matched token to a datetime.datetime
- ``common.strip_html_tags()`` - removes HTML tags from matched token
- ``common.downcase_tokens()`` - converts all matched tokens to lowercase
- ``common.upcase_tokens()`` - converts all matched tokens to uppercase
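Two short examples with made-up input values: ``common.integer`` already converts as it parses, and ``convert_to_date`` can be attached to a date expression:

```python
import pyparsing as pp
ppc = pp.common

# common.integer already carries convert_to_integer, so tokens come back as ints
nums = ppc.integer[1, ...].parse_string("10 20 30").as_list()

# attach convert_to_date to get a datetime.date instead of a string
date_expr = ppc.iso8601_date.copy().add_parse_action(ppc.convert_to_date())
d = date_expr.parse_string("1999-12-31")[0]
```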
Common string and token constants
---------------------------------
- ``alphas`` - same as ``string.ascii_letters``
- ``nums`` - same as ``string.digits``
- ``alphanums`` - a string containing ``alphas + nums``
- ``alphas8bit`` - a string containing alphabetic 8-bit characters::
    ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
.. _identchars:
- ``identchars`` - a string containing characters that are valid as initial identifier characters::
    ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzª
    µºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
- ``identbodychars`` - a string containing characters that are valid as identifier body characters (those following a
valid leading identifier character as given in identchars_)::
    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzª
    µ·ºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
- ``printables`` - same as ``string.printable``, minus the space (``' '``) character
- ``empty`` - a global ``Empty()``; will always match
- ``sgl_quoted_string`` - a string of characters enclosed in 's; may
include whitespace, but not newlines
- ``dbl_quoted_string`` - a string of characters enclosed in "s; may
include whitespace, but not newlines
- ``quoted_string`` - ``sgl_quoted_string | dbl_quoted_string``
- ``python_quoted_string`` - ``quoted_string | multiline quoted string``
- ``c_style_comment`` - a comment block delimited by ``'/*'`` and ``'*/'`` sequences; can span
multiple lines, but does not support nesting of comments
- ``html_comment`` - a comment block delimited by ``'<!--'`` and ``'-->'`` sequences; can span
multiple lines, but does not support nesting of comments
- ``comma_separated_list`` - similar to DelimitedList_, except that the
list expressions can be any text value, or a quoted string; quoted strings can
safely include commas without incorrectly breaking the string into two tokens
- ``rest_of_line`` - all remaining printable characters up to but not including the next
newline
- ``common.integer`` - an integer with no leading sign; parsed token is converted to int
- ``common.hex_integer`` - a hexadecimal integer; parsed token is converted to int
- ``common.signed_integer`` - an integer with optional leading sign; parsed token is converted to int
- ``common.fraction`` - signed_integer '/' signed_integer; parsed tokens are converted to float
- ``common.mixed_integer`` - signed_integer '-' fraction; parsed tokens are converted to float
- ``common.real`` - real number; parsed tokens are converted to float
- ``common.sci_real`` - real number with optional scientific notation; parsed tokens are converted to float
- ``common.number`` - any numeric expression; parsed tokens are returned as converted by the matched expression
- ``common.fnumber`` - any numeric expression; parsed tokens are converted to float
- ``common.ieee_float`` - any floating-point literal (int, real number, infinity, or NaN), returned as float
- ``common.identifier`` - a programming identifier (follows Python's syntax convention of leading alpha or "_",
followed by 0 or more alpha, num, or "_")
- ``common.ipv4_address`` - IPv4 address
- ``common.ipv6_address`` - IPv6 address
- ``common.mac_address`` - MAC address (with ":", "-", or "." delimiters)
- ``common.iso8601_date`` - date in ``YYYY-MM-DD`` format
- ``common.iso8601_datetime`` - datetime in ``YYYY-MM-DDThh:mm:ss.s(Z|+-00:00)`` format; trailing seconds,
milliseconds, and timezone optional; accepts separating ``'T'`` or ``' '``
- ``common.url`` - matches URL strings and returns a ParseResults with named fields like those returned
by ``urllib.parse.urlparse()``
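For instance, ``common.number`` returns each token converted by whichever sub-expression matched (sample values invented):

```python
import pyparsing as pp
ppc = pp.common

# integers stay ints; reals and scientific notation become floats
values = ppc.number[1, ...].parse_string("42 3.14 6.02e23").as_list()
```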
Unicode character sets for international parsing
------------------------------------------------
Pyparsing includes the ``unicode`` namespace that contains definitions for ``alphas``, ``nums``, ``alphanums``,
``identchars``, ``identbodychars``, and ``printables`` for character ranges besides 7- or 8-bit ASCII. You can
access them using code like the following::
    import pyparsing as pp
    ppu = pp.unicode

    greek_word = pp.Word(ppu.Greek.alphas)
    greek_word[...].parse_string("Καλημέρα κόσμε")
The following language ranges are defined.
========================== ================= ================================================
Unicode set Alternate names Description
-------------------------- ----------------- ------------------------------------------------
Arabic العربية
Chinese 中文
CJK Union of Chinese, Japanese, and Korean sets
Cyrillic кириллица
Devanagari देवनागरी
Greek Ελληνικά
Hangul Korean, 한국어
Hebrew עִברִית
Japanese 日本語 Union of Kanji, Katakana, and Hiragana sets
Japanese.Hiragana ひらがな
Japanese.Kanji 漢字
Japanese.Katakana カタカナ
Latin1 All Unicode characters up to code point 255
LatinA
LatinB
Thai ไทย
BasicMultilingualPlane BMP All Unicode characters up to code point 65535
========================== ================= ================================================
The base ``unicode`` class also includes definitions based on all Unicode code points up to ``sys.maxunicode``. This
set will include emojis, wingdings, and many other specialized and typographical variant characters.
Generating Railroad Diagrams
============================
Grammars are conventionally represented in what are called "railroad diagrams", which allow you to visually follow
the sequence of tokens in a grammar along lines which are a bit like train tracks. You might want to generate a
railroad diagram for your grammar in order to better understand it yourself, or maybe to communicate it to others.
Usage
-----
To generate a railroad diagram in pyparsing, you first have to install pyparsing with the ``diagrams`` extra.
To do this, just run ``pip install pyparsing[diagrams]``, and make sure you add ``pyparsing[diagrams]`` to any
``setup.py`` or ``requirements.txt`` that specifies pyparsing as a dependency.
Create your parser as you normally would. Then call ``create_diagram()``, passing the name of an output HTML file::

    street_address = Word(nums).set_name("house_number") + Word(alphas)[1, ...].set_name("street_name")
    street_address.set_name("street_address")
    street_address.create_diagram("street_address_diagram.html")
This will result in the railroad diagram being written to ``street_address_diagram.html``.
``create_diagram`` takes the following arguments:
- ``output_html`` (str or file-like object) - output target for generated diagram HTML
- ``vertical`` (int) - threshold for formatting multiple alternatives vertically instead of horizontally (default=3)
- ``show_results_names`` - bool flag whether diagram should show annotations for defined results names
- ``show_groups`` - bool flag whether groups should be highlighted with an unlabeled surrounding box
- ``embed`` - bool flag whether generated HTML should omit ``<HTML>``, ``<HEAD>``, and ``<BODY>`` tags to embed
  the resulting HTML in an enclosing HTML source (such as PyScript HTML)
- ``head`` - str containing additional HTML to insert into the ``<head>`` section of the generated code;
  can be used to insert custom CSS styling
- ``body`` - str containing additional HTML to insert at the beginning of the ``<body>`` section of the
  generated code
Example
-------
You can view an example railroad diagram generated from `a pyparsing grammar for
SQL SELECT statements <_static/sql_railroad.html>`_ (generated from
`examples/select_parser.py <../examples/select_parser.py>`_).
Naming tip
----------
Parser elements that are separately named will be broken out as their own sub-diagrams. As a short-cut alternative
to going through and adding ``.set_name()`` calls on all your sub-expressions, you can use ``autoname_elements()`` after
defining your complete grammar. For example::
    a = pp.Literal("a")
    b = pp.Literal("b").set_name("bbb")

    pp.autoname_elements()

``a`` will be named "a", while ``b`` will keep its name "bbb".
Customization
-------------
You can customize the resulting diagram in a few ways.
To do so, run ``pyparsing.diagram.to_railroad`` to convert your grammar into a form understood by the
`railroad-diagrams `_ module, and
then ``pyparsing.diagram.railroad_to_html`` to convert that into an HTML document. For example::
    from pyparsing.diagram import to_railroad, railroad_to_html

    with open('output.html', 'w') as fp:
        railroad = to_railroad(my_grammar)
        fp.write(railroad_to_html(railroad))

This will result in the railroad diagram being written to ``output.html``.
You can then pass in additional keyword arguments to ``pyparsing.diagram.to_railroad``, which will be passed
into the ``Diagram()`` constructor of the underlying library,
`as explained here `_.
In addition, you can edit global options in the underlying library, by editing constants::
    from pyparsing.diagram import to_railroad, railroad_to_html
    import railroad

    railroad.DIAGRAM_CLASS = "my-custom-class"
    my_railroad = to_railroad(my_grammar)
These options `are documented here `_.
Finally, you can edit the HTML produced by ``pyparsing.diagram.railroad_to_html`` by passing in certain keyword
arguments that will be used in the HTML template. Currently, these are:
- ``head``: A string containing HTML to use in the ```` tag. This might be a stylesheet or other metadata
- ``body``: A string containing HTML to use in the ```` tag, above the actual diagram. This might consist of a
heading, description, or JavaScript.
If you want to provide a custom stylesheet using the ``head`` keyword, you can make use of the following CSS classes:
- ``railroad-group``: A group containing everything relating to a given element group (i.e. something with a heading)
- ``railroad-heading``: The title for each group
- ``railroad-svg``: A div containing only the diagram SVG for each group
- ``railroad-description``: A div containing the group description (unused)