Parse-RecDescent-1.967015/0000755000175000017500000000000013070731552014754 5ustar jtbraunjtbraunParse-RecDescent-1.967015/Changes0000644000175000017500000006362113070730372016256 0ustar jtbraunjtbraunRevision history for Parse-RecDescent 1.00 Mon Aug 11 13:17:13 1997 - original version 1.01 Mon Sep 8 18:04:14 EST 1997 - changed "quotemeta" to "quotemeta $_" in Text::Balanced to workaround bug in Perl 5.002 and 5.003 1.10 Tue Sep 30 14:51:49 EST 1997 - fixed fatal bug in tracing code - added m... format for regex tokens - added support for trailing modifiers (/.../gimsox) in regex tokens - added $thisline variable 1.20 Thu Oct 2 11:46:57 EST 1997 - fixed handling of trailing regex modifiers (now no whitespace allowed before between last delimiter and first modifier) - added trace diagnostic for failure/success of actions (demystifies failures caused by an action returning undef) - added context for "Matched rule..." trace - added a test so that an integer value (N>1) in the $::RD_TRACE variable now truncates (to N characters) all contexts reported in any trace - added "start-up" actions: actions appearing before the first rule were previously an error. They are now executed (once) at the start of the parser's namespace. 1.21 Sat Oct 4 17:12:23 EST 1997 - modified truncation of context in trace diagnostics (successful matches now report first N/2 and last N/2 chars, instead of first N) - fixed incorrect version number in Balanced.pm 1.22 Tue Oct 7 11:53:27 EST 1997 - fixed lurking trace problem with certain pathological regexes - fixed bug in generation of special namespaces (this was serious as it prevented the use of more than one alternation in a parser, as well as preventing the use of more than one parser in a script! 1.23 Fri Oct 17 10:15:22 EST 1997 - fixed error message generation for directives - fixed error message handling of empty productions - fixed handling of multi-line start-up actions - removed spurious debugging message for implicit subrule generation - changed naming scheme for alternations (pseudo-rule's name now describes location of alternation exactly) - added support for repetition specifiers on alternations. - Text::Balanced::extract_.... altered to honour the context in which they are called (see Balanced.pod for details). 1.24 - fixed minor problem in tracing code (context string now correctly reported for actions) - added explicit namespace declaration at beginning of generated code, to ensure that any "start code" is declared in the appropriate namespace. - fixed left recursion check on empty productions - added $::RD_AUTOSTUB flag and associated autostubbing behaviour (see new section - "Autostubbing" - in RecDescent.pod) - eliminated hierarchical precedence between $::RD_HINT and $::RD_TRACE. Enabling tracing now does _not_ automatically turn on hinting (although error and warning messages are still automatically enabled). - fixed bug in Text::Balanced. Division now correctly handled in code blocks. 1.25 Mon Feb 09 12:19:14 EST 1998 - Resynchronized numbering schemes for RecDescent and Balanced. 1.26 Wed Feb 25 13:52:15 EST 1998 - Fixed bug (and inefficiency) in directive. - Improved checking of regexes within grammars - Added subrule arguments (major change to internal grammar parser) - Added directive - started work on Compile() option (not complete yet - do not use!) - Made generated code "use strict" - Fixed bug which incorrectly warned against items following a directive. - Improved $thisline (added assignment and resync) - Fixed expectation messages for subrules - Rearranged tar file to co-operate with CPAN.pm 1.30 Fri May 22 05:52:06 1998 - Added directive - Added culling of productions starting with or - Cleaned up and improved format (and speed) of tracing code - Added warning levels - Optimized generation of token separator checking code. - Fixed bug encountered when parsing a literal string - Added $::RD_AUTOACTION to simplify standard actions at the end of each production 1.31 Fri May 22 06:11:26 1998 - Fixed bug in naming archive file 1.33 Fri May 22 06:15:26 1998 1.35 Wed Jun 24 09:57:02 1998 - Removed "foreach my $var ( @list )" constructs, which were biting users with perl 5.003 and earlier. - Fixed bug calling &Parse::RecDescent::toksepcode instead of &Parse::RecDescent::Rule::toksepcode - Changed grammar so that colons in rule definitions must appear on the same line as the rule name (as documented). Added an explicit error message when this is not the case. - Added $thiscolumn, which indicates the current column at any point in the parse. - Added $thisoffset, which indicates the absolute position in the original text string at any point in the parse. - Added $prevline and $prevcolumn, which indicate line and column of the last char of the last successfully matched item. - Added @itempos which provides: $itempos[$n]{offset}{from} $itempos[$n]{offset}{to} $itempos[$n]{line}{from} $itempos[$n]{line}{to} $itempos[$n]{column}{from} $itempos[$n]{column}{to} corresponding to each $item[$n]. See new documentation. - Several trivial lexical changes to make xemacs happy 1.41 Mon Aug 10 14:52:53 1998 - Enhanced POD in response to user feedback - Fixed subtle bug in Text::Balanced::extract_codeblock. It only bit when '(?)' appeared in implicit subrules - Added ability to pass args to the start-rule. 1.42 ???? - Added a test.pl - Modified behaviour of repetitions, so that the results of repeated subrules which succeed but don't consume are preserved (at least up to the minimal number of repetitions) - Fixed bug: @itempos now not incorrectly reset if grammar contained alternations - Fixed bug: Embedded unmatched '}' in regex tokens now works correctly - Miscellaneous tweaks to RecDescent.pod (e.g. updated meta-grammar) 1.43 Sat Aug 15 06:43:46 1998 - Resychronized Balanced.pm versions 1.50 Thu Aug 27 09:29:31 1998 - Changed parser to use extract_codeblock, so as to handle embedded '>' chars (e.g. {tmp}> ) - Added to allow deferred actions which are only executed if they are part of a rule that eventually succeeds. (see the new section under "Directives" in RecDescent.pod) - Fixed matching interpolated literals (was broken when literal contained pattern metacharacters) 1.51 Thu Aug 27 16:25:08 1998 - Maintenance release, rectifying bad soft links in the 1.50 distributions 1.60 Wed Oct 21 09:44:15 1998 [Never released] 1.61 Wed Oct 21 11:06:19 1998 - Added directive for supporting (future) token-stream parsing (see pod) - Added feature that data is consumed if passed as a reference (see pod) - Fixed bug in autogenerated errors: now ignores directives - Modified behaviour of directive so that deferred actions only executed if total parse succeeds (i.e. returns a defined value) - Made error messages "anti-deferred". That is, only those errors invoked in paths that eventually caused a parse to fail are printed - see documentation. - Miscellaneous fixes for Text::Balanced subroutines - Made private namespaces inherit Parse::RecDescent namespace (leads to more intuitive behaviour when calling methods of $thisparser) - *** NON-BACKWARDS COMPATIBLE CHANGE! *** Changed the behaviour of token separator specification. Now uses directive. See pod for new details. 1.62 Wed Dec 9 11:26:29 1998 - Reinstated missing $prevoffset variable - Corrected a possible bug with autoactions (thanks Mitchell) - *** IMPORTANT CHANGE *** $::RD_WARN now initialized 3 by default. Serious but non-fatal errors are automatically reported, unless you explicitly undefine $::RD_WARN. - Fixed bug in AUTOLOADing non-method subs defined in package Parse::RecDescent (thanks Mario) 1.63 Thu Mar 25 09:13:21 1999 - Rewrote documentation to replace the concept of a token separator with that of a token prefix. - Fixed obscure bug in replacement of rules containing implicit subrules (alternations). Thanks Craig. 1.64 Sun Mar 28 05:44:14 1999 - Synchronized with Text::Balanced version - Fixed obscure bug in the treatment of escaped backslashes in literal tokens. Thanks Matthew. 1.65 Wed May 19 12:35:05 1999 - Added and directives - Added level 2 warning and autoreject for lone directive in a production. 1.66 Fri Jul 2 13:35:06 1999 - Improved error message when an action fails to parse (Thanks Tuomas). - Allowed predefined subroutines in package Parse::RecDescent to be used as rules in grammars - Changed error report on bad regexes to level 3 warning, since compile-time interpolation failure may falsely invalidate regexes that would work at run-time. 1.70 Fri Oct 8 14:15:36 1999 - Clarified use of "eofile" idiom in POD file Clarified meaning of "free-form" in description of grammars Fixed examples, which were invalidated by earlier change in semantics of . (Thanks Knut). - Added grammar precompiler (see documentation) - Tweaked message for optimization. - Fixed bug when using '@' as a terminal (thanks Abigail) - Fixed nasty bug when $return set to zero - Added and directives (see documentation) 1.77 Mon Nov 22 06:11:32 1999 - IMPORTANT: Now requires 5.005 or better. - Added , , and directives (see documentation) - Added directive (see documentation) - Added %item hash (see documentation - thanks Stef!) - Tweaked internal parser in line with changes to Text::Balanced - Added directive to switch off recursion checking and other checks in stable grammars (see documentation). - Refined code generation WRT positional variables ($thisoffset, etc) - Added positional entries for %item (see documentation) - Fixed bug with (missing) start actions under precompiler (thanks Theo) 1.78 Mon Mar 20 12:03:17 2000 - Fixed error messages and documentation for Parse::RecDescent::Precompile (thanks Jim) - Moved demos to /demo subdirectory - Added tutorial in /tutorial subdirectory - Added directive - Added (s /sep/) notation (thanks Greg) - Circumvented \G and /gc calamities - Added more comprehensible error message when parser invoked through non-existent startrule (thanks Jeff) - Fixed serious bug with creating new parsers after existing ones had failed. (Thanks Paul) - Fixed problem with nested implicit subrules (thanks Marc). 1.79 Mon Aug 21 11:27:39 2000 - Pod tweak (thanks Abigail) - Documented need to use do{..} within some directives (thanks Paul) - Added Save method - Fixed bug that was preventing precompiled parsers being subsequently extended (thanks Jeff). - Changed keys used by %item. Now uses "named positionals" rather that simple positionals for non-subrule items (see documentation) - Added trimmer for surrounding whitespace in matchrules. - Squelched bug in (not) handling invalid directives (thanks John) 1.80 Sat Jan 20 05:02:35 2001 - Fixed Save so that saved parsers can still be used after saving (thanks Supun) - Fixed bug in line number tracking (thanks Theo) - Fixed bug in (s /pat/) shorthand (thanks Julien) - Improved docs on (thanks Steve) 1.90 Tue Mar 25 01:17:38 2003 - BACKWARDS INCOMPATIBLE CHANGE: The key of an %item entry for a repeated subrule now includes the repetition specifier. For example, in: sentence: subject verb word(s) the various matched items will be stored in $item{'subject'}, $item{'verb'}, and $item{'word(s)'} (i.e. *not* in $item{'word'}, as it would have been in previous versions of the module). (thanks Anthony) - Changed argument passing behaviour. If no arguments specified for subrule, it is now passed current rule's @arg instead. To get old (no arguments) behaviour use: subrule[] - Fixed bug in handling: failed to reject if $return had been set. (thanks Nick) - Added two useful demos of restructuring nested data (thanks Marc) - Fixed doc bug re use of // (thanks Randal) - Localized filehandles, like a good citizen should - Misc doc bug fixes (thanks all) - Fixed Text::Balance dependency in Makefile.PL (thanks Dominique) - Fixed bug that @itempos wasn't set up if referred to only in an autoaction. (thanks Eric) - Fixed truncation bug in tracing contexts - Dramatically improved speed of line counting (thanks John) - Made item(s) and item(s /,/) behave consistently wrt %item (thanks Marcel) - Added prototype handling - Added outer block markers for - Fixed multi-grammar precompilation (thanks Dominique) - Fixed numerous snafus in tutorial.html (thanks Ralph) - Added nesting level information to traces - Fixed resetting of $text after an rule. 1.91 Fri Mar 28 23:20:28 2003 - Updated Text::Balanced to fix various bugs 1.92 Wed Apr 2 04:45:37 2003 - Removed Text::Balanced from distribution (now a prereq only) 1.93 Wed Apr 2 22:25:14 2003 - Fixed fatal error with $tracelevel (thanks everyone) 1.94 Wed Apr 9 08:29:33 2003 - Replaced 'our' with 'use vars' to reinstate 5.005 compatibility. 1.95.1 Sun Sep 30 05:06:56 2007 - Updated README to reflect new status of Text::Balanced (i.e. required but not included in the distribution) - Fixed demo_logic (Thanks, Steve) - Fixed autopropagation of arguments into repetitions (Thanks, Luke) - Limited context info to 500 chars in traces (Thanks, Stephen) - Added option to select base namespace for autotreeing (thanks Gaal) - Improved formatting compatibility with 5.9.0 (thanks, David) - Added support for $::RD_HINT = 0 to turn off hinting entirely - Fixed bug in line handling - Returned $return variable to documented behaviour (i.e. setting return doesn't guarantee the match, only what is returned if the match succeeds) - Fixed nit in debugging of conditional regexes (thanks, Brian) - Moved expectation creation to compile-time (thanks François) - Removed redundant inheritances (i.e. @ISA elements) in internal namespace (thanks Hugo) - Added warning against C in actions to "GOTCHAS" documentation - Added demo_another_Cgrammar.pl (thanks Hendrik) - Documented parens (thanks Robin) - Removed incorrect meta-grammar from docs 1.96.0 Fri Oct 3 06:08:24 2008 - Propagated correct Changes file (thanks Matthew!) - Added: - 1.962.0 Tue Aug 25 19:45:15 2009 - Doc bug fix (thanks Christophe) - Fixed assymmetrical push/pop on @lines tracker (thanks Peter!) - Bumped sub-version number hugely to fix CPAN indexing (thanks Jerome) - Remove all occurrences of $& so we don't affect other regular expressions. - Perl 5.6.0 required for use of $+[0] and $-[0] for replacement of $&. 1.962.1 Thu Aug 27 21:39:30 2009 - Fixed subtle bug in leftop and rightop caused by removal of $& 1.963 Thu Jan 21 09:13:19 2010 - Fixed even subtler bug in leftop and rightop caused by removal of $& (Thanks Francesco) 1.964 Wed Feb 17 09:33:39 2010 - Fixed bug with undefined $1 when parsing literals (thanks Dan!) - Fixed premature namespace destruction bug with compiled grammars 1.964001 Tue Feb 23 15:15:18 2010 - Updated version number because versioning is a neverending nightmare in Perl 5 (thanks Matt) 1.965001 Sun Apr 4 15:00:10 2010 - Removed all references to /opts version of perl interpreter - Added Parse::RecDescent::redirect_reporting_to() to enable ERROR, TRACE, and TRACECONTEXT filehandles to be easily redirected. 1.966_000 Mon Jun 27 08:32:50 2011 - Patched unnamed subrules, so that they actual fail when not correctly specified (thanks Evgeniy!) - Added skip tests (thanks Flavio) - Added doc patch to make $skip semantics clearer (thanks Flavio!) - Fixed POD description of semantics (thanks Dirk!) 1.966_001 Mon Nov 14 10:34:52 2011 - Applied fix to restore skipped prefixes on match failure (thanks Jeremy!) - *** NON-BACKWARDS COMPATIBLE CHANGE! *** Removed formats to eliminate problems with filehandle duplication in forked environments. Removed redirect_reporting_to() in favor of using STDERR for all error/trace output. 1.966_002 Sun Jan 22 19:08:37 2012 - *** NON-BACKWARDS COMPATIBLE CHANGE! *** Change the caches for $prevline and $thisline to be local to the parser, rather than lexical vars in Parse::RecDescent. This prevents previously generated parsers from interfering with the line counts of later parsers. - removed trailing whitespace from all member files (cosmetic) - new tests, updated MANIFEST - Added Jeremy Braun as an author and current maintainer - update file permissions - fixed a few broken links in the pod 1.967001 Sat Jan 28 20:54:48 2012 - Addressed RT.cpan.org #28314: regex modifiers for tokens not honored during regex syntax check. (Thanks SADAHIRO!) - Fixed some POD typos - Added message on how to turn off "default" hint value in the default hint value ($::RD_HINT = 0). RT.cpan.org # #4898. - Modified _write_ERROR to call formline twice to avoid repeated $errorprefix. - Collected match tracing messages into a common function which takes into account positive/negative lookahead. - Addressed RT.cpan.org #74258: RD_AUTOSTUB does not work with precompiled parsers. (Thanks Yuri!) - Special-case $::RD_AUTOSTUB eq 1. $::RD_AUTOSTUB's value is ignored, and the unknown subrule 'rule' has a production "rule: 'rule'", rather than "rule: '1'". - Change Parse::RecDescent::new to call $self->Replace with only the grammar as an argument. That prevents the $compiling argument to new() from being incorrectly interpreted as $isimplicit. 1.967002 Sun Jan 29 19:13:04 2012 - Addressed RT.cpan.org #29966: regex with single backslash not recognized. Changed the definition of $TOKEN to handle backslashes inside of regex patterns. - Skip tests in t/reentry.t if Test::Warn isn't available. 1.967003 Mon Jan 30 07:24:53 2012 - Remove the 'use 5.10' from t/skip_dynamic.t, it runs fine against Perl 5.8.9. (Thanks Slaven!) 1.967_004 Tue Feb 7 22:11:11 2012 - Localize the OUT filehandle during Precompile. - Document the form of the directive. - Provide a simple test for the directive, t/autotree.t. Renamed basics.t to ensure it runs before autotree.t. - Allow a global directive that functions the same as modifying $Parse::RecDescent::skip prior to compiling a grammar. (Thanks Flavio!) - Require that the $file returned by caller() be eq '-', rather than merely starting with '-'. This allows execution of the following. (Thanks Christopher) perl -MParse::RecDescent -e 'print "$Parse::RecDescent::VERSION\n";' - Warn on empty productions followed by other productions. The empty production always matches, so following productions will never be reached. - *** NON-BACKWARDS COMPATIBLE CHANGE! *** A repetition directive such as 'id(s /,/)' correctly creates a temporary @item variable to hold the 'id's that are matched. That @item variable is them used to set the real $item[] entry for that repetition. The same treatment is now given to %item. Formerly, in a production like: id ',' id(s /,/) matched against: xxx, yyy, zzz The $item{id} entry which should be 'xxx' is overwritten by 'yyy' and then 'zzz' prior to the action being executed. Now 'yyy' and 'zzz' set $item{id}, but in the private %item, which goes out of scope once the repetition match completes. - ** EXPERIMENTAL ** When precompiling, optionally create a standalone parser by including most of the contents of Parse::RecDescent in the resulting Precompiled output. - Accept an optional $options hashref to Precompile, which can be used to specify $options->{-standalone}, which currently defaults to false. - The subroutines import, Precompile and Save are not included in the Precompile'd parser. - The included Parse::RecDescent module is renamed to Parse::RecDescent::_Runtime to avoid namespace conflicts with an installed and use'd Parse::RecDescent. - Add a new t/precompile.t to test precompilation. - Add a new $_FILENAME global to Parse::RecDescent to make it easy for the Precompile method to find the module. - Remove the prototype from _generate. It is not required, and it caused t/precompile.t (which ends up re-definiing a lot of Parse::RecDescent subroutines) to fail needlessly, as the calls to _generate in Replace and Extend normally do not see the prototype, but do when re-defined. - POD documentation for standalone parsers added. 1.967_005 Wed Feb 8 18:46:35 2012 - Added JTBRAUN@CPAN.org as author in Build.PL. - Added ExtUtils::MakeMaker build/configure version requirements. (RT.cpan.org #74787, Thanks POPEL!) 1.967006 Fri Feb 10 20:48:48 2012 - Bumped version to 1.967006 for non-development release. 1.967_007 Thu Feb 23 07:26:03 2012 - Revised ExtUtils::MakeMaker build/configure version requirements. (RT.cpan.org #74787, Thanks Paul!) - Revised Text::Balanced prereq to require version 1.95, necessary for t/skip.t to pass. (RT.cpan.org #74787, Thanks Paul!) - Removed unused version.pm prereq. - Fix a circular reference memory leak caused by the use of $AUTOLOAD in sub AUTOLOAD{} in the generated parser's namespace. Workaround documented in perl5 RT #110248. Workaround a circular reference memory leak in ISA documented by perl5 RT #92708. A parser's DESTROY() method redefines all subs before deleting the stash, to avoid circular references between subrules. (RT #53710, thanks Andreas!) - Parse::RecDescent::AUTOLOAD did not correctly handle initializing the line counting mechanism when a reference to a scalar was passed to the parser. (RT.cpan.org #27705, thanks Justin!) 1.967_008 Tue Mar 13 22:28:00 2012 - Restore old _parserepeat calling convention. Change a parser's DESTROY method to check for $self->{_not_precompiled} instead of $self->{_precompiled}. (Fix for RT #74593). 1.967009 Fri Mar 16 07:25:09 2012 - Bumped version to 1.967009 for non-development release. 1.967_010 Sun Jul 7 11:23:53 2013 - Base the standalone precompiled parser's runtime name on the parser's class, rather than use the fixed "Parse::RecDescent::_Runtime". This prevents "already defined" warnings when two standalone precompiled parsers are used. - Add support for allowing precompiled parsers to share a common runtime via the Precompile({-runtime_class}) option and the PrecompiledRuntime() function. - Warn on creation of Precompiled parsers that depend on Parse::RecDescent. - *** NON-BACKWARDS COMPATIBLE CHANGE! *** Change global the directive to eval similar to other directives, rather than being single-quoted in the resulting parser. 1.967011 Sat Sep 12 16:42:01 2015 - Correct some typos in the documentation. (RT.cpan.org #87185, thanks dsteinbrunner!) - Sort hash keys and rulenames when generating code. This keeps the output text for a given input text the same, reducing differences in automated builds. (RT.cpan.org #102160, thanks Reiner!) - Precompiled parsers now document which $Parse::RecDescent::VERSION was used to generate them. (RT.cpan.org #77001) 1.967012 Sun Sep 13 07:59:00 2015 - Reference Data::Dumper::Sortkeys, not SortKeys. Actually produces reproducible precompiled parsers now. (RT.cpan.org #107061, thanks Slaven!) 1.967013 Sun Sep 27 10:00:36 2015 - Wrap Data::Dumper->Dump() to localize some $Data::Dumper::VARS to control the dumped output. In particular, Data::Dumper::Terse=1 was reported to break parser generation. (RT.cpan.org #107355, thanks Sherrard!) 1.967014 Sat Apr 1 10:33:29 2017 - Add a newline to package declaration lines in precompiled parsers, to keep CPAN from indexing them. (RT.cpan.org #110404, thanks Martin!) - Provide repository and bugtracker entries in MYMETA.*. (RT.cpan.org #110403, thanks Martin!) - Update tests to handle '.' no longer being part of @INC in perl-5.26.0. (RT.cpan.org #120415, thanks Jim!) 1.967015 Tue Apr 4 07:38:07 2017 - Fix misuse of require to include MYMETA.pl, data is just included in both Makefile.PL and Build.PL nowB. (RT.cpan.org #120922, thanks Kent!) Parse-RecDescent-1.967015/demo/0000755000175000017500000000000013070731552015700 5ustar jtbraunjtbraunParse-RecDescent-1.967015/demo/demo_dot.pl0000755000175000017500000001144311710167512020033 0ustar jtbraunjtbraunFrom root@bib.adnintern.org Thu Oct 21 05:31:59 1999 Received: from gimli.cs.monash.edu.au (gimli.cs.monash.edu.au [130.194.64.60]) by indy05.cs.monash.edu.au (8.8.8/8.8.8) with ESMTP id FAA07658 for ; Thu, 21 Oct 1999 05:31:59 +1000 (EST) Received: from bib.adnintern.org ([194.242.172.1]) by gimli.cs.monash.edu.au (8.8.8/8.8.8) with ESMTP id FAA08901 for ; Thu, 21 Oct 1999 05:31:54 +1000 Received: (from root@localhost) by bib.adnintern.org (8.9.3/8.9.3) id VAA00889; Wed, 20 Oct 1999 21:29:05 +0200 Date: Wed, 20 Oct 1999 21:29:05 +0200 Message-Id: <199910201929.VAA00889@bib.adnintern.org> From: "Stéphane Payrard -- stef@adnaccess.com (06 60 95 82 69)" To: damian@cs.monash.edu.au Subject: parsing dot file Reply-to: stef@adnaccess.com Status: RO you may be interested at this rough cut at the dot grammar to enrichen your collection. dot is a language that describe graphs. I have problem with embedded \n in strings that I have not investigated yet http:/pub/web/www.research.att.com/sw/tools/graphviz __ stef #! /usr/bin/perl use Parse::RecDescent; my $DOTSRC="/var/src/gv1.5"; my $graph = "$DOTSRC/graphs"; # $::RD_HINT=1; # $::RD_AUTOSTUB=1; # $::RD_TRACE=1; # 'strict'(?) pas accepté $gram = <<'EOF'; graph: comment(?) strict(?) ( 'digraph' | 'graph' ) id '{' stmt_list '}' attr_stmt: m/(graph|node|edge)\s+/ attrs(?) subgraph: ( 'subgraph' id )(?) '{' stmt_list '}' | 'subgraph' id stmt_list: ( stmt semi(?) )(s?) stmt: id '=' id | attr_stmt | edge_stmt | subgraph | node_stmt | comment node_stmt: node_id attrs(?) node_id: id ( ':' id )(?) attrs: '[' ( id '=' value comma(?) )(s) ']' value: id | CONSTANT edge_stmt: ( node_id | subgraph ) edgeRHS(s) attrs(?) edgeRHS: edgeop ( node_id | subgraph ) edgeop: m|-[>-]| keyword: m/(subgraph|graph|node|edge)\s+/ id: ...!keyword /([\w\d][\w\d-]*)/ | STRING_LITERAL STRING_LITERAL: { extract_delimited($text,'"') } CONSTANT: /[+-]?(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ strict: 'strict' semi: ';' comma: ',' comment : m{\s* # optional whitespace // # comment delimiter [^\n]* # anything except a newline \n # then a newline }x | m{\s* # optional whitespace /\* # comment opener (?:[^*]+|\*(?!/))* # anything except */ \*/ # comment closer ([ \t]*)? # trailing blanks or tabs }x EOF $parser = new Parse::RecDescent($gram); $_='hashtable.dot'; for ( <$graph/directed/*.dot>, <$graph/undirected/*.dot> ) { undef $/; open F, $_; $txt = ; $ok = $parser->graph($txt); print $ok ? '': "not ", "OK $_\n"; } __END__ here is the score so far: OK /var/src/gv1.5/graphs/directed/KW91.dot OK /var/src/gv1.5/graphs/directed/NaN.dot OK /var/src/gv1.5/graphs/directed/abstract.dot OK /var/src/gv1.5/graphs/directed/alf.dot OK /var/src/gv1.5/graphs/directed/awilliams.dot OK /var/src/gv1.5/graphs/directed/clust.dot OK /var/src/gv1.5/graphs/directed/clust1.dot OK /var/src/gv1.5/graphs/directed/clust2.dot OK /var/src/gv1.5/graphs/directed/clust3.dot OK /var/src/gv1.5/graphs/directed/clust4.dot OK /var/src/gv1.5/graphs/directed/clust5.dot OK /var/src/gv1.5/graphs/directed/crazy.dot OK /var/src/gv1.5/graphs/directed/ctext.dot OK /var/src/gv1.5/graphs/directed/dfa.dot OK /var/src/gv1.5/graphs/directed/fig6.dot OK /var/src/gv1.5/graphs/directed/fsm.dot OK /var/src/gv1.5/graphs/directed/grammar.dot not OK /var/src/gv1.5/graphs/directed/hashtable.dot OK /var/src/gv1.5/graphs/directed/jcctree.dot OK /var/src/gv1.5/graphs/directed/jsort.dot OK /var/src/gv1.5/graphs/directed/ldbxtried.dot OK /var/src/gv1.5/graphs/directed/mike.dot OK /var/src/gv1.5/graphs/directed/newarrows.dot OK /var/src/gv1.5/graphs/directed/nhg.dot OK /var/src/gv1.5/graphs/directed/pgram.dot not OK /var/src/gv1.5/graphs/directed/pm2way.dot not OK /var/src/gv1.5/graphs/directed/pmpipe.dot not OK /var/src/gv1.5/graphs/directed/polypoly.dot not OK /var/src/gv1.5/graphs/directed/proc3d.dot OK /var/src/gv1.5/graphs/directed/records.dot OK /var/src/gv1.5/graphs/directed/rowe.dot OK /var/src/gv1.5/graphs/directed/shells.dot OK /var/src/gv1.5/graphs/directed/states.dot OK /var/src/gv1.5/graphs/directed/structs.dot OK /var/src/gv1.5/graphs/directed/train11.dot OK /var/src/gv1.5/graphs/directed/trapeziumlr.dot OK /var/src/gv1.5/graphs/directed/tree.dot not OK /var/src/gv1.5/graphs/directed/triedds.dot OK /var/src/gv1.5/graphs/directed/try.dot OK /var/src/gv1.5/graphs/directed/unix.dot OK /var/src/gv1.5/graphs/directed/unix2.dot OK /var/src/gv1.5/graphs/directed/viewfile.dot OK /var/src/gv1.5/graphs/directed/world.dot not OK /var/src/gv1.5/graphs/undirected/ER.dot OK /var/src/gv1.5/graphs/undirected/ngk10_4.dot OK /var/src/gv1.5/graphs/undirected/process.dot Parse-RecDescent-1.967015/demo/demo_LaTeXish.pl0000755000175000017500000000321111710167512020720 0ustar jtbraunjtbraunuse v5.10; use warnings; use Parse::RecDescent; $RD_TRACE = 1; my $parser = Parse::RecDescent->new(<<'EOGRAMMAR'); file: element(s) element: command | literal command: '\\' literal options(?) args(?) options: '[' option(s? /,/) ']' args: '{' element(s?) '}' option: /[^][\\$&%#_{}~^ \t\n,]+/ literal: /[^][\\$&%#_{}~^ \t\n]+/ EOGRAMMAR local $/; my $tree = $parser->file(); $tree->explain(0); sub file::explain { my ($self, $level) = @_; for (@{$self->{'element(s)'}}) { $_->explain($level); print "\n"; } } sub element::explain { my ($self, $level) = @_; ($self->{command}||$self->{literal})->explain($level) } sub command::explain { my ($self, $level) = @_; print "\t"x$level, "Command: $self->{literal}{__VALUE__}\n"; print "\t"x$level, "\tOptions:\n"; $self->{'options(?)'}[0]->explain($level+2) if @{$self->{'options(?)'}}; print "\t"x$level, "\tArgs:\n"; $self->{'args(?)'}[0]->explain($level+2) if @{$self->{'args(?)'}}; } sub options::explain { my ($self, $level) = @_; $_->explain($level) foreach @{$self->{'option(s?)'}}; } sub args::explain { my ($self, $level) = @_; $_->explain($level) foreach @{$self->{'element(s?)'}}; } sub option::explain { my ($self, $level) = @_; print "\t"x$level, "Option: $self->{__VALUE__}\n"; } sub literal::explain { my ($self, $level) = @_; print "\t"x$level, "Literal: $self->{__VALUE__}\n"; } __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Parse-RecDescent-1.967015/demo/demo_parsetree.pl0000755000175000017500000000123411710167512021234 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # PARSE LOGICAL EXPRESSIONS TO A "list of lists" PARSE TREE sub printtree { print " " x $_[0]; print "$_[1]:\n"; foreach ( @_[2..$#_] ) { if (ref($_)) { printtree($_[0]+1,@$_); } else { print " " x $_[0], "$_\n" } } print "\n"; } use Parse::RecDescent; $RD_AUTOACTION = q{ [@item] }; $grammar = q{ expr : disj disj : conj 'or' disj | conj conj : unary 'and' conj | unary unary : 'not' atom | '(' expr ')' | atom atom : /[a-z]+/i }; $parse = new Parse::RecDescent ($grammar); while () { my $tree = $parse->expr($_); printtree(0,@$tree) if $tree; } __DATA__ a and b and not c (c or d) and f Parse-RecDescent-1.967015/demo/demo_separators.pl0000755000175000017500000000142211663610756021436 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; undef $::RD_WARN; my $parse = Parse::RecDescent->new(<<'EOGRAMMAR'); {use Tie::Hash; } line: 0) > line: seplist[sep=>','] | seplist[sep=>':'] | seplist[sep=>" "] | { $max{item} } seplist: { $max{count} = @{$max{item} = $item[2]} if @{$item[2]} > $max{count}; } EOGRAMMAR while () { chomp; my $res = $parse->line($_); print '[', join('][', @$res), "]\n"; } __DATA__ c,o,m,m,a,s,e,p,a,r,a,t,e,d c:o:l:o:n:s:e:p:a:r:a:t:e:d s p a c e s e p a r a t e d m u:l t i,s:ep ar:a,ted m u:l,t i,s:ep ar:a,ted m:u:l,t i,s:ep ar:a,ted Parse-RecDescent-1.967015/demo/demo_metaRD.pm0000644000175000017500000000603111710167512020414 0ustar jtbraunjtbraun# Changes! /usr/local/bin/perl -w use Parse::RecDescent; local $/; my $parse = Parse::RecDescent->new(); my $grammar = <>; $tree = parse->grammar($grammar) or die "Bad grammar! No biscuit!"; print Data::Dumper->Dump([$tree]); __DATA__ grammar : prerule(s?) components(s?) /\Z/ component : rule | comment rule : "\n" identifier ":" production(s? /|/) production : items(s) item : lookahead(s?) simpleitem | directive | comment lookahead : '...' | '...!' # +'ve or -'ve lookahead simpleitem : subrule args(?) rep(?) # match another rule | terminal # match the next input | bracket args(?) # match alternative items | action # do something subrule : identifier # the name of the rule args : {extract_codeblock($_[0],'[]')} # just like a [...] array ref rep : '(' repspec ')' repspec : '?' # 0 or 1 times | 's?' # 0 or more times | 's' # 1 or more times | /(\d+)[.][.](/\d+)/ # $1 to $2 times | /[.][.](/\d*)/ # at most $1 times | /(\d*)[.][.])/ # at least $1 times terminal : /[/]([\][/]|[^/])*[/]/ # interpolated pattern | /"([\]"|[^"])*"/ # interpolated literal | /'([\]'|[^'])*'/ # uninterpolated literal action : # embedded Perl code bracket : '(' production(s? /|/) ')' # alternative subrules directive : '' # commit to production | '' # cancel commitment | '' # skip to newline | '' # skip | '' # fail this production | '' # fail if | '' # report an error | '' # report error as "" | '' # error only if committed | '' # " " " " | ']+/ '>' # define rule-local variable | '' # invoke rule named in string identifier : /[a-z]\w*/i # must start with alpha comment : /#[^\n]*/ # same as Perl pattern : {extract_bracketed($text,'<')} # allow embedded "<..>" condition : {extract_codeblock($text,'{<')} # full Perl expression string : {extract_variable($text)} # any Perl variable | {extract_quotelike($text)} # or quotelike string | {extract_bracketed($text,'<')} # or balanced brackets Parse-RecDescent-1.967015/demo/demo_restructure_easy.pl0000755000175000017500000000233311710167512022653 0ustar jtbraunjtbraun#!/usr/local/bin/perl -w # CONVERT FROM ONE EXTERNAL STRUCTURE TO A # SLIGHTLY DIFFERENT INTERNAL STRUCTURE # # LOCALIZED RULEVARS ARE ALWAYS EASIEST WHEN CHANGING STRUCTURES # SEE demo_restructure_painful.pl FOR ANOTHER APPROACH THAT SHOWS WHY use strict; use Parse::RecDescent; use Data::Dumper; my $grammar = q( file: file: section(s) { $file } section: header '{' body '}' { $file->{$item[1]} = $item[3] } header: 'Domain=' /.+/ body: body: line(s) { $body } line: lineA | lineB lineA: /[^\W_]+/ '=' /.+/ { $body->{$item[1]} = $item[3] } lineB: /[^\W_]+/ '_' /[^\W_]+/ '=' /.+/ { $body->{$item[1]}{$item[3]} = $item[5] } ); my $parser = Parse::RecDescent->new($grammar); my $text; my @text = ; foreach (@text) { next if /^\#/; # Strip comments $text .= $_; } my $f = $parser->file($text); print Dumper ($f); __DATA__ # # Domain=domain1 { P1_Name=n1 P1_Address=host1:port1 P2_Name=n2 P2_Address=host2:port2 } Domain=domain2 { f1=v1 f2=v2a v2b #comment } Parse-RecDescent-1.967015/demo/demo_OOautoparsetree.pl0000755000175000017500000000376111710167512022372 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # PARSE AND EVALUATE LOGICAL EXPRESSIONS WITH A AUTOGENERATED OO PARSE TREE use Parse::RecDescent; use Data::Dumper; sub trace_only { my ($pattern) = @_; $RD_TRACE=1; my $_real_trace = \&Parse::RecDescent::_trace; *Parse::RecDescent::_trace = sub ($;$$$) { my ($msg, $context, $rulename, $level) = @_; return if $msg !~ $pattern; goto &{$_real_trace}; }; } my $parse = Parse::RecDescent->new(<<'EOG'); expr : set | clear | disj set : 'set' atom clear : 'clear' atom disj : { bless $item[-1], 'LOGICAL::'.$item[0] } conj : { bless $item[-1], 'LOGICAL::'.$item[0] } unary : neg | bracket | atom bracket : '(' expr ')' neg : 'not' unary atom : /[a-z]+/i EOG trace_only( qr/Matched|consumed/ ); while () { my $tree = $parse->expr($_); print Data::Dumper->Dump([$tree]); print $tree->eval(), "\n" if $tree; } BEGIN {@var{qw(a c e)} = (1,1,1);} sub returning { # local $^W; # print +(caller(1))[3], " returning ($_[0])\n"; $_[0]; } sub LOGICAL::expr::eval { my $type = $_[0]->{set}||$_[0]->{clear} ||$_[0]->{disj}; returning $type->eval() } sub LOGICAL::disj::eval { returning join '', map {$_->eval()} @{$_[0]} } sub LOGICAL::conj::eval { returning ! join '', map {! $_->eval()} @{$_[0]} } sub LOGICAL::unary::eval { my $type = $_[0]->{neg}||$_[0]->{bracket} ||$_[0]->{atom}; returning $type->eval() } sub LOGICAL::bracket::eval { returning $_[0]->{expr}->eval() } sub LOGICAL::neg::eval { returning ! $_[0]->{unary}->eval() } sub LOGICAL::set::eval { returning $::var{$_[0]->{atom}->name()} = 1 } sub LOGICAL::clear::eval { returning $::var{$_[0]->{atom}->name()} = 0 } sub LOGICAL::atom::eval { returning $::var{$_[0]->{__VALUE__}} } sub LOGICAL::atom::name { returning $_[0]->{__VALUE__} } __DATA__ a or b and not c or d Parse-RecDescent-1.967015/demo/demo_logic.pl0000755000175000017500000000104511710167512020337 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # PARSE LOGICAL EXPRESSIONS $RD::TRACE=1; use Parse::RecDescent; $grammar = q{ expr : disj no_garbage no_garbage: /^\s*$/ | disj : conj ('or' conj)(s?) conj : unary ('and' unary)(s?) unary : 'not' atom | '(' disj ( ')' | ) | atom atom : /<.+?>/ }; $parse = new Parse::RecDescent ($grammar); $input = ''; print "> "; while (<>) { if (/^\.$/) { defined $parse->expr($input) or print "huh?\n"; $input = '' } else { chomp; $input .= " $_" } print "> "; } Parse-RecDescent-1.967015/demo/demo_selfmod.pl0000755000175000017500000000170011710167512020671 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # A SELF-MODIFYING**2 PARSER use Parse::RecDescent; $grammar = q{ type_defn : 'only' "$::keyword" is type_name ';' { $thisparser->Replace("type_name: '$item[5]'"); print "\"$item[5]\" is now the only valid type name\n"; } | "$::keyword" identifier(s) is type_name /('?s)?/ ';' { my $newnames = join " | ", map { "'$_'" } @{$item[3]}; $thisparser->Extend("type_name: $newnames"); print "added $newnames as type name(s)\n"; } | /change\s+$::keyword\s+to/ identifier { $::keyword = $item[2]; print "changed $item[0] keyword to: $item[2]\n"; } | is : /is|are/ type_name : 'int' | 'float' identifier: ...!is ...!"$::keyword" ...!type_name /[A-Za-z]\w*/ }; use vars qw { $keyword }; $keyword = "type"; $parse = new Parse::RecDescent ($grammar); while (<>) { $parse->type_defn($_) or print "huh?\n"; } Parse-RecDescent-1.967015/demo/demo_undumper.pl0000755000175000017500000005625711710167512021120 0ustar jtbraunjtbraunFrom merlyn@stonehenge.com Fri Jul 13 09:47:35 2001 Received: from nexus.csse.monash.edu.au (nexus.csse.monash.edu.au [130.194.226.4]) by indy05.csse.monash.edu.au (SGI-8.9.3/8.9.3) with ESMTP id JAA19597 for ; Fri, 13 Jul 2001 09:47:35 +1000 (EST) Received: from ALPHA9.CC.MONASH.EDU.AU (alpha9.cc.monash.edu.au [130.194.1.9]) by nexus.csse.monash.edu.au (8.9.3+Sun/8.9.3) with ESMTP id JAA28599 for ; Fri, 13 Jul 2001 09:47:32 +1000 (EST) Received: from c000.snv.cp.net ([209.228.32.59]) by vaxh.cc.monash.edu.au (PMDF V5.2-31 #29714) with SMTP id <01K5VMB591048YCMRT@vaxh.cc.monash.edu.au> for damian.conway@mail.csse.monash.edu.au; Fri, 13 Jul 2001 09:47:15 +1000 Received: (cpmta 24211 invoked from network); Thu, 12 Jul 2001 16:46:12 -0700 Received: (cpmta 24197 invoked from network); Thu, 12 Jul 2001 16:46:10 -0700 Received: from halfdome.holdit.com (209.102.105.64) by smtp.c000.snv.cp.net (209.228.32.59) with SMTP; Thu, 12 Jul 2001 16:46:10 -0700 Received: (from merlyn@localhost) by halfdome.holdit.com (8.9.1/8.9.1) id QAA21492; Thu, 12 Jul 2001 16:46:09 -0700 X-Received: 12 Jul 2001 23:46:10 GMT Date: Thu, 12 Jul 2001 16:46:09 -0700 From: merlyn@stonehenge.com (Randal L. Schwartz) Subject: new potential example for P::RD's eg directory? To: damian@conway.org Message-id: MIME-version: 1.0 Content-type: MULTIPART/MIXED; BOUNDARY="Boundary_(ID_V8Qg38zNd7FfillEy8fgog)" Delivered-to: conway.org%damian@conway.org User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.3 X-UID: Inbox;992469057;10334 Lines: 15 Status: RO --Boundary_(ID_V8Qg38zNd7FfillEy8fgog) Content-type: TEXT/PLAIN Content-transfer-encoding: 7BIT I just wrote a "Data::Dumper" undumper using P::RD. I'm attaching the magazine column (which you can't re-use, unfortunately) but perhaps the code itself can be put in P::RD's distro (with a small README) as an example of an interesting grammar and usage. Or perhaps you can develop it into a full reference/dereference parser? -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! --Boundary_(ID_V8Qg38zNd7FfillEy8fgog) Content-type: TEXT/PLAIN; NAME=col29.pod Content-description: col 29 linux magazine Content-disposition: attachment; filename=col29.pod Content-transfer-encoding: 7BIT =head1 Linux Magazine Column 29 (Oct 2001) [suggested title: Safe Undumping] Recently on the Perl Monestary at C, the user known as C asked about parsing a Perl-style double-quoted string, as part of a project to construct a safe C parser that would take the output and interpret it rather than handing the result directly to C as is typically done. A few postings later, the work in progress for their C was posted, and I commented that there was probably a simpler way to do some of the things, and that it didn't handle blessed references. Well, me and my big mouth. Or was it my continuing curiousity to tinker with Damian Conway's excellent C module? I'm not sure, but I found myself over the next dozen hours or so staring at C source code, output, test cases, and C traces and documentation. I also pounded my head on the desk for better than a day trying to figure out how to break a left-recursion loop, and came up with a nice obvious (now!) solution. The point of this is to be able to take the output of C and reconstruct the original data, but not open ourselves to the possibility of being fed dangerous constructs, like backticks or symbol-table-manipulating code. Sure, you could also do this with the right use of the C module, but I was already committed to finishing this version before I thought of that. Maybe another column someday. But anyway, that brings us to the program in [listing one, below]. Most of this program is input to C, so I'll start by setting that aside and describing the Perl support structure first. Lines 1 through 3 start nearly every program I write, turning on warnings, compiler restrictions, and disabling the buffering on C. Lines 5 and 6 control the behaviour of C. When tracing is enabled (although commented out here), I can see how the parse is progressing. The number 80 trims the "to-be-parsed" string dumping to its first 80 characters, which is usually plenty for me to see what's going on. I left hints enabled, because the error messages that result from a hint were often helpful in debugging the grammar. Lines 8 and 9 pull in the C (found in the CPAN) and C (part of the core Perl distribution) modules. Lines 13 to 81 form the input grammar for C, included as a single-quoted string using C for delimiters. Damian Conway seems to prefer this quoting style for sending stuff to C, so I've copied it at least here. In other places, I might have used a "here-document" instead. The result of feeding this grammar to C is a parser object that we can then call on to parse the output of C to "undump" it. If the grammar is bad, we die in line 81. At that point C has already printed its own diagnostics, expanded by the enabled "hint" flag earlier. During development, I probably changed this grammar over 50 times, constantly tweaking it to recognize each new thing that I encountered in the input source. The hints were often useful, although the "left recursion" problem (that I'll describe later) had me stumped for the better part of two days. Once we have a parser object (in $parser), the action to "undump" a string containing Perl code is simple. Skipping down quickly to the C subroutine starting at line 177, we can see the use of this C<$parser> to take a given C<$input> and transform it to a hashref response (saved as C<$symbol> here). The keys of the hashref contain the top-level scalar variable names, which default to C<$VAR1>, C<$VAR2> and so on, unless overridden during the dumping. The values of this hashref are then the corresponding values of those variables from the original dump, as best can be reconstructed. The C subroutine here uses C to convert an existing collection of values (passed in as arguments) into a string, shows the string as the input to the undumper, then parses the string with the constructed parser object. The output of this undumping process is then dumped once again and shown. The comparison process has to be a manual one, because it is sensitive to the order in which the items are presented to C. Above the definition of this testing routine is a series of tests that I used to develop and verify the undumper. I started with the included tests found in the Perl 5.6.1 distribution to verify Perl on initial installation. I actually didn't include the entire test, but took the data structure for each applicable test and copied that into this program. I skipped over the tests that included symbol-table references (using globs) or coderefs, because I chose to ignore that for this program. Each test provides an arrayref containing the references to be dumped, and most tests also provide a label list to keep the variables from simply being named C<$VAR1>, C<$VAR2> and so on. I enclosed each test in a conditional that I could switch between "if (0)" and "if (1)" to either run the test or skip it. When I was focussing on a new feature, I enabled only one test until it worked, but once I got a good run, I reenabled all previous tests to ensure that I didn't break something else in the process. The "Extreme Programming" technique relies on this strategy of creating tests first and driving the development from those tests, and it worked nicely for the development of this program. Other than a few snippets of code that I used for scaffolding to test some initial grammar ideas, all of the code that I used to develop and test this code is thereby archived along with the routines, for later maintenance to take place easily. Any one of the tests can be switched to invoke C rather than C, as I've done on the final test (starting in line 171). This final test, by the way, exercises blessed references (objects), which none of the distribution tests even approach. (Maybe a patch to the distribution's C is in order?) Using C, we pop down to the routine beginning in line 184, which runs a quick benchmark comparing a straight C with our safe undumping parser. I'm using the C routine from C (a standard Perl module), giving it the default "0 times", which attempts to run each routine for slighly over 3 CPU seconds to compute an interations per second for each type. The bad news is that my code clocks in at 1/100th of the speed of C. Ahh, the price to pay for safety. The good news is that the speed difference is probably negligible for typical applications, and pennies to pay for the dollars saved in having a safe evaluation of what should be good C output. Well, what little Perl code there is (outside of the grammar), that's it. So let's pick apart the grammar now, starting back up at line 13. Line 15 declares a variable local to the parser for the "symbol table" we'll be creating. The keys will be the full variable names as given in the dump. The values will be the constructed value for that key. Yes, this is the hash that is returned upon a successful parse. Line 17 defines the top-level rule for the parse. A "file" is zero or more "assignments". Howver, we also have to clear out the possibly leftover values from the previous run, and this is handled by the action block immediately preceding the assignment. We also have to verify that we've parsed the entire input string, which I'll do by matching C, which matches only at the end of the string. If all went well, we'll return the hashref. Line 19 defines an assignment. Currently, I'm handling only a scalar assignment, although C can also generate array and hash assignments under some circumstances. Gotta leave something for "release 2.0". Lines 21 and 22 define a scalar assignment, and the action to take for that. A scalar assignment consists of an "lvalue" (like a variable) and an "rvalue" (like an expression). Both lvalues and rvalues are references, so we dereference each of them in the action, and perform the requested assignment to keep our "virtual Perl symbol table" up to date. The action returns a constant 1 to let the parent rule know that this rule succeeded. Without that, an C value being assigned would have caused that rule to fail (one of my debugging attempts revealed such, with much head-banging-on-desk until I realized what I was doing). Lvalues are defined starting in line 24. The most complex lvalue is a dereferenced scalar reference, handled in lines 26 to 47. This is because C can generate a reference path to a specific entry in the data structure to patch it up so that it points to a proper place for complex data interelationships. For example, if an element of C<@a> points to C<@b>, but an element of C<@b> points to C<@a>, the "chicken-and-egg" tie is broken by first generating C<@a> without the C<@b> reference, then C<@b> with the C<@a> reference, then patching up C<@a> finally to point to C<@b>. The definition starting in line 26 uses a C to be a simple scalar, followed by an optional C which is one or more array-element or hash element dereferences. Let's look at the syntax before we study the actions taken. Line 39 shows us that C is either a C or C. Further down in lines 49 and 53, we see that a C can be either a scalar variable or a scalar reference dereference. And that would form the head of the chain. If it's a scalar variable, we take a reference to a hash element in our "symbol table", and return that. The key is the name of the variable, including the dollar sign. This permits eventual expansion to include arrays and hashes in our "symbol table", again for version 2.0. For the dereference of the scalar reference, we emulate the same steps in our virtual world, again returning a reference for all lvalues. Once we have the chain head, we then look for a dereference chain (line 40), starting with the mandatory arrow then a hash or array subscript expression. This may be followed with additional arrow-optional hash or array subscript expressions (line 42). Each subscript is returned as a two-element arrayref, with either the keyword "hash" or "array" followed by the scalar constant selecting the particular element (lines 45 and 47). The action in line 41 rolls up the deref chain as a reference to an array of those two-element arrayrefs. Back in line 26, we thus have a reference to a starting point for the C, and possible one or more dereferences to apply to it. The action in lines 27 to 37 walk the dereference chain to get to the final target lvalue, and return it (as a reference to a scalar somewhere in our virtual Perl symbol space). On the other side of the equation, we've got rvalues of various shapes, defined in line 58 to be either a C or a C. The loop back to C is necessary because any variable reference is also itself a source of values. Lines 60 to 66 define the only scalar constants that C seems to emit: C, simple signed integers, and quoted strings. These values are returned as references to those values so that C can be a valid return value: again, something I figured out the hard way after my rule for C kept breaking. Lines 68 to 75 handle the reference values: scalar references, array references, and hash references. Again, the syntax is very narrow: just the types of things I was able to see C generate, both by trying some sample data, and by lightly examining the source code to C. I may have missed a form or two, in which case this grammar will make it easy to add additional forms. The grammar is nicely recursive: an array reference in turn contains one or more C items, which brings us right back to the same place in the hierarchy, but at a nested level. Lines 76 to 79 handle blessed references. Again, the syntax is very specific to what I was able to determine about C's behavior. And that's pretty much it. The grammar looks for a series of assignments, each of which is an lvalue being given an rvalue. Some of the lvalues ultimately lead to symbol table entries, which then populate our symbol table hash with keys and values when assigned. Rvalues can be constants, or pointers into existing data in the symbol table. OK, it I somewhat full of smoke and mirrors, and I was rather pleased when the whole thing worked. And on some rainy Oregon afternoon, I hope to extend it to handle arrays and hashes as well, just to make it extremely flexible and universal. I even have some ideas about how to get it to handle globs. But that's for another day: until next time, enjoy! =head2 Listing =1= #!/usr/bin/perl -w =2= use strict; =3= $|++; =4= =5= # $::RD_TRACE = 80; =6= $::RD_HINT = 1; =7= =8= use Parse::RecDescent; =9= use Data::Dumper; =10= =11= ## define grammar =12= =13= my $parser = Parse::RecDescent->new(q{ =14= =15= { my %TABLE; } =16= =17= file: { %TABLE = (); } assignment(s?) /\z/ { \%TABLE } =18= =19= assignment: scalar_assignment | =20= =21= scalar_assignment: scalar_lvalue '=' scalar_rvalue ';' =22= { ${$item{scalar_lvalue}} = ${$item{scalar_rvalue}}; 1; } =23= =24= ## lvalues, indicated as reference to value, so we can assign to them =25= =26= scalar_lvalue: deref_head deref_chain(?) { =27= my $return = $item{deref_head}; =28= if ($item{deref_chain}) { =29= for (@{$item{deref_chain}[0]}) { =30= if ($_->[0] eq "hash") { =31= $return = \$$return->{${$_->[1]}}; =32= } elsif ($_->[0] eq "array") { =33= $return = \$$return->[${$_->[1]}]; =34= } else { die "what is $_->[0]?" } =35= } =36= } =37= $return; =38= } =39= deref_head: simple_scalar_lvalue | simple_scalar_rvalue =40= deref_chain: "->" hash_or_array_subscript deref_chain_more(s?) =41= { [$item[2], @{$item[3]}] } =42= deref_chain_more: "->" hash_or_array_subscript | hash_or_array_subscript =43= hash_or_array_subscript: hash_subscript | array_subscript =44= hash_subscript: "{" scalar_constant "}" =45= { ["hash", $item{scalar_constant}] } =46= array_subscript: "[" scalar_constant "]" =47= { ["array", $item{scalar_constant}] } =48= =49= simple_scalar_lvalue: '$' ident =50= { \ $TABLE{'$' . $item{ident}} } =51= ident: /[^\W\d]\w*/ =52= =53= simple_scalar_lvalue: '$' '{' scalar_rvalue '}' =54= { \ ${${$item{scalar_rvalue}}} } =55= =56= ## rvalues, indicated as reference to value, because "undef" is legal =57= =58= scalar_rvalue: simple_scalar_rvalue | scalar_lvalue =59= =60= simple_scalar_rvalue: scalar_constant =61= scalar_constant: 'undef' =62= { \ undef } =63= scalar_constant: /-?[1-9]\d*|0/ =64= { \ $item[1] } =65= scalar_constant: =66= { \ $item[1][2] } =67= =68= simple_scalar_rvalue: "\x5C" scalar_rvalue =69= { \ $item{scalar_rvalue} } =70= simple_scalar_rvalue: '[' scalar_rvalue(s? /,/) ']' =71= { \ [map $$_, @{$item[2]}] } =72= simple_scalar_rvalue: '{' hashpair(s? /,/) '}' =73= { \ {map @$_, @{$item[2]}} } =74= hashpair: scalar_constant '=>' scalar_rvalue =75= { [${$item{scalar_constant}}, ${$item{scalar_rvalue}}] } =76= simple_scalar_rvalue: 'bless' '(' scalar_rvalue ',' scalar_constant ')' =77= { \ bless( ${$item{scalar_rvalue}}, ${$item{scalar_constant}} ) } =78= simple_scalar_rvalue: 'do' '{' "\x5C" '(' 'my' '$o' '=' scalar_rvalue ')' '}' =79= { \ do { \ (my $o = ${$item{scalar_rvalue}})} } =80= =81= }) or die "compile"; =82= =83= ## following tests from t/dumper.t in 5.6.1 distribution =84= =85= if (0) { =86= my @c = ('c'); =87= my $c = \@c; =88= my $b = {}; =89= my $a = [1, $b, $c]; =90= $b->{a} = $a; =91= $b->{b} = $a->[1]; =92= $b->{c} = $a->[2]; =93= =94= test([$a, $b, $c], [qw(a b c)]); =95= } =96= =97= if (0) { =98= my $foo = { "abc\000\'\efg" => "mno\000", =99= "reftest" => \\1, =100= }; =101= =102= test([$foo], [qw($foo)]); =103= } =104= =105= if (0) { =106= my $foo = 5; =107= my @foo = (-10,\$foo); =108= my %foo = (a=>1,b=>\$foo,c=>\@foo); =109= $foo{d} = \%foo; =110= $foo[2] = \%foo; =111= =112= test([\%foo],[qw($foo)]); =113= } =114= =115= if (0) { =116= my @dogs = ( 'Fido', 'Wags' ); =117= my %kennel = ( =118= First => \$dogs[0], =119= Second => \$dogs[1], =120= ); =121= $dogs[2] = \%kennel; =122= my $mutts = \%kennel; =123= test([\@dogs, \%kennel, $mutts], [qw($dogs $kennel $mutts)]); =124= } =125= =126= if (0) { =127= my $a = []; =128= $a->[1] = \$a->[0]; =129= test([$a], [qw($a)]); =130= } =131= =132= if (0) { =133= my $a = \\\\\'foo'; =134= my $b = $$$a; =135= test([$a, $b], [qw($a $b)]); =136= } =137= =138= if (1) { =139= my $b; =140= my $a = [{ a => \$b }, { b => undef }]; =141= $b = [{ c => \$b }, { d => \$a }]; =142= timetest([$a, $b], [qw($a $b)]); =143= } =144= =145= if (0) { =146= my $a = [[[[\\\\\'foo']]]]; =147= my $b = $a->[0][0]; =148= my $c = $${$b->[0][0]}; =149= test([$a, $b, $c], [qw($a $b $c)]); =150= } =151= =152= if (0) { =153= my $f = "pearl"; =154= my $e = [ $f ]; =155= my $d = { 'e' => $e }; =156= my $c = [ $d ]; =157= my $b = { 'c' => $c }; =158= my $a = { 'b' => $b }; =159= test([$a, $b, $c, $d, $e, $f], [qw($a $b $c $d $e $f)]); =160= } =161= =162= if (0) { =163= my $a; =164= $a = \$a; =165= my $b = [$a]; =166= test([$b], [qw($b)]); =167= } =168= =169= ## end of tests from t/dumper.t, now some of my own =170= =171= if (0) { =172= my $x = bless {fred => 'flintstone'}, 'x'; =173= my $y = bless \$x, 'y'; =174= timetest([$x, $y], [qw($x $y)]); =175= } =176= =177= sub test { =178= my $input = Data::Dumper->new(@_)->Purity(1)->Dumpxs; =179= print "=" x 60, "\ninput:\n$input\n==>\noutput:\n"; =180= my $symbol = $parser->file($input) or die "execute"; =181= print Data::Dumper->new([values %$symbol], [keys %$symbol])->Purity(1)->Dumpxs; =182= } =183= =184= sub timetest { =185= require Benchmark; =186= =187= my $input = Data::Dumper->new(@_)->Purity(1)->Dumpxs; =188= print "=" x 60, "\ninput:\n$input\n==>\noutput:\n"; =189= Benchmark::timethese(0, { =190= PRD => sub { =191= package Dummy; =192= no strict; =193= my $symbol = $parser->file($input) =194= or die "execute"; =195= }, =196= EVAL => sub { =197= package Dummy; =198= no strict; =199= eval $input; =200= }, =201= }); =202= } --Boundary_(ID_V8Qg38zNd7FfillEy8fgog)-- Parse-RecDescent-1.967015/demo/demo_quicklist.pl0000755000175000017500000000301211710167512021246 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; #$RD_TRACE=1; #$RD_HINT=1; my $parser = Parse::RecDescent->new(<<'EOG') or die; list1N_c: term(s /,/) list1N_s: term(s /\/+/) list0N_c: term(s? /,/) list0N_s: term(s? /\/+/) list01_c: term(? /,/) list01_s: term(? /\/+/) list2_c: term(2 /,/) list2_s: term(2 /\/+/) list02_c: term(0..2 /,/) list02_s: term(0..2 /\/+/) list2N_c: term(2.. /,/) list2N_s: term(2.. /\/+/) list13_c: term(..3 /,/) list13_s: term(..3 /\/+/) term: 't' EOG while () { print; print "\tlist1N_c:\t", @{$parser->list1N_c($_)||['undef']}, "\n"; print "\tlist1N_s:\t", @{$parser->list1N_s($_)||['undef']}, "\n"; print "\tlist0N_c:\t", @{$parser->list0N_c($_)||['undef']}, "\n"; print "\tlist0N_s:\t", @{$parser->list0N_s($_)||['undef']}, "\n"; print "\tlist01_c:\t", @{$parser->list01_c($_)||['undef']}, "\n"; print "\tlist01_s:\t", @{$parser->list01_s($_)||['undef']}, "\n"; print "\tlist2_c:\t", @{$parser->list2_c($_)||['undef']}, "\n"; print "\tlist2_s:\t", @{$parser->list2_s($_)||['undef']}, "\n"; print "\tlist02_c:\t", @{$parser->list02_c($_)||['undef']}, "\n"; print "\tlist02_s:\t", @{$parser->list02_s($_)||['undef']}, "\n"; print "\tlist2N_c:\t", @{$parser->list2N_c($_)||['undef']}, "\n"; print "\tlist2N_s:\t", @{$parser->list2N_s($_)||['undef']}, "\n"; print "\tlist13_c:\t", @{$parser->list13_c($_)||['undef']}, "\n"; print "\tlist13_s:\t", @{$parser->list13_s($_)||['undef']}, "\n"; print "-----\n"; } __DATA__ t t,t t,t,t t,t,t,t t t/t t/t//t t/t///t/t Parse-RecDescent-1.967015/demo/demo_mccoy.pl0000755000175000017500000000101311663610756020361 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # DAMMIT, JIM, I'M A DOCTOR, NOT A PERL PROGRAM! use Parse::RecDescent; $grammar = q{ McCoy : curse ',' name ", I'm a doctor, not a" profession '!' | possessive 'dead,' name '!' | curse : 'Dammit' | 'Goddammit' name : 'Jim' | 'Spock' | 'Scotty' profession: 'magician' | 'miracle worker' | 'Perl hacker' possessive: "He's" | "She's" | "It's" | "They're" }; $parse = new Parse::RecDescent ($grammar); print "> "; while (<>) { $parse->McCoy($_); print "> "; } Parse-RecDescent-1.967015/demo/demo_autoscoresep.pl0000755000175000017500000000112211710167512021752 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; $RD_WARN = undef; $RD_TRACE=1; my $parse = Parse::RecDescent->new(<<'EOGRAMMAR'); line: line: seplist[sep=>','] | seplist[sep=>':'] | seplist[sep=>" "] seplist: EOGRAMMAR while () { chomp; my $res = $parse->line($_); print '[', join('][', @$res), "]\n"; } __DATA__ c,o,m,m,a,s,e,p,a,r,a,t,e,d c:o:l:o:n:s:e:p:a:r:a:t:e:d s p a c e s e p a r a t e d m u:l t i,s:ep ar:a,ted m u:l,t i,s:ep ar:a,ted m:u:l,t i,s:ep ar:a,ted Parse-RecDescent-1.967015/demo/demo.pl0000755000175000017500000000056311710167512017166 0ustar jtbraunjtbraun#! /usr/bin/perl -w package Foo; use base Parse::RecDescent; sub print_cat { print "Found ", @_[1..$#_], "\n"; } package main; $parse = new Parse::RecDescent(<<'EOG'); input : "cat" { $thisparser->print_cat($item[1]); } | "dog" { $thisparser->print_cat($item[1]); } EOG $parse = bless $parse, "Foo"; $parse->input("cat"); $parse->input("doggone"); Parse-RecDescent-1.967015/demo/demo_lisplike.pl0000755000175000017500000000277711663610756021105 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; use Data::Dumper; my $grammar = q{ s_expr: '(' s_expr(s) ')' # NESTED S-EXPRS { $return = $item[2] } | /" [^\\"]* (\\. [^\\"]*)* "/x # STRINGS | /[^()\s]+/ # ANYTHING ELSE }; my $parser = Parse::RecDescent->new($grammar) or die; undef $/; my $data = ; my $nested_array = $parser->s_expr($data); print Data::Dumper->Dump($nested_array); __DATA__ (net (rename VDDTX "vddtx") (joined (portRef VDDTX) (portRef &2 (instanceRef I_45_1102680350))) (figure WIRE (path (pointList (pt 6700 -2100) (pt 6900 -2100))) (path (pointList (pt 6900 -2100) (pt 6900 -2500))))) (net (rename N00023 "N00023") (joined (portRef &2 (instanceRef I_45_1215476478)) (portRef &1 (instanceRef I_45_1102680350)) (portRef &2 (instanceRef I_45_1215470655))) (figure WIRE (path (pointList (pt 5800 -3400) (pt 6900 -3400))) (path (pointList (pt 5800 -3400) (pt 5800 -3700))) (path (pointList (pt 6900 -3100) (pt 6900 -3400))) (path (pointList (pt 8000 -3400) (pt 8000 -3700))) (path (pointList (pt 6900 -3400) (pt 8000 -3400)))) (instance (rename TIE1 "TIE1") (viewRef XTIE0A2Y2 (cellRef XTIE0A2Y2 (libraryRef RIPPER_LIBRARY))) (transform (origin (pt 6900 -3400))))) Parse-RecDescent-1.967015/demo/demo_derived.pl0000755000175000017500000000160211710167512020663 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # THE OL' "EMPTY SUBCLASS IN THE DEMO" TRICK. use Parse::RecDescent; sub Parse::RecDescent::f { print "Parse::RecDescent::f\n"; } @DerParser::ISA = qw { Parse::RecDescent }; $grammar = q{ typedef : /type/ ident /has/ field(s) 'end type' { $return = $item[2]; } | /type/ ident ( /is/ | /are/ ) ident { $return = $item[2]; } | field : /field/ ident /is/ ident ident : /[A-Za-z]\w*/ { f(); $return = $item[1]; } }; $parse = new DerParser ( $grammar ) || die "\n"; $str = " type student has field name is text field age end type "; print "> ", $parse->typedef($str) || "", "\n"; $str = " type student has end type "; print "> ", $parse->typedef($str) || "", "\n"; $str = " type studentRec is student "; print "> ", $parse->typedef($str) || "", "\n"; Parse-RecDescent-1.967015/demo/demo_skipcomment.pl0000755000175000017500000000114711710167512021576 0ustar jtbraunjtbraun#!/usr/local/bin/perl -ws # REMOVE COMMENTS FROM C++ CODE use strict; use Parse::RecDescent; use vars qw/ $Grammar /; $RD_TRACE=1; my $parser = new Parse::RecDescent $Grammar or die "invalid grammar"; undef $/; my $text = @ARGV ? <> : ; print join " ", @{$parser->program($text) or die "malformed C program"}; BEGIN { $Grammar=<<'EOF'; program : part(s) part : /\S+/ EOF } __DATA__ Now we find # to our inexpressible joy That the parser # a cunningly wrought thing Skips both whitespace # the usual behaviour # though it be configurable And comments #however many there be Parse-RecDescent-1.967015/demo/demo_buildcalc.pl0000755000175000017500000000247011710167512021167 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; $RD_AUTOACTION = q{ $item[-1]; # JUST TO SHOW THEY WORK WITH PRECOMPILED PARSERS }; Parse::RecDescent->Precompile(<<'EndGrammar', "Calc", $0 ); { use Coy; } { my $lexical_var = 1; } mult_op: '*' { sub { $_[0] *= $_[1] } } | '/' { sub { $_[0] /= $_[1] } } main: expr /\s*\Z/ { $lexical_var++ } { $item[1]->() } | expr: /for(each)?/ lvar range expr { my ($vname,$expr) = @item[2,4]; my ($from, $to) = @{$item[3]}; sub { my $val; no strict "refs"; for $$vname ($from->()..$to->()) { $val = $expr->() } return $val; } } | lvar '=' addition { my ($vname, $expr) = @item[1,3]; sub { no strict 'refs'; $$vname = $expr->() } } | addition range: "(" expr ".." expr ")" { [ @item[2,4] ] } addition: { my $add = $item[1]; sub { ::evalop($add) } } add_op: '+' { sub { $_[0] += $_[1] } } | '-' { sub { $_[0] -= $_[1] } } multiplication: { my $mult = $item[1]; sub { ::evalop($mult) } } factor: number | rvar | '(' expr ')' { $item[2] } number: /[-+]?\d+(\.\d+)?/ { sub { $item[1] } } lvar: /\$([a-z]\w*)/ { $1 } rvar: lvar { sub { no strict 'refs'; ${$item[1]} } } EndGrammar Parse-RecDescent-1.967015/demo/demo_recipe.pl0000755000175000017500000000414411710167512020514 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # "Potato, Egg, Red meat & Lard Cookbook", # T. Omnicient Rash & N. Hot Ignorant-Kant # O'Besity & Associates use Parse::RecDescent; $grammar = q{ Recipe: Step(s) Step: Verb Object Clause(s?) { print "$item[1]\n" } | Verb: 'boil' | 'peel' | 'mix' | 'melt' | 'fry' | 'steam' | 'marinate' | 'sprinkle' | 'is' | 'are' | 'has' Object: IngredientQualifier(s) Ingredient | ReferenceQualifier(s) Ingredient | Reference Clause: SubordinateClause | CoordinateClause SubordinateClause: 'until' State | 'while' State | 'for' Time CoordinateClause: /and( then)?/ Step | /or/ Step State: Object Verb Adjective | Adjective Time: Number TimeUnit TimeUnit: /hours?/ /minutes?/ /seconds?/ QuantityUnit: /lbs?/ Object: ReferenceQualifier Ingredient | Reference Reference: 'they' | 'it' | 'them' Ingredient: 'potatoes' | 'lard' | 'olive oil' | 'sugar' | 'bacon fat' | 'butter' | 'salt' | 'vinegar' IngredientQualifier: Amount | Number | 'a' | 'some' | 'large' | 'small' Amount: Number QuantityUnit ReferenceQualifier: 'the' | 'those' | 'each' | 'half the' Number: /[1-9][0-9]*/ | /one|two|three|four|five|six|seven|eight|nine/ | 'a dozen' Adjective: 'soft' | 'tender' | 'done' | 'charred' | 'grey' }; $parse = new Parse::RecDescent ($grammar); $/ = "\n\n"; while () { if($ingredients = $parse->Recipe(lc $_)) { print "$ingredients\n$_"; } } __DATA__ Boil six large potatoes until they are grey and then marinate them for at least two hours in a mixture of lard, olive oil, raw sugar, and sea-salt. In a deep-fryer melt 2 lbs of bacon fat and bring to the boil. Fry the marinated potatoes for 7 minutes, or until they are nicely charred. Serve with lashings of butter. Sprinkle with salt and vinegar to taste. Parse-RecDescent-1.967015/demo/demo_piecewise.pl0000755000175000017500000000604311663610756021234 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; my $parser = new Parse::RecDescent q { components: ganglia(s) ganglia: 'ganglia' '{' cell(s) '}' cell: 'cell' '{' synapse(s) '}' synapse: 'synapse' '{' WHATEVER '}' WHATEVER: /\d\.\d\.\d/ # OBVIOUSLY REPLACE THIS WITH WHATEVER :-) } or die "Bad grammar!\n"; my $text = ''; getnextchunk($text) # GET FIRST CHUNK OF DATA or die "No data!\n"; $text =~ s/\s*define\s*NetworkDetails\s*{// # EAT OUTERMOST OPENING BRACKET or die "Bad data! (expected 'define...')"; do { $parser->components(\$text); # TRY TO EAT GANGLIA } while (getnextchunk($text)); # APPEND THE NEXT CHUNK $text =~ m/\s*}/ # CHECK FOR THE OUTERMOST CLOSING BRACKET or die "Unclosed 'define' (missing '}')!"; sub getnextchunk # APPEND SOME MORE TEXT TO THE FIRST ARG { my $wasmore = 0; for (1..10) # OPTIMALLY, ONE LINE BIGGER THAN THE LARGEST # SINGLE GANGLIA DEFINITION, BUT ANY NUMBER # OF LINES WILL DO { my $nextline = ; defined $nextline or last; $_[0] .= $nextline; $wasmore++ } return $wasmore; } __DATA__ define NetworkDetails { ganglia { cell { synapse { 1.1.1 } synapse { 1.1.2 } synapse { 1.1.3 } synapse { 1.1.4 } synapse { 1.1.5 } } cell { synapse { 1.2.1 } synapse { 1.2.2 } synapse { 1.2.3 } synapse { 1.2.4 } synapse { 1.2.5 } } cell { synapse { 1.3.1 } synapse { 1.3.2 } synapse { 1.3.3 } synapse { 1.3.4 } synapse { 1.3.5 } } cell { synapse { 1.4.1 } synapse { 1.4.2 } synapse { 1.4.3 } synapse { 1.4.4 } synapse { 1.4.5 } } } ganglia { cell { synapse { 2.1.1 } synapse { 2.1.2 } synapse { 2.1.3 } synapse { 2.1.4 } synapse { 2.1.5 } } cell { synapse { 2.2.1 } synapse { 2.2.2 } synapse { 2.2.3 } synapse { 2.2.4 } synapse { 2.2.5 } } cell { synapse { 2.3.1 } synapse { 2.3.2 } synapse { 2.3.3 } synapse { 2.3.4 } synapse { 2.3.5 } } cell { synapse { 2.4.1 } synapse { 2.4.2 } synapse { 2.4.3 } synapse { 2.4.4 } synapse { 2.4.5 } } } ganglia { cell { synapse { 3.1.1 } synapse { 3.1.2 } synapse { 3.1.3 } synapse { 3.1.4 } synapse { 3.1.5 } } cell { synapse { 3.2.1 } synapse { 3.2.2 } synapse { 3.2.3 } synapse { 3.2.4 } synapse { 3.2.5 } } cell { synapse { 3.3.1 } synapse { 3.3.2 } synapse { 3.3.3 } synapse { 3.3.4 } synapse { 3.3.5 } } cell { synapse { 3.4.1 } synapse { 3.4.2 } synapse { 3.4.3 } synapse { 3.4.4 } synapse { 3.4.5 } } } ganglia { cell { synapse { 4.1.1 } synapse { 4.1.2 } synapse { 4.1.3 } synapse { 4.1.4 } synapse { 4.1.5 } } cell { synapse { 4.2.1 } synapse { 4.2.2 } synapse { 4.2.3 } synapse { 4.2.4 } synapse { 4.2.5 } } cell { synapse { 4.3.1 } synapse { 4.3.2 } synapse { 4.3.3 } synapse { 4.3.4 } synapse { 4.3.5 } } cell { synapse { 4.4.1 } synapse { 4.4.2 } synapse { 4.4.3 } synapse { 4.4.4 } synapse { 4.4.5 } } } } Parse-RecDescent-1.967015/demo/demo_simplequery.pl0000755000175000017500000000150611710167512021623 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; use Data::Dumper; use Parse::RecDescent; my $grammar = <<'EOGRAMMAR'; { my %patterns; } query: { %patterns=() } pattern(s) end_of_query { $return = { %patterns } } | end_of_query: /\Z/ pattern: '+' string { push @{$patterns{required}}, $item[2] } | '-' string { push @{$patterns{excluded}}, $item[2] } | string { push @{$patterns{optional}}, $item[1] } | string: quoted_string | bareword quoted_string: { my $string = extract_delimited($text,q{'"}); $return = substr($string,1,length($string)-2) if $string; } bareword: /\w+/ EOGRAMMAR my $parser = Parse::RecDescent->new($grammar) or die; while (<>) { my $query = $parser->query($_); print Data::Dumper->Dump([$query],['query']); } Parse-RecDescent-1.967015/demo/demo_language.pl0000755000175000017500000000511511710167512021027 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws #SHARED SYMBOL_TABLE my %symbol_table = (); package Operation; sub new { my ($class, %args) = @_; bless \%args, $class; } package Assignment_Op; @ISA = qw( Operation ); sub eval { my ($self) = @_; $symbol_table{$self->{var}->{name}} = $self->{value}->eval(); } package Addition_Op; @ISA = qw ( Operation ); sub eval { my ($self) = @_; return $self->{left}->eval() + $self->{right}->eval(); } package Multiplication_Op; @ISA = qw ( Operation ); sub eval { my ($self) = @_; return $self->{left}->eval() * $self->{right}->eval(); } package IfThenElse_Op; @ISA = qw ( Operation ); sub eval { my ($self) = @_; if ($self->{condition}->eval() ) { return $self->{true_expr}->eval(); } else { return $self->{false_expr}->eval(); } } package LessThan_Op; @ISA = qw ( Operation ); sub eval { my ($self) = @_; return $self->{left}->eval() < $self->{right}->eval(); } package Value_Op; @ISA = qw( Operation ); sub eval { my ($self) = @_; return $self->{value}; } package Variable_Op; @ISA = qw( Operation ); sub eval { my ($self) = @_; return $symbol_table{$self->{name}}; } package Sequence_Op; sub new { my ($class, $list_ref) = @_; bless $list_ref, $class; } sub eval { my ($self) = @_; my $last_val; foreach my $statement ( @$self ) { $last_val = $statement->eval(); } return $last_val; } package main; use Parse::RecDescent; my $grammar = q { Script: Statement(s) /^$/ { Sequence_Op->new( $item[1] ) } Statement: Assignment | IfThenElse | Expression | Assignment: Variable '<-' Expression { Assignment_Op->new( var => $item[1], value => $item[3]) } Expression: Product "+" Expression { Addition_Op->new( left => $item[1], right => $item[3] ) } | Product Product: Value "*" Product { Multiplication_Op->new( left => $item[1], right => $item[3] ) } | Value Value: /\d+/ { Value_Op->new( value => $item[1] ) } | Variable Variable: /(?!if)[a-z]/ { Variable_Op->new( name => $item[1] ); } IfThenElse: 'if' Condition 'then' Statement 'else' Statement { IfThenElse_Op->new( condition => $item[2], true_expr => $item[4], false_expr => $item[6]) } Condition: Expression '<' Expression { LessThan_Op->new( left => $item[1], right => $item[3] ) } }; my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar"; local $/; my $script = ; my $tree = $parser->Script($script) or die "Bad script"; print $tree->eval(); __DATA__ a <- 1 b <- 2 if anew(<<'EOGRAMMAR'); file: line(s?) line: field(s? /,|=>/) "\n" { $item[1] } field: { join "", @{$item[1]} } | /((?!,|=>).)*/ EOGRAMMAR use Data::Dumper; print Data::Dumper->Dump($PCSV_parser->file(join "", )); __DATA__ comma,separated=>values with,q{a,perl,twist},/to them/ Parse-RecDescent-1.967015/demo/demo_implicit.pl0000755000175000017500000000107711663610756021073 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # IMPLICIT SUB RULES, OKAY! use strict; use Parse::RecDescent; my $grammar = q{ { my $state = "happy"; } testimony : declaration(s) { print "Ended up $state\n"; } declaration : 'I' ('like'|'loathe')(s) ('ice-cream'|'lawyers') { print "$item[3]\n"; $state = 'happy/sad'; } | 'I' ("can't"|"can") ('fly'|'swim') { print "$item[3]\n"; $state = 'able/unable'; } | }; my $parse = new Parse::RecDescent ($grammar); my $input; while (defined ($input = <>)) { $parse->testimony($input) or print "huh?\n"; } Parse-RecDescent-1.967015/demo/demo_decomment.pl0000755000175000017500000000507511711403033021214 0ustar jtbraunjtbraun#!/usr/local/bin/perl -ws # $::RD_TRACE = 1; # REMOVE COMMENTS FROM C++ CODE # ORIGINAL BY Helmut Jarausch # EXTENDED BY Damian Conway AND Helmut Jarausch AND Jeremy Braun use strict; use Parse::RecDescent; use vars qw/ $Grammar /; my $parser = new Parse::RecDescent $Grammar or die "invalid grammar"; undef $/; my $text = @ARGV ? <> : ; my $parts = $parser->program($text) or die "malformed C program"; print "Comments\n========\n$parts->{comments}\n"; print "\nCode\n====\n$parts->{code}\n"; print "\nStrings\n=======\n", map(qq{\t"$_"\n}, @{$parts->{strings}}); BEGIN { $Grammar=<<'EOF'; program : program : /this shouldn't be here :-/ program : program : /with prejudice/ program : program : program : part(s) { { code=>$Code, comments=>$Comments, strings=>[@Strings]} } part : comment | C_code | string | charlit C_code : m{( [^"/]+ # one or more non-delimiters ( # then (optionally)... / # a potential comment delimiter [^*/] # which is not an actual delimiter )? # )+ # all repeated once or more }x { $Code .= $item[1] } string : m{" # a leading delimiter (( # zero or more... \\. # escaped anything | # or [^"] # anything but a delimiter )* ) "}x { $Code .= $item[1]; push @Strings, $1 } charlit : m{' # a leading delimiter (( # zero or more... \\. # escaped anything | # or [^'] # anything but a delimiter )* ) '}x { $Code .= $item[1]; push @Strings, $1 } comment : m{\s* # optional whitespace // # comment delimiter [^\n]* # anything except a newline \n # then a newline }x { $Code .= "\n"; $Comments .= $item[1] } | m{\s* # optional whitespace /\* # comment opener (?:[^*]+|\*(?!/))* # anything except */ \*/ # comment closer ([ \t]*)? # trailing blanks or tabs }x { $Code .= " "; $Comments .= $item[1] } EOF } __DATA__ program test; // for decomment // using Parse::RecDescent int main() { /* this should be removed */ char *cp1 = ""; char *cp2 = "cp2"; char c3 = 'c'; int i; // a counter // remove this line altogehter int k; int more_indented; // keep indentation int l; /* a loop variable */ // should be completely removed char *str = "/* ceci n'est pas un commentaire */"; return 0; } Parse-RecDescent-1.967015/demo/demo_autostub.pl0000755000175000017500000000042611710167512021112 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws # use strict; use Parse::RecDescent; print 1 if Parse::RecDescent->new(<<'EOGRAMMAR')->file(join "", ); file: defn(s) defn: 'def' ident ':' block block: '{' item(s) '}' EOGRAMMAR __DATA__ def ident : { item item item } Parse-RecDescent-1.967015/demo/demo_eval.pl0000755000175000017500000000224111710167512020170 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; $text = "1 + 2 * 3 ** 4"; $grammar = q{ expr : conj : addn : mult : expo : value : /\d+(\.\d+)?/ }; my $extract_tree = Parse::RecDescent->new($grammar); my $tree = $extract_tree->expr($text); use Data::Dumper 'Dumper'; print Dumper $tree; print $tree->eval(); package expr; use List::Util 'reduce'; sub eval { reduce {$_[0] || $_[1]} map $_->eval(), @{$_[0]{__DIRECTIVE1__}} } package conj ; use List::Util 'reduce'; sub eval { reduce {$_[0] && $_[1]} map $_->eval(), @{$_[0]{__DIRECTIVE1__}} } package addn ; use List::Util 'reduce'; sub eval { reduce {$_[0] + $_[1]} map $_->eval(), @{$_[0]{__DIRECTIVE1__}} } package mult ; use List::Util 'reduce'; sub eval { reduce {$_[0] * $_[1]} map $_->eval(), @{$_[0]{__DIRECTIVE1__}} } package expo ; use List::Util 'reduce'; sub eval { reduce {$_[0] ** $_[1]} map $_->eval(), @{$_[0]{__DIRECTIVE1__}} } package value ; sub eval { return $_[0]{__VALUE__} } Parse-RecDescent-1.967015/demo/demo_LaTeXish_autoact.pl0000755000175000017500000000334011710167512022443 0ustar jtbraunjtbraun use Parse::RecDescent; #$RD_TRACE = 1; #$RD_HINT = 1; my $parser = Parse::RecDescent->new(<<'EOGRAMMAR'); file: element(s) element: command | literal command: '\\' literal options(?) args(?) options: '[' option(s? /,/) ']' args: '{' element(s?) '}' option: /[^][\\$&%#_{}~^ \t\n,]+/ literal: /[^][\\$&%#_{}~^ \t\n]+/ EOGRAMMAR local $/; my $tree = $parser->file(); use Data::Dumper 'Dumper'; warn Dumper [ $tree ]; $tree->explain(0); sub file::explain { my ($self, $level) = @_; for (@{$self->{'element(s)'}}) { $_->explain($level); print "\n"; } } sub element::explain { my ($self, $level) = @_; ($self->{command}||$self->{literal})->explain($level) } sub command::explain { my ($self, $level) = @_; print "\t"x$level, "Command: $self->{literal}{__PATTERN1__}\n"; print "\t"x$level, "\tOptions:\n"; $self->{'options(?)'}[0]->explain($level+2) if @{$self->{'options(?)'}}; print "\t"x$level, "\tArgs:\n"; $self->{'args(?)'}[0]->explain($level+2) if @{$self->{'args(?)'}}; } sub options::explain { my ($self, $level) = @_; $_->explain($level) foreach @{$self->{'option(s?)'}}; } sub args::explain { my ($self, $level) = @_; $_->explain($level) foreach @{$self->{'element(s?)'}}; } sub option::explain { my ($self, $level) = @_; print "\t"x$level, "Option: $self->{__PATTERN1__}\n"; } sub literal::explain { my ($self, $level) = @_; print "\t"x$level, "Literal: $self->{__PATTERN1__}\n"; } __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Parse-RecDescent-1.967015/demo/demo_perlparsing.pl0000755000175000017500000000170311710167512021571 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; my $parse = Parse::RecDescent->new(<<'EndGrammar'); perl: { print 'quotelike: [', join("|", @{$item[1]}), "]\n" } | { print "variable: $item[-1]\n" } | { print "codeblock: $item[1]\n" } | /.*/ { print "unknown: $item[1]\n" } EndGrammar print "> "; while () { # FOR DEMO CHANGE TO: while () $parse->perl($_); } __DATA__ {$a=1}; $a; { $a = $b; \n $a =~ /$b/; \n @a = map /\s/ @b }; $_; $a[1]; $_[1]; $a{cat}; $_{cat}; $a->[1]; $a->{"cat"}[1]; @$listref; @{$listref}; $obj->nextval; @{$obj->nextval}; @{$obj->nextval($cat,$dog)->{new}}; @{$obj->nextval($cat?$dog:$fish)->{new}}; @{$obj->nextval(cat()?$dog:$fish)->{new}}; $ a {'cat'}; $a::b::c{d}->{$e->()}; $#_; $#array; $#{array}; $var[$#var]; 'a'; "b"; `c`; 'a\''; 'a\\'; '\\a'; "a\\"; "\\a"; "b\'\"\'"; `c '\`abc\`'`; q{a}; qq{a}; qx{a}; s{a}/b/; tr!a!b!; Parse-RecDescent-1.967015/demo/demo_leftop.pl0000755000175000017500000000170411710167512020535 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws # THE COMMONEST REASON FOR WANTING LEFT RECURSION use strict; use Parse::RecDescent; $::RD_HINT = 1; sub Parse::RecDescent::evalop { $_[0][0] = $_[0][$_+1](@{$_[0]}[0,$_+2]) for map 2*($_-1), 1..@{$_[0]}/2; return $_[0][0]; } my $parse = Parse::RecDescent->new(<<'EndGrammar'); main: expr /\Z/ { $item[1] } | expr: { evalop($item[1]) } add_op: '+' { sub { $_[0] += $_[1] } } | '-' { sub { $_[0] -= $_[1] } } term: { evalop($item[1]) } mult_op: '*' { sub { $_[0] *= $_[1] } } | '/' { sub { $_[0] /= $_[1] } } factor: number | '(' expr ')' { $item[2] } number: /[-+]?\d+(\.\d+)?/ EndGrammar while () { print "$_ = ", $parse->main($_), "\n"; } while (print "> " and defined($_=<>)) { print "= ", $parse->main($_), "\n"; } __DATA__ 2+3 2*3 +1-1+1-1+1-1+1-1+1 7*7-6*8 121/(121/11)/121*11 1/(10-1/(1/(10-1))) Parse-RecDescent-1.967015/demo/demo_NL2SQL.pl0000755000175000017500000000443111710167512020217 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; my $grammar = <<'EOGRAMMAR'; translate: select | sum | identify | { "Could you rephase that?\n" } select: ask_select qualifier(?) field /of/ qualifier(?) table { "SELECT DISTINCT $item[3]\nFROM $item[6]\n" } | ask_select qualifier(?) table /'?/ qualifier(?) field { "SELECT DISTINCT $item[6]\nFROM $item[3]\n" } | ask_select qualifier(?) table { "SELECT *\nFROM $item[3]\n" } sum: ask_count table prep qualifier field prep(?) value { "SELECT COUNT(*)\nFROM $item[2]\n" . "WHERE $item[5] = $item[7]\n" } | ask_count table { "SELECT COUNT(*)\nFROM $item[2]\n" } identify: ask_select(?) /who supplies/ qualifier value { "SELECT supplier\nFROM merchandise\n" . "WHERE name = $item[4]\n" } | whats qualifier field /of/ qualifier /suppliers? of/ value { "SELECT suppliers.$item[3]\n" . "FROM suppliers, products\n" . "WHERE products.name = $item[-1]\n" . "AND suppliers.name = products.supplier \n" } field: /(name)s?/ { $1 } | /(product)s?/ { $1 } | /(id)(entit(y|ies))?/ { $1 } | /(quantit(y|ies))/ { $1 } | /(received)s?/ { $1 } | /(supplier)s?/ { $1 } | /(cost)s?/ { $1 } | /(address)(es)?/ { $1 } table: /suppliers?/ { 'suppliers' } | 'merchandise' | /orders?/ { 'orders' } qualifier: /the|every|all( the)?|any|our/ ask_select: reply to_me ask_count: ask_select(?) /how (many|much)/ reply: /tell|show|list/ value: /\w+/ { qq{'$item[1]'} } to_me: /((to )?(me|us))?/ whats: /what's|what (is|are)/ prep: /for|of|with|by/ EOGRAMMAR my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar"; $| = 1; while () { print "> "; sleep 1; print; <>; my $SQL = $parser->translate($_); print $SQL, "\n"; } __DATA__ how many orders for the product spam are there? tell me how many suppliers by the name of Jones we have what are the names of our suppliers of trinitrotoluene? tell me our suppliers' names list our merchandise show us the suppliers of the mechanise list all supplier names how many orders are there? how much merchandise do we carry? who supplies our nitrocelluose? tell me who supplies our ethylacetate? what are the addresses of our suppliers of trinitrotoluene? Parse-RecDescent-1.967015/demo/demo_decomment_nonlocal.pl0000755000175000017500000000470711711403033023102 0ustar jtbraunjtbraun#!/usr/local/bin/perl -ws # REMOVE COMMENTS FROM C++ CODE # ORIGINAL BY Helmut Jarausch # EXTENDED BY Damian Conway AND Helmut Jarausch AND Jeremy Braun use strict; use Parse::RecDescent; use vars qw/ $Grammar /; my $parser = new Parse::RecDescent $Grammar or die "invalid grammar"; undef $/; my $text = @ARGV ? <> : ; $parser->program($text) or die "malformed C program"; print "Comments\n========\n$parser->{comments}\n"; print "\nCode\n====\n$parser->{code}\n"; print "\nStrings\n=======\n", map(qq{\t"$_"\n}, @{$parser->{strings}}); BEGIN { $Grammar=<<'EOF'; program : { @{$thisparser}{qw(comments code strings)} = () } part(s) part : comment { $thisparser->{comments} .= $item[1]; $thisparser->{code} .= " "; } | C_code { $thisparser->{code} .= $item[1]; } | string { $thisparser->{code} .= qq("$item[1]"); push @{$thisparser->{strings}}, $item[1]; } | charlit { $thisparser->{code} .= qq('$item[1]'); push @{$thisparser->{strings}}, $item[1]; } C_code : m{( [^"/]+ # one or more non-delimiters ( # then (optionally)... / # a potential comment delimiter [^*/] # which is not an actual delimiter )? # )+ # all repeated once or more }x string : m{" # a leading delimiter (( # zero or more... \\. # escaped anything | # or [^"] # anything but a delimiter )* ) "}x { $return = $1 } charlit : m{' # a leading delimiter (( # zero or more... \\. # escaped anything | # or [^'] # anything but a delimiter )* ) '}x { $return = $1 } comment : m{\s* # optional whitespace // # comment delimiter [^\n]* # anything except a newline \n # then a newline }x | m{\s* # optional whitespace /\* # comment opener (?:[^*]+|\*(?!/))* # anything except */ \*/ # comment closer ([ \t]*)? # trailing blanks or tabs }x EOF } __DATA__ program test; // for decomment // using Parse::RecDescent int main() { /* this should be removed */ char *cp1 = ""; char *cp2 = "cp2"; char c3 = 'c'; int i; // a counter // remove this line altogehter int k; int more_indented; // keep indentation int l; /* a loop variable */ // should be completely removed char *str = "/* ceci n'est pas un commentaire */"; return 0; } Parse-RecDescent-1.967015/demo/demo_precalc.pl0000755000175000017500000000135611663610756020672 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws BEGIN { unless (-f "Calc.pm") { print "You first need to build Calc.pm.\n\n", "Try: perl -MParse::RecDescent - calc_grammar Calc\n", " or: demo_buildcalc.pl\n\n"; exit; } } use Parse::RecDescent; use Calc; sub evalop { # my (@list) = @{[@{$_[0]}]}; my (@list) = @{$_[0]}; my $val = shift(@list)->(); while (@list) { my ($op, $arg2) = splice @list, 0, 2; $op->($val,$arg2->()); } return $val; } my $parse = Calc->new() or die "bad grammar"; print "> "; while () { # FOR DEMO CHANGE TO: while () print $parse->main($_), "\n\n> "; } __DATA__ $x = 2 $y = 3 +1-1+1-1+1-1+1-1+1 7*7-6*8 121/(121/11)/121*11 1/(10-1/(1/(10-1))) $x * $y foreach $i (1..$y) $x = $x * 2 + $i $x Parse-RecDescent-1.967015/demo/demo_cpp.pl0000755000175000017500000000236611710167512020033 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # RECURSIVE #includes DURING A RECURSIVE DESCENT use Parse::RecDescent; sub loadfile($) { open FILE, $_[0] or die "Couldn't find included file: $_[0]\n"; my $contents = ; close FILE; return $contents; } %macro = (); sub demacro($) { my $text = shift; while (($macro,$defn) = each %macro ) { $text =~ s/$macro/$defn/; } return $text; } $grammar = q{ file : line(s) line : include | macrodef | linedir { $thisline = $item[1]; } | codeline { print "found: [$item[1]] at $thisline\n" } include : '#include' filename { print "pre: [$text] at $thisline\n"; $text = ::loadfile($item[-1]) . $text; Parse::RecDescent::LineCounter::resync $thisline; print "post: [$text] at $thisline\n"; } filename : '"' m#[a-z0-9_./-]+#i '"' { $return = $item[-2] } | '<' m#[a-z0-9_./-]+#i '>' { $return = $item[-2] } macrodef : '#define' /[a-z]\w*/i /.*/ { $::macro{$item[-2]} = $item[-1] } linedir: '#line' /\d+/ codeline : /.*\n/ { $return = ::demacro($item[-1]); } }; $parse = new Parse::RecDescent ($grammar); undef $/; $reinput = $input = <>; $parse->file($input) or die "Bad file! No biscuit!\n"; $parse->file($reinput) or die "Bad file! No biscuit!\n"; Parse-RecDescent-1.967015/demo/demo_simple.pl0000755000175000017500000000367611710167512020547 0ustar jtbraunjtbraunuse v5.10; use warnings; # WHO IS NEXT TO WHOM? use Parse::RecDescent; $grammar = q{ inputs : input(s) input : who_question "\n" {1} | is_question "\n" {1} | statement "\n" {1} | /bye|quit|exit/ { exit } | # ERROR IF NOT END OF TEXT | { print STDERR "resyncing\n" } { _error(@$_) foreach @{$thisparser->{errors}}; } statement: namelist are 'next' 'to' namelist { ::nextto $item[1], $item[6], $thisline; 1 } | who_question: 'who' are 'next' 'to' name '?' { ::whonextto $item[6] ; 1 } | is_question: 'is' name 'next' 'to' name '?' { ::isnextto($item[3], $item[6]); 1 } | namelist : name(s) 'and' namelist { [ @{$item[1]}, @{$item[3]} ] } | name(s) name : ...!'who' ...!'and' ...!are /[A-Za-z]+/ are : 'is' | 'are' }; $parse = new Parse::RecDescent ($grammar); $parse->{tokensep} = '[ \t]*'; $input = ''; print "> "; while (<>) { if (/^\.$/) { $parse->inputs($input) || print "huh?\n"; $input = '' } else { $input .= $_ } print "> "; } sub nextto($$$) { foreach $A ( @{$_[0]} ) { foreach $B ( @{$_[1]} ) { nexttoAB($A,$B,$_[2]); } } print "okay\n"; } sub nexttoAB($$$) { $nextto{$_[0]} or $nextto{$_[0]} = []; $nextto{$_[1]} or $nextto{$_[1]} = []; push @{$nextto{$_[0]}}, $_[1]; push @{$nextto{$_[1]}}, $_[0]; print "Learnt something from line $_[2]\n"; } sub whonextto($) { if (defined $nextto{$_[0]}) { print join(" and ", @{$nextto{$_[0]}}) . ".\n"; } else { print "sorry, I've never heard of $_[0].\n"; } } sub isnextto($$) { if (!$nextto{$_[0]}) { print "sorry, I've never heard of $_[0].\n"; } elsif (!$nextto{$_[1]}) { print "sorry, I've never heard of $_[1].\n"; } else { foreach $name (@{$nextto{$_[0]}}) { if ($name eq $_[1]) { print "yes\n"; return } } print "no\n"; } } Parse-RecDescent-1.967015/demo/demo_matchrule2.pl0000755000175000017500000000104311710167512021306 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw use vars qw($animal); use Parse::RecDescent; $RD_AUTOACTION = q{ $last = $item[1] }; $grammar = q { sequence: sequence: base (3..) { [ $item[1], @{$item[2]} ] } | base sequence { $item[2] } base: /[ACGT]/ after_A: /[C]/ after_C: /[AG]/ after_G: /[CT]/ after_T: /[G]/ }; $parser = new Parse::RecDescent( $grammar ) or die "bad grammar; bailing"; local $/; use AutoDump; show $parser->sequence(); __DATA__ AAACTTTAAAACGTGCGCACGTGTAAAAAA Parse-RecDescent-1.967015/demo/demo_embedding.pl0000755000175000017500000000173011710167512021161 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; #$RD_TRACE=1; my $parser = Parse::RecDescent->new(<<'EOGRAMMAR'); file: item(s) item: directive | text directive: '<%' command arg(s?) m|/?%>| { $return = bless \%item, 'directive' } command: m|/?[a-z]\w*|i arg: argname '=' string { $return = \%item } | string { $return = \%item } | data { $return = \%item } argname: /[a-z]\w*/i string: '"' /[^\\"]*(\\.[^\\"]*)*/ '"' { $return = $item[2] } data: m|((?!/?%>)\S)+| text: /((?!<%).)+/s { $return = bless \$item[1], 'text' } EOGRAMMAR my $data = join '', ; my $info = $parser->file($data); use Data::Dumper; print Data::Dumper->Dump($info); __DATA__ <% If expr %> the if worked <% Else %> it didn't work <% /If %> > <% Include file="foo.txt" /%> > <% Run Function="myFunc" Attr="x" ...%> <% Arg Name="row" %> Name{name} <% /Arg %> <% /Run %> Parse-RecDescent-1.967015/demo/demo_errors.pl0000755000175000017500000000170511710167512020561 0ustar jtbraunjtbraunuse v5.10; use warnings; # THE ONLY TRUE MISTAKES ARE THE ONES YOU NEVER MAKE #BEGIN { # close STDERR and open STDERR, '>./STDERR' or die $!; #} use Parse::RecDescent; $grammar = q{ Para: Sentence(s) /\Z/ # Can also intercept the error messages like so: ## | { use Data::Dumper 'Dumper'; print "$_->[0]\n" for @{$thisparser->{errors}}; exit; } Sentence: Noun Verb | Verb Noun | Noun: Fish | Cat | 'dog' Verb: 'runs' Fish: 'fish' | Cat: 'cat' | }; $parse = new Parse::RecDescent ($grammar); while () { chomp; print "$_...\n"; $parse->Para($_); } __DATA__ rat runs dog runs cat purrs cat runs mouse squeaks Parse-RecDescent-1.967015/demo/demo_delete.pl0000755000175000017500000000161411663610756020520 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # DELETABLE PRODUCTIONS use Parse::RecDescent; $grammar = q{ commands : command(s) command : directive | evaluation directive: 'only prefix' { $thisparser->{deleted} = {infix=>1,postfix=>1} } | 'not prefix' { $thisparser->{deleted}{prefix} = 1 } evaluation: {deleted}{prefix}> op var var { print "prefix $item[3] $item[2] $item[4]\n" } | {deleted}{infix}> var op var { print "infix $item[2] $item[3] $item[4]\n" } | {deleted}{postfix}> var var op { print "postfix $item[2] $item[4] $item[3]\n" } op : '+' | '-' var : /(?!\d)\w+/ }; $parse = new Parse::RecDescent ($grammar); while () { $parse->commands($_); } __DATA__ a + b + a b a b + not prefix a + b + a b a b + only prefix a + b + a b a b + not prefix a + b + a b a b + Parse-RecDescent-1.967015/demo/demo.c0000644000175000017500000000017611710167512016772 0ustar jtbraunjtbraunint main() { int i; int k; int more_indented; int l; char *str = "/* this is no comment */"; return 0; } Parse-RecDescent-1.967015/demo/demo_another_Cgrammar.pl0000755000175000017500000005212211710167512022515 0ustar jtbraunjtbraun#!/usr/bin/perl -w #=============================================================================== # # FILE: csourceparser.pl # # USAGE: ./csourceparser.pl [Option] ... File ... # # # DESCRIPTION: Parse and extract specified elements from source-code # written in the C language # # OPTIONS: --- # REQUIREMENTS: Perl Version >= 5.8.0, Parse::RecDescent, Getopt::Long, Pod::Usage # BUGS: --- # NOTES: --- # AUTHOR: # COMPANY: FH-SWF # VERSION: 0.1.0 # CREATED: 07/10/05 12:34:53 CEST # REVISION: --- #=============================================================================== require 5.008000; use strict; use Parse::RecDescent; use Getopt::Long; use Pod::Usage; # use Data::Dumper 'Dumper'; #$::RD_HINT = 1; # Print hints on errors #$::RD_ERRORS = 1; # Print errors #$::RD_WARN = 1; #$::RD_TRACE = 1; # Print tracecode to STDERR #$::RD_AUTOSTUB = 1; ############################################################### # Grammar used to find and remove comments from C source code # ############################################################### my $decomment_grammar = <<'END_OF_DECOMMENT'; program : { @{$thisparser}{qw(code)} = () } part(s) {@{$thisparser}{code};} part : comment { $thisparser->{code} .= " "; } | C_code { $thisparser->{code} .= $item[1]; } | string { $thisparser->{code} .= qq("$item[1]"); } C_code : m{( [^"/]+ # one or more non-delimiters ( # then (optionally)... / # a potential comment delimiter [^*/] # which is not an actual delimiter )? # )+ # all repeated once or more }x string : m{" # a leading delimiter (( # zero or more... \\. # escaped anything | # or [^"] # anything but a delimiter )* ) "}x { $return = $1 } comment : m{\s* # optional whitespace // # comment delimiter [^\n]* # anything except a newline \n # then a newline }x | m{\s* # optional whitespace /\* # comment opener (?:[^*]+|\*(?!/))* # anything except */ \*/ # comment closer ([ \t]*)? # trailing blanks or tabs }x END_OF_DECOMMENT ###################################################################################### # Grammar used to parse C source code (without comments and preprocessor directives) # # #################################################################################### my $Cgrammar = <<'END_OF_C_GRAMMAR'; translation_unit: external_declaration(s) | external_declaration: function_definition | declaration | { if ($::opt_SKIPPEDLINES || (defined $::opt_VERBOSE and $::opt_VERBOSE >= 1 )) { print "Skipping line $thisline\n" # Try next line if possible... } } function_definition: declaration_specifiers(?) declarator declaration_list(?) compound_statement { if($::opt_FUNCTIONS) { $::functions_output .= ::flatten_list($item[1]); $::functions_output .= ::flatten_list($item[2]); $::functions_output .= ::flatten_list($item[3]) . ";\n"; } } declaration: declaration_specifiers init_declarator_list(?) ';' { if($::opt_DECLARATIONS) { $::declarations_output .= ::flatten_list($item[1]); $::declarations_output .= ::flatten_list($item[2]); $::declarations_output .= ::flatten_list($item[3]) . "\n"; } } declaration_list: declaration(s) declaration_specifiers: type_qualifier declaration_specifiers(?) | storage_class_specifier declaration_specifiers(?) | type_specifier declaration_specifiers(?) storage_class_specifier: 'auto' | 'register' | 'static' | 'extern' | 'typedef' type_specifier: 'int' | 'double' | 'void' | 'char' | 'long' | 'float' | 'signed' | 'unsigned' | 'short' | struct_or_union_specifier | enum_specifier | typedef_name ...typedef_name_lookahead { [$item[1] ] } typedef_name_lookahead: declarator # | pointer # | ',' ...parameter_type_list # | ')' type_qualifier: 'const' | 'volatile' struct_or_union_specifier: struct_or_union IDENTIFIER(?) '{' struct_declaration_list(?) '}' { if($::opt_STRUCTS){ $::structs_output .= ::flatten_list($item[1]) . " "; $::structs_output .= ::flatten_list($item[2]); $::structs_output .= ::flatten_list($item[3]) . "\n"; $::structs_output .= ::flatten_list_beautified($item[4]); $::structs_output .= ::flatten_list($item[5]) . ";\n\n"; } } | struct_or_union IDENTIFIER struct_or_union: 'struct' | 'union' struct_declaration_list: struct_declaration(s) init_declarator_list: init_declarator(s /(,)/) init_declarator: declarator '=' initializer | declarator struct_declaration: specifier_qualifier_list struct_declarator_list ';' specifier_qualifier_list: type_specifier specifier_qualifier_list(?) | type_qualifier specifier_qualifier_list(?) struct_declarator_list: struct_declarator(s /(,)/) struct_declarator: declarator(?) ':' constant_expression | declarator enum_specifier: 'enum' IDENTIFIER(?) '{' enumerator_list '}' { if($::opt_STRUCTS){ $::structs_output .= ::flatten_list($item[1]) . " "; $::structs_output .= ::flatten_list($item[2]); $::structs_output .= ::flatten_list($item[3]) . "\n"; $::structs_output .= ::flatten_list_beautified($item[4]); $::structs_output .= ::flatten_list($item[5]) . ";\n\n"; } } | 'enum' IDENTIFIER enumerator_list: enumerator(s /(,)/) enumerator: IDENTIFIER ('=' constant_expression)(?) declarator: pointer(?) direct_declarator function_signature: '[' constant_expression(?) ']' | '(' parameter_type_list ')' | '(' identifier_list(?) ')' direct_declarator: IDENTIFIER function_signature(s?) | '(' declarator ')' function_signature(s?) pointer: '*' type_qualifier_list(?) pointer(?) type_qualifier_list: type_qualifier(s) parameter_type_list: parameter_list (',' '...')(?) parameter_list: parameter_declaration(s /(,)/) parameter_declaration: declaration_specifiers declarator | declaration_specifiers abstract_declarator(?) identifier_list: IDENTIFIER(s /(,)/) initializer: assignment_expression | '{' initializer_list (',')(?) '}' initializer_list: initializer(s /(,)/) type_name: specifier_qualifier_list abstract_declarator(?) abstract_declarator: pointer(?) direct_abstract_declarator | pointer abstract_type: '[' constant_expression(?) ']' | '(' parameter_type_list(?) ')' direct_abstract_declarator: '(' abstract_declarator ')' abstract_type(s?) | abstract_type(s) typedef_name: IDENTIFIER statement: selection_statement | expression_statement | iteration_statement | compound_statement | jump_statement | labeled_statement labeled_statement: 'case' constant_expression ':' statement | IDENTIFIER ':' statement | 'default' ':' statement expression_statement: expression(?) ';' compound_statement: '{' declaration_list(?) statement_list(?) '}' statement_list: statement(s) selection_statement: 'if' '(' expression ')' statement ('else' statement)(?) | 'switch' '(' expression ')' statement iteration_statement: 'for' '(' expression(?) ';' expression(?) ';' expression(?) ')' statement | 'while' '(' expression ')' statement | 'do' statement 'while' '(' expression ')' jump_statement: 'return' expression(?) ';' | 'break' ';' | 'continue' ';' | 'goto' IDENTIFIER ';' expression: assignment_expression(s /(,)/) assignment_expression: unary_expression ASSIGNMENT_OPERATOR assignment_expression | conditional_expression conditional_expression: logical_OR_expression ('?' expression ':' conditional_expression)(?) constant_expression: conditional_expression logical_OR_expression: logical_AND_expression(s /(\|\|)/) logical_AND_expression: inclusive_OR_expression(s /(&&)/) inclusive_OR_expression: exclusive_OR_expression(s /(\|)/) exclusive_OR_expression: AND_expression(s /(\^)/) AND_expression: equality_expression(s /(&)/) equality_expression: relational_expression(s /(==|!=)/) relational_expression: shift_expression(s /(<=|>=|<|>)/) shift_expression: additive_expression(s /(<<|>>)/) additive_expression: multiplicative_expression(s /(\+|-)/) multiplicative_expression: cast_expression(s /(\*|\/|%)/) cast_expression: unary_expression | '(' type_name ')' cast_expression unary_expression: postfix_expression | '++' unary_expression | '--' unary_expression | 'sizeof' '(' type_name ')' | UNARY_OPERATOR cast_expression | 'sizeof' unary_expression postfix_expression: primary_expression postfix_expression_token(s?) postfix_expression_token: '[' expression ']' | '(' argument_expression_list(?)')' | '.' IDENTIFIER | '->' IDENTIFIER | '++' | '--' primary_expression: IDENTIFIER | constant | STRING | '(' expression ')' argument_expression_list: assignment_expression(s /(,)/) constant: CHARACTER_CONSTANT | FLOATING_CONSTANT | INTEGER_CONSTANT | ENUMERATION_CONSTANT ### TERMINALS INTEGER_CONSTANT: /(?:0[xX][\da-fA-F]+) # Hexadecimal |(?:0[0-7]*) # Octal or Zero |(?:[1-9]\d*) # Decimal [uUlL]? # Suffix /x CHARACTER_CONSTANT: /'([^\\'"] # None of these |\\['\\ntvbrfa'"] # or a backslash followed by one of those |\\[0-7]{1,3}|\\x\d+)' # or an octal or hex constant /x FLOATING_CONSTANT: /(?:\d+|(?=\.\d+)) # No leading digits only if '.moreDigits' follows (?:\.|(?=[eE])) # There may be no floating point only if an exponent is present \d* # Zero or more floating digits ([eE][+-]?\d+)? # expontent [lLfF]? # Suffix /x ENUMERATION_CONSTANT: INTEGER_CONSTANT STRING: /"(([^\\'"]) # None of these |(\\[\\ntvbrfa'"]) # or a backslash followed by one of those |(\\[0-7]{1,3})|(\\x\d+))*"/x # or an octal or hex IDENTIFIER: /(?!(auto|break|case|char|const|continue|default|do|double|else|enum|extern|float|for|goto # LOOKAHEAD FOR KEYWORDS |if|int|long|register|return|signed|sizeof|short|static|struct|switch|typedef # NONE OF THE KEYWORDS |union|unsigned|void|volatile|while)[^a-zA-Z_]) # SHOULD FULLY MATCH! (([a-zA-Z]\w*)|(_\w+))/x # Check for valid identifier ASSIGNMENT_OPERATOR: '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|=' UNARY_OPERATOR: '&' | '*' | '+' | '-' | '~' | '!' END_OF_C_GRAMMAR #=== FUNCTION ================================================================ # NAME: flatten_list # DESCRIPTION: Extracts values from a recursive list. Double whitespaces will # be reduced # PARAMETER 1: Array Reference #=============================================================================== sub flatten_list { ( my $tokens = join ' ', map { ref($_) ? flatten_list(@$_) : ($_) } @_ ) =~ s/\s+/ /g; $tokens; } #=== FUNCTION ================================================================ # NAME: flatten_list_beautified # DESCRIPTION: Like flatten_list but inserts a newline after each semicolon # PARAMETER 1: Array Reference #=============================================================================== sub flatten_list_beautified { ( my $tokens = join ' ', map { ref($_) ? flatten_list(@$_) : ($_) } @_ ) =~ s/\s+/ /g; $tokens =~ s/;/;\n/g; $tokens =~ s/^\s*/\t/mg; $tokens; } #--------------------------------------------------------------------------# # Parsing variables # #--------------------------------------------------------------------------# my $decommentParser; # Parser using decomment grammar my $Cparser; # Parser using C grammar my $C_source; # C source code my $decommented_C_source; # C source code without comments my $preprocessed_C_source; # C source code without preprocessor directives # preprocessor directives are just removed, not evalued #--------------------------------------------------------------------------# # Command line options # #--------------------------------------------------------------------------# our $opt_HELP = ''; our $opt_SKIPPEDLINES = ''; our $opt_ERRORS = ''; our $opt_TRACE = ''; our $opt_CODE = ''; our $opt_VERBOSE = 0; our $opt_PRECOMPILE = ''; our $opt_FUNCTIONS = ''; our $opt_DECLARATIONS = ''; our $opt_STRUCTS = ''; Getopt::Long::Configure("bundling"); # Enables option bundling GetOptions( # Parse command line options 'help|h' => \$opt_HELP, # --help -h 'skippedlines|s' => \$opt_SKIPPEDLINES, # --skippedlines -s 'errors|e' => \$opt_ERRORS, # --errors -e 'trace|t' => \$opt_TRACE, # --trace -t 'code|c' => \$opt_CODE, # --code -c 'verbose|v+' => \$opt_VERBOSE, # --verbose -v 'functions|f' => \$opt_FUNCTIONS, # --functions -f 'declarations|d' => \$opt_DECLARATIONS, # --declarations -d 'precompile|p' => \$opt_PRECOMPILE, # --precompile -p 'structs|u' => \$opt_STRUCTS # --structs -u ); # variables for parser output our $functions_output = ''; our $declarations_output = ''; our $structs_output = ''; $opt_HELP and pod2usage( -verbose => 2 ); # Set error reporting if ($opt_ERRORS) { $::RD_HINT = 1; # Print hints on errors $::RD_ERRORS = 1; # Print errors open( Parse::RecDescent::ERROR, ">errfile" ) or die "Can't open errfile: $!"; } # Die if no input files present @ARGV or pod2usage( -message => "Error: More arguments required.", -verbose => 0 ); { local $/; $C_source = <>; } # Set trace level if ( $opt_TRACE || ( $opt_VERBOSE >= 3 ) ) { $::RD_TRACE = 1; } # Generate precompiled parser modules if($opt_PRECOMPILE) { print("\nCreating precompiled parsers... \n"); Parse::RecDescent->Precompile( $decomment_grammar,"CSourceParser::DecommentGrammar" ); Parse::RecDescent->Precompile( $Cgrammar, "CSourceParser::Cgrammar" ); print("Done\n\n"); } #--------------------------------------------------------------------------# # Parse Level 1 (removes comments) # #--------------------------------------------------------------------------# if ( -e "DecommentGrammar.pm" ) { require DecommentGrammar; $decommentParser = new CSourceParser::DecommentGrammar or die "Malformed Decomment grammar!\n"; } else { $decommentParser = new Parse::RecDescent($decomment_grammar) or die "Malformed Decomment grammar!\n"; } defined( $decommented_C_source = $decommentParser->program($C_source) ) or die "Malformed C code found at parse level 1!\n"; #--------------------------------------------------------------------------# # Parse Level 2 (Removes preprocessor directives) # #--------------------------------------------------------------------------# open( PREPROCESS, "<", \$decommented_C_source ) # open string as filehandle or die "Can't open input string for parse level 2: $!"; $preprocessed_C_source = ""; my $skip_line = 0; foreach () { # Match preprocessor directives ... if ( m{\s* # Optional whitespace \# # Preprocessor opener \s* # Optional whitespace (?:(define|include|undef|ifdef|ifndef|if|endif|else|elif|line|error|pragma)\s) # Keyword followed by one or more whitespace .* # anything (optinal) }x || $skip_line ) { $skip_line = /.*\\[\n]/; # Ignore this line AND NEXT LINE ALSO $preprocessed_C_source .= " "; # if this line ends with backslash } else { $preprocessed_C_source .= $_; # OK, parse this line } } if ( $opt_CODE || ( $opt_VERBOSE >= 1 ) ) { # print sourcecode with linenumbers my $i = 1; foreach ( split( /\n/, $preprocessed_C_source ) ) { print "$i\t$_\n"; $i++; } } if ( $opt_VERBOSE >= 2 ) { $::RD_TRACE = 1; } #--------------------------------------------------------------------------# # Parse Level 3 (parses C code) # #--------------------------------------------------------------------------# $::RD_AUTOACTION = q { [ @item[1..$#item] ] }; # set default auto-action for grammar rules if ( -e "CGrammar.pm" ) { require CGrammar; $Cparser = new CSourceParser::CGrammar or die "Malformed C grammar!\n"; } else { $Cparser = new Parse::RecDescent($Cgrammar) or die "Malformed C grammar!\n"; } defined( $Cparser->translation_unit($preprocessed_C_source) ) or die "Malformed C code found at parse level 3!\n"; print "\nDefined Functions:\n\n$functions_output\n\n" if defined $functions_output and $opt_FUNCTIONS; print "\nDeclarations:\n\n$declarations_output\n\n" if defined $declarations_output and $opt_DECLARATIONS; print "\nStructures:\n\n$structs_output\n\n" if defined $structs_output and $opt_STRUCTS; __END__ #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Application Documentation #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% =head1 NAME csourceparser.pl - extract components from sourcecode written in the C programming language =head1 VERSION This documentation refers to csourceparser.pl version 0.1.0 =head1 SYNOPSIS B<./csourceparser.pl [OPTION] ... FILE ...> B Print signatures of functions defined in a C-file: ./csourceparser.pl -f myprog.c Print all declarations in a C-file: ./csourceparser.pl -d myprog.c Print all structures in a C-file: ./csourceparser.pl -u myheader.h =head1 REQUIRED ARGUMENTS One or more C-Sourcefiles to parse. =head1 OPTIONS =over 4 =item B<-c, --code> Show parsed source code with line numbers =item B<-d, --declarations> Prints (global) declarations in the sourcefile to stdout =item B<-e, --errors> Print error messages generated by Parse::RecDescent to the file 'errfile' =item B<-f, --functions> Prints the signatures of functions defined in the source file to stdout (the function bodys are ommited) =item B<-h, --help> Print this help =item B<-p, --precompile> Generate precompiled parsers Cgrammar.pm and DecommentGrammar.pm in the current working directory. Precompiled parsers will speed up parsing. If these files are available in the current working directory they will be used automatically. Every time the --precompile option is set the precompiled parsers are generateted newly so this option should only be used only once. Also, don't forget to recreate the precompiled parseres if you modify the grammar. =item B<-s, --skippedlines> Show which lines had been skipped during parse due to parser errors or unrecognized tokens in the C source code =item B<-t, --trace> Print full tracecode generated by Parse::RecDescent. Note: this can be B =item B<-u, --structs> Print all structs and unions defined in the C sourcefile =item B<-v, --verbose> Each use encreases verbosity level by one. =over 4 =item Level 1: Print parsed sourcecode and skipped lines (same as I<-sc>) =item Level 2: Print tracecode of parse level 3 (C code without preprocessor directives and comments) =item Level 3: Print full tracecode. =back =back =head1 DIAGNOSTICS If you don't get the output you expect try to set the -sc options to see what happens. When the parser can't handle the input it will silently get skipped (for empty lines this is a normal behaviour). If the parser doesn't behave as you expect take a look at the tracecode, e.g. ./csourceparser -t file.c 2> trace for full tracecode or ./csourceparser -vv file.c 2> trace to see only parser level 3 (C-parser) trace code. Depending on the size of the input file(s) this could take some time and may occupy some hd-space. Please refer to the Parse::RecDescent documentation if you get errors after modifying the grammar. =head1 DEPENDENCIES =over 4 =item * Perl >= v5.00800 =item * Parse::RecDescent =item * Getopt::Long =item * Pod::Usage =head1 BUGS AND LIMITATIONS Macros in C-Files could not be parsed at this time. Any declarations and definitions containing macros my cause errors. Perhaps this feature could be implemented in the future by using a "real" preprocessor like m4. Please report problems to Hendrik Sirges (hendrik.sirges at fh-swf.de) Patches are welcome. =head1 AUTHOR Hendrik Sirges =head1 LICENCE AND COPYRIGHT This program is Copyright 2005 by Hendrik Sirges. This program is free software; you can redistribute it and/or modify it under the terms of the Perl Artistic License or the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. If you do not have a copy of the GNU General Public License write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. =cut Parse-RecDescent-1.967015/demo/demo_whoson.pl0000755000175000017500000000516511710167512020566 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # WHO'S ON FIRST? use vars qw( @base @man @attempt ); # This is a comment use Parse::RecDescent; sub Parse::RecDescent::choose { $_[int rand @_]; } $abbott = new Parse::RecDescent <<'EOABBOTT'; Interpretation: ConfirmationRequest | NameRequest | BaseRequest ConfirmationRequest: Preface(s?) Name /[i']s on/ Base { (lc $::man{$item[4]} eq lc $item[2]) ? "Yes" : "No, $::man{$item[4]}\'s on $item[4]" } | Preface(s?) Name /[i']s the (name of the)?/ Man /('s name )?on/ Base { (lc $::man{$item[6]} eq lc $item[2]) ? "Certainly" : "No. \u$item[2] is on " . $::base{lc $item[2]} } BaseRequest: Preface(s?) Name /(is)?/ { "He's on " . $::base{lc $item[2]} } NameRequest: /(What's the name of )?the/i Base "baseman" { $::man{$item[2]} } Preface: ...!Name /\S*/ Name: /Who/i | /What/i | /I Don't Know/i Base: 'first' | 'second' | 'third' Man: 'man' | 'guy' | 'fellow' EOABBOTT $costello = new Parse::RecDescent <<'EOCOSTELLO'; Interpretation: Meaning {prev}> { $thisparser->{prev} = $item[1] } | { choose(@::attempt) } Meaning: Question | UnclearReferent | NonSequitur | { choose(@::attempt) } Question: Preface Interrogative /[i']s on/ Base { choose ("Yes, what is the name of the guy on $item[4]?", "The $item[4] baseman?", "I'm asking you! $item[2]?", "I don't know!") } | Interrogative { choose ("That's right, $item[1]?", "What?", "I don't know!") } UnclearReferent: "He's on" Base { choose ("Who's on $item[2]?", "Who is?", "So, what is the name of the guy on $item[2]?" ) } NonSequitur: ( "Yes" | 'Certainly' | /that's correct/i ) { choose("$item[1], who?", "What?", @::attempt) } Interrogative: /who/i | /what/i Base: 'first' | 'second' | 'third' Preface: ...!Interrogative /\S*/ EOCOSTELLO %man = ( first => "Who", second => "What", third => "I Don't Know" ); %base = map { lc } reverse %man; @attempt = ( "So, who's on first?", "I want to know who's on first?", "What's the name of the first baseman?", "Let's start again. What's the name of the guy on first?", "Okay, then, who's on second?", "Well then, who's on third?", "What's the name of the fellow on third?", ); $costello->{prev} = $line = "Who's on first?"; while (1) { print " ", $line, "\n"; $line = $abbott->Interpretation($line); sleep 1; print " ", $line, "\n"; $line = $costello->Interpretation($line); sleep 1; } Parse-RecDescent-1.967015/demo/demo_tokens.pl0000755000175000017500000000105711710167512020550 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; use Data::Dumper; my $lexer = new Parse::RecDescent q { lex: token(s) token: /(I|you|he)\b/i | /(is|are)\b/ | 'dumbest' | 'Bill-loving' | 'clearly' | /the\b|a\b/ | /\w+/ | /\S+/ }; my $tokens = $lexer->lex(join "", ); print Data::Dumper->Dump($tokens); __DATA__ You are clearly the dumbest, Bill-loving script-kiddie I have ever seen! Parse-RecDescent-1.967015/demo/demo_codeblock.pl0000755000175000017500000000043711710167512021173 0ustar jtbraunjtbraun#! /usr/bin/perl -w use Parse::RecDescent; $RD_TRACE=1; my $parse = Parse::RecDescent->new(do{local$/;}); while (<>) { use Data::Dumper 'Dumper'; print Dumper [ $parse->line($_) ]; } __DATA__ line: block | call* block: call: "foo" Parse-RecDescent-1.967015/demo/demo_textgen.pl0000755000175000017500000001201311710167512020715 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws $|++; use Parse::RecDescent; # $::RD_TRACE = 1; my $start = "START"; # start symbol (my $parser = Parse::RecDescent->new(<<'END_OF_GRAMMAR')) or die "bad!"; ## return hashref ## { ident => { ## is => [ ## [weight => item, item, item, ...], ## [weight => item, item, item, ...], ... ## ], ## defined => { line-number => times } ## used => { line-number => times } ## }, ... ## } ## item is " literal" or ident ## ident is C-symbol or number (internal for nested rules) { my %grammar; my $internal = 0; } grammar: rule(s) /\Z/ { \%grammar; } ## rule returns identifier (not used) rule: identifier ":" defn ';' { push @{$grammar{$item[1]}{is}}, @{$item[3]}; $grammar{$item[1]}{defined}{$itempos[1]{line}{to}}++; $item[1]; } | ## defn returns listref of choices defn: ## choice returns a listref of [weight => @items] choice: unweightedchoice { [ 1 => @{$item[1]} ] } | /\d+(\.\d+)?/ /\@/ unweightedchoice { [ $item[1] => @{$item[3]} ] } ## unweightedchoice returns a listref of @items unweightedchoice: item(s) item: quoted_string | identifier ...!/:/ { $grammar{$item[1]}{used}{$itempos[1]{line}{to}}++; $item[1]; # non-leading space flags an identifier } | "(" defn ")" { # parens for recursion, gensym an internal ++$internal; push @{$grammar{$internal}{is}}, @{$item[2]}; $internal; } | quoted_string: /"/ quoted_char(s?) /"/ { " " . join "", @{$item[3]} # leading space flags a string } ## this should be expanded, but it works for this grammar :) quoted_char: /[^\\"]+/ | /\\n/ { "\n" } | /\\"/ { "\"" } identifier: /[A-Za-z_]\w*/ END_OF_GRAMMAR my @data = ; for (@data) { s/^\s*#.*//; } (my $parsed = $parser->grammar(join '', @data)) or die "bad parse"; for my $id (sort keys %$parsed) { next if $id =~ /^\d+$/; # skip internals my $id_ref = $parsed->{$id}; unless (exists $id_ref->{defined}) { print "$id used in @{[sort keys %{$id_ref->{used}}]} but not defined - FATAL\n"; } unless (exists $id_ref->{used} or $id eq $start) { print "$id defined in @{[sort keys %{$id_ref->{defined}}]} but not used - WARNING\n"; } } use Data::Dumper; print Dumper($parsed); show($start); sub show { my $defn = shift; die "missing defn for $defn" unless exists $parsed->{$defn}; my @choices = @{$parsed->{$defn}{is}}; my $weight = 0; my @keeper = (); while (@choices) { my ($thisweight, @thisitem) = @{pop @choices}; $thisweight = 0 if $thisweight < 0; # no funny stuff $weight += $thisweight; @keeper = @thisitem if rand($weight) < $thisweight; } for (@keeper) { ## should be a list of ids or defns die "huh $_ in $defn" if ref $defn; if (/^ (.*)/s) { print $1; } elsif (/^(\w+)$/) { show($1); } else { die "Can't show $_ in $defn\n"; } } } __END__ START: stanza "\n---\n" stanza "\n---\n" stanza; stanza: stanza " " exclaim " " stanza2 | stanza2; stanza2: sentence " " comparison " " question | sentence " " comparison | comparison " " comparison " " exclaim | address " " question " " question " " sentence; sentence: sentence "\n" sentence2 | sentence2; sentence2: "The " adjectiveNotHep " " personNotHep " " verbRelating " the " adjectiveHep " " personHep "." | "The " personHep " " verbRelating " the " adjectiveNotHep ", " adjectiveNotHep " " personNotHep "."; question: question " " question2 | question2; question2: ques_start " " adjectiveHep " " personNotHep "?" | ques_start " " adjectiveNotHep " " personHep "?"; comparison: comparison " " comparison2 | comparison2; comparison2: "One says '" compNotHep "' while the other says '" compHep "'." | "One thinks '" compNotHep "' while the other thinks '" compHep "'." | "They shout '" compNotHep "!' And we shout '" compHep "'." | "It's " compNotHep " versus " compHep "!" ; personNotHep: "capitalist" | "silk purse man" | "square" | "banker" | "Merchant King" | "pinstripe suit" ; personHep: "cat" | "beat soul" | "wordsmith" | "hep cat" | "free man" | "street poet" | "skin beater" | "reed man" ; adjectiveNotHep: "soul-sucking" | "commercial" | "cash-counting" | "bloody-handed" | "four-cornered" | "uncool" | "love-snuffing"; adjectiveHep: "love-drunk" | "cool, cool" | "happening" | "tuned-in" | "street wise" | "wise and learned"; verbRelating: "begrudges" | "fears" | "distresses" | "dodges" | "dislikes" | "evades" | "curses" | "belittles" | "avoids" | "battles"; compNotHep: "recreation" | "isolation" | "tranportation" | "sacred nation" | "complication" | "subordination"; compHep: "fornication" | "instigation" | "interpretation" | "elevation" | "animation" | "inebriation" | "true relation"; ques_start: 2 @ (5 @ "Could there ever" | 7 @ "How could there") " be a" | "Can you picture a" ; address: "Catch this:" | "Listen, cats," | "Dig it:" | "I lay this on you:"; exclaim: "Heavy, man."| "Heavy." | "Yow!" | "Snap 'em for me." | "Dig it."; Parse-RecDescent-1.967015/demo/demo_Cgrammar_v2.pl0000755000175000017500000001612011720262706021404 0ustar jtbraunjtbraun# This improved version of the C grammar was provided by Joe Buehler use Parse::RecDescent; local $/; my $grammar = ; my $parser = Parse::RecDescent->new($grammar); my $text = <>; my $parse_tree = $parser->translation_unit($text) or die "bad C code"; use Data::Dumper 'Dumper'; warn Dumper [ $parse_tree ]; __DATA__ primary_expression: IDENTIFIER | CONSTANT | STRING_LITERAL | '(' expression ')' postfix_expression_post: '[' expression ']' | '(' argument_expression_list(?) ')' | '.' IDENTIFIER | '->' IDENTIFIER | '++' | '--' postfix_expression: primary_expression postfix_expression_post(s?) argument_expression_list: assignment_expression (',' assignment_expression)(s?) unary_expression: postfix_expression | '++' unary_expression | '--' unary_expression | unary_operator cast_expression | SIZEOF unary_expression | SIZEOF '(' type_name ')' unary_operator: /[-&*+!~]/ cast_expression: '(' type_name ')' cast_expression | unary_expression multiplicative_expression_op: /[*\/%]/ multiplicative_expression: cast_expression (multiplicative_expression_op cast_expression)(s?) additive_expression_op: /[-+]/ additive_expression: multiplicative_expression (additive_expression_op multiplicative_expression)(s?) shift_expression_pre_op: /(<<|>>)(?!=)/ shift_expression: additive_expression (shift_expression_pre_op additive_expression)(s?) relational_expression_op: /(<=|>=|<|>)/ relational_expression: shift_expression (relational_expression_op shift_expression)(s?) equality_expression_pre_op: /==|!=/ equality_expression: relational_expression (equality_expression_pre_op relational_expression)(s?) and_expression: equality_expression ('&' equality_expression)(s?) exclusive_or_expression: and_expression ('^' and_expression)(s?) inclusive_or_expression: exclusive_or_expression ('|' exclusive_or_expression)(s?) logical_and_expression: inclusive_or_expression ('&&' inclusive_or_expression)(s?) logical_or_expression: logical_and_expression ('||' logical_and_expression)(s?) conditional_expression: logical_or_expression ('?' expression ':' conditional_expression)(?) assignment_expression: conditional_expression | unary_expression assignment_operator assignment_expression assignment_operator: /=(?!=)|\+=|\&=|\/=|<<=|\%=|\*=|\|=|>>=|-=|\^=/ expression: assignment_expression (',' assignment_expression)(s?) constant_expression: conditional_expression declaration: declaration_specifiers init_declarator_list ';' | declaration_specifiers ';' declaration_specifiers: ( storage_class_specifier | type_specifier | type_qualifier )(s) init_declarator_list: init_declarator (',' init_declarator)(s?) init_declarator: declarator '=' initializer | declarator storage_class_specifier: /(typedef|extern|static|auto|register)(?![a-zA-Z0-9_])/ type_specifier: /(void|char|short|int|long|float|double|signed|unsigned)(?![a-zA-Z0-9_])/ | struct_or_union_specifier | enum_specifier | TYPE_NAME struct_or_union_specifier: struct_or_union IDENTIFIER '{' struct_declaration_list '}' | struct_or_union '{' struct_declaration_list '}' | struct_or_union IDENTIFIER struct_or_union: /(struct|union)(?![a-zA-Z0-9_])/ struct_declaration_list: struct_declaration(s) struct_declaration: specifier_qualifier_list struct_declarator_list ';' specifier_qualifier_list: ( type_specifier | type_qualifier )(s) struct_declarator_list: struct_declarator (',' struct_declarator)(s?) struct_declarator: declarator ':' constant_expression | declarator | ':' constant_expression enum_specifier: ENUM '{' enumerator_list '}' | ENUM IDENTIFIER '{' enumerator_list '}' | ENUM IDENTIFIER enumerator_list: enumerator (',' enumerator)(s?) enumerator: IDENTIFIER '=' constant_expression | IDENTIFIER type_qualifier: /(const|volatile)(?![a-zA-Z0-9_])/ declarator: pointer(?) direct_declarator direct_declarator_pre: IDENTIFIER | '(' declarator ')' direct_declarator_post: '[' constant_expression ']' | '[' ']' | '(' parameter_type_list ')' | '(' identifier_list ')' | '(' ')' direct_declarator: direct_declarator_pre direct_declarator_post(s?) pointer: '*' type_qualifier_list(?) pointer(?) type_qualifier_list: type_qualifier(s) parameter_type_list: parameter_list ',' '...' | parameter_list parameter_list: parameter_declaration (',' parameter_declaration)(s?) parameter_declaration: declaration_specifiers declarator | declaration_specifiers abstract_declarator(?) identifier_list: IDENTIFIER (',' IDENTIFIER)(s?) type_name: specifier_qualifier_list abstract_declarator(?) abstract_declarator: pointer direct_abstract_declarator(?) | direct_abstract_declarator direct_abstract_declarator_pre: '(' abstract_declarator ')' | direct_abstract_declarator_post direct_abstract_declarator_post: '[' ']' | '[' constant_expression ']' | '(' ')' | '(' parameter_type_list ')' direct_abstract_declarator: direct_abstract_declarator_pre direct_abstract_declarator_post(s?) initializer: assignment_expression | '{' initializer_list '}' | '{' initializer_list ',' '}' initializer_list: initializer (',' initializer)(s?) statement: labeled_statement | compound_statement | expression_statement | selection_statement | iteration_statement | jump_statement labeled_statement: IDENTIFIER ':' statement | /case(?![a-zA-Z0-9_])/ constant_expression ':' statement | /default(?![a-zA-Z0-9_])/ ':' statement compound_statement: '{' declaration_list statement_list(?) '}' | '{' statement_list(?) '}' declaration_list: declaration(s) statement_list: statement(s) expression_statement: expression(?) ';' selection_statement: IF '(' expression ')' statement ELSE statement | IF '(' expression ')' statement | SWITCH '(' expression ')' statement iteration_statement: WHILE '(' expression ')' statement | DO statement WHILE '(' expression ')' ';' | FOR '(' expression_statement expression_statement expression(?) ')' statement jump_statement: /goto(?![a-zA-Z0-9_])/ IDENTIFIER ';' | /continue(?![a-zA-Z0-9_])/ ';' | /break(?![a-zA-Z0-9_])/ ';' | /return(?![a-zA-Z0-9_])/ expression(?) ';' translation_unit: external_declaration(s) external_declaration: function_definition { print "*** function\n"; } | declaration { print "*** declaration\n"; } function_definition: declaration_specifiers declarator(?) declaration_list(?) compound_statement reserved_word: /(auto|break|case|char|const|continue|default|do|double|enum|extern|float|for|goto|if|int|long|register|return|short|signed|sizeof|static|struct|switch|typedef|union|unsigned|void|volatile|while)(?![a-zA-Z0-9_])/ CONSTANT: /[+-]?(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ DO: /do(?![a-zA-Z0-9_])/ ELSE: /else(?![a-zA-Z0-9_])/ ENUM: /enum(?![a-zA-Z0-9_])/ FOR: /for(?![a-zA-Z0-9_])/ IDENTIFIER: ...!reserved_word /[a-zA-Z_][a-zA-Z_0-9]*/ IF: /if(?![a-zA-Z0-9_])/ SIZEOF: /sizeof(?![a-zA-Z0-9_])/ STRING_LITERAL: { extract_delimited($text,'"') } SWITCH: /switch(?![a-zA-Z0-9_])/ TYPE_NAME: ...!reserved_word /[a-zA-Z_][a-zA-Z_0-9]*/ ...IDENTIFIER | ...!reserved_word /[a-zA-Z_][a-zA-Z_0-9]*/ .../[:*)]/ WHILE: /while(?![a-zA-Z0-9_])/ Parse-RecDescent-1.967015/demo/demo_methods.pl0000755000175000017500000000143011663610756020715 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # DEMONSTRATE USING DERIVED CLASS METHODS IN A PARSER package MyParser; use Parse::RecDescent; @ISA = qw { Parse::RecDescent }; # CALL THIS METHOD THROUGH AN OBJECT... sub print_cat ($$) { print "CAT: $_[1]\n"; } # CALL THESE METHODS DIRECTLY (MUST QUALIFY)... sub print_dog ($) { print "DOG: $_[0]\n"; } sub print_rat ($) { print "RAT: $_[0]\n"; } package main; $grammar = q{ inputs : input(s) | input : "cat" { $thisparser->print_cat($item[1]); } | "dog" { MyParser::print_dog($item[1]); } | "rat" # ERROR (UNQUALIFIED METHOD CALL) { print_rat($item[1]); } }; $parse = new MyParser ($grammar); print "> "; while (<>) { $parse->inputs($_) || print "huh?\n"; print "> "; } Parse-RecDescent-1.967015/demo/demo_Cgrammar.pl0000755000175000017500000002412311710167512020775 0ustar jtbraunjtbraunuse Parse::RecDescent; local $/; my $grammar = ; my $parser = Parse::RecDescent->new($grammar); my $text = <>; my $parse_tree = $parser->translation_unit($text) or die "bad C code"; __DATA__ primary_expression: IDENTIFIER | CONSTANT | STRING_LITERAL | '(' expression ')' postfix_expression: primary_expression | (primary_expression)(s) '[' expression ']' | (primary_expression)(s) '(' ')' | (primary_expression)(s) '(' argument_expression_list ')' | (primary_expression)(s) '.' IDENTIFIER | (primary_expression)(s) PTR_OP IDENTIFIER | (primary_expression)(s) INC_OP | (primary_expression)(s) DEC_OP argument_expression_list: (assignment_expression ',')(s?) assignment_expression unary_expression: postfix_expression | INC_OP unary_expression | DEC_OP unary_expression | unary_operator cast_expression | SIZEOF unary_expression | SIZEOF '(' type_name ')' unary_operator: '&' | '*' | '+' | '-' | '~' | '!' cast_expression: unary_expression | '(' type_name ')' cast_expression multiplicative_expression: (cast_expression mul_ex_op)(s?) cast_expression mul_ex_op : '*' | '/' | '%' additive_expression: (multiplicative_expression add_op)(s?) multiplicative_expression add_op : '+' | '-' shift_expression: (additive_expression shift_op )(s?) additive_expression shift_op : LEFT_OP | RIGHT_OP relational_expression: (shift_expression rel_op)(s?) shift_expression rel_op: '<' | '>' | LE_OP | GE_OP equality_expression: (relational_expression eq_ex_op)(s?) relational_expression eq_ex_op : EQ_OP | NE_OP and_expression: (equality_expression '&')(s?) equality_expression exclusive_or_expression: (and_expression '^')(s?) and_expression inclusive_or_expression: (exclusive_or_expression '|')(s?) exclusive_or_expression logical_and_expression: (inclusive_or_expression AND_OP)(s?) inclusive_or_expression logical_or_expression: (logical_and_expression OR_OP)(s?) logical_and_expression conditional_expression: logical_or_expression | logical_or_expression '?' expression ':' conditional_expression assignment_expression: conditional_expression | unary_expression assignment_operator assignment_expression assignment_operator: '=' | MUL_ASSIGN | DIV_ASSIGN | MOD_ASSIGN | ADD_ASSIGN | SUB_ASSIGN | LEFT_ASSIGN | RIGHT_ASSIGN | AND_ASSIGN | XOR_ASSIGN | OR_ASSIGN expression: (assignment_expression ',')(s?) assignment_expression constant_expression: conditional_expression declaration: declaration_specifiers ';' { print "We have a match!\n"; } | declaration_specifiers init_declarator_list ';' declaration_specifiers: storage_class_specifier | storage_class_specifier declaration_specifiers | type_specifier | type_specifier declaration_specifiers | type_qualifier | type_qualifier declaration_specifiers init_declarator_list: (init_declarator ',')(s?) init_declarator init_declarator: declarator | declarator '=' initializer storage_class_specifier: TYPEDEF | EXTERN | STATIC | AUTO | REGISTER type_specifier: VOID | CHAR | SHORT | INT | LONG | FLOAT | DOUBLE | SIGNED | UNSIGNED | struct_or_union_specifier | enum_specifier | TYPE_NAME struct_or_union_specifier: struct_or_union IDENTIFIER '{' struct_declaration_list '}' | struct_or_union '{' struct_declaration_list '}' | struct_or_union IDENTIFIER struct_or_union: STRUCT | UNION struct_declaration_list: struct_declaration(s) struct_declaration: specifier_qualifier_list struct_declarator_list ';' specifier_qualifier_list: type_specifier specifier_qualifier_list | type_specifier | type_qualifier specifier_qualifier_list | type_qualifier struct_declarator_list: (struct_declarator ',')(s?) struct_declarator struct_declarator: declarator | ':' constant_expression | declarator ':' constant_expression enum_specifier: ENUM '{' enumerator_list '}' | ENUM IDENTIFIER '{' enumerator_list '}' | ENUM IDENTIFIER enumerator_list: (enumerator ',')(s?) enumerator enumerator: IDENTIFIER | IDENTIFIER '=' constant_expression type_qualifier: CONST | VOLATILE declarator: pointer direct_declarator | direct_declarator direct_declarator: IDENTIFIER | '(' declarator ')' | (IDENTIFIER)(s?) ('(' declarator ')')(s?) '[' constant_expression ']' | (IDENTIFIER)(s?) ('(' declarator ')')(s?) '[' ']' | (IDENTIFIER)(s?) ('(' declarator ')')(s?) '(' parameter_type_list ')' | (IDENTIFIER)(s?) ('(' declarator ')')(s?) '(' identifier_list ')' | (IDENTIFIER)(s?) ('(' declarator ')')(s?) '(' ')' pointer: '*' | '*' type_qualifier_list | '*' pointer | '*' type_qualifier_list pointer type_qualifier_list: type_qualifier(s) parameter_type_list: parameter_list | parameter_list ',' ELLIPSIS parameter_list: (parameter_declaration ',')(s?) parameter_declaration parameter_declaration: declaration_specifiers declarator | declaration_specifiers abstract_declarator | declaration_specifiers identifier_list: (IDENTIFIER ',')(s?) IDENTIFIER type_name: specifier_qualifier_list | specifier_qualifier_list abstract_declarator abstract_declarator: pointer | direct_abstract_declarator | pointer direct_abstract_declarator direct_abstract_declarator: '(' abstract_declarator ')' | '[' ']' | '[' constant_expression ']' | DAD '[' ']' | DAD '[' constant_expression ']' | '(' ')' | '(' parameter_type_list ')' | DAD '(' ')' | DAD '(' parameter_type_list ')' DAD: #macro for direct_abstract_declarator ( '(' abstract_declarator ')' )(s?) ( '[' ']' )(s?) ( '[' constant_expression ']' )(s?) ( '(' ')' )(s?) ( '(' parameter_type_list ')' )(s?) initializer: assignment_expression | '{' initializer_list '}' | '{' initializer_list ',' '}' initializer_list: (initializer ',')(s?) initializer statement: labeled_statement | compound_statement | expression_statement | selection_statement | iteration_statement | jump_statement labeled_statement: IDENTIFIER ':' statement | CASE constant_expression ':' statement | DEFAULT ':' statement compound_statement: '{' '}' | '{' statement_list '}' | '{' declaration_list '}' | '{' declaration_list statement_list '}' declaration_list: declaration(s) statement_list: statement(s) expression_statement: ';' | expression ';' selection_statement: IF '(' expression ')' statement | IF '(' expression ')' statement ELSE statement | SWITCH '(' expression ')' statement iteration_statement: WHILE '(' expression ')' statement | DO statement WHILE '(' expression ')' ';' | FOR '(' expression_statement expression_statement ')' statement | FOR '(' expression_statement expression_statement expression ')' statement jump_statement: GOTO IDENTIFIER ';' | CONTINUE ';' | BREAK ';' | RETURN ';' | RETURN expression ';' translation_unit: external_declaration(s) external_declaration: function_definition | declaration function_definition: declaration_specifiers declarator declaration_list compound_statement | declaration_specifiers declarator compound_statement | declarator declaration_list compound_statement | declarator compound_statement # TERMINALS reserved_word: AUTO | BREAK | CASE | CHAR | CONST | CONTINUE | DEFAULT | DO | DOUBLE | ENUM | EXTERN | FLOAT | FOR | GOTO | IF | INT | LONG | REGISTER | RETURN | SHORT | SIGNED | SIZEOF | STATIC | STRUCT | SWITCH | TYPEDEF | UNION | UNSIGNED | VOID | VOLATILE | WHILE ADD_ASSIGN: '+=' AND_ASSIGN: '&=' AND_OP: '&&' AUTO: 'auto' BREAK: 'break' CASE: 'case' CHAR: 'char' CONST: 'const' CONSTANT: /[+-]?(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ CONTINUE: 'continue' DEC_OP: '--' DEFAULT: 'default' DIV_ASSIGN: '/=' DO: 'do' DOUBLE: 'double' ELLIPSIS: '...' ELSE: 'else' ENUM: 'enum' EQ_OP: '==' EXTERN: 'extern' FLOAT: 'float' FOR: 'for' GE_OP: '>=' GOTO: 'goto' IDENTIFIER: ...!reserved_word /[a-z]\w*/i IF: 'if' INC_OP: '++' INT: 'int' LEFT_ASSIGN: '<<=' LEFT_OP: '<<' LE_OP: '<=' LONG: 'long' MOD_ASSIGN: '%=' MUL_ASSIGN: '*=' NE_OP: '!=' OR_ASSIGN: '|=' OR_OP: '||' PTR_OP: '->' REGISTER: 'register' RETURN: 'return' RIGHT_ASSIGN: '>>=' RIGHT_OP: '>>' SHORT: 'short' SIGNED: 'signed' SIZEOF: 'sizeof' STATIC: 'static' STRING_LITERAL: { extract_delimited($text,'"') } STRUCT: 'struct' SUB_ASSIGN: '-=' SWITCH: 'switch' TYPEDEF: 'typedef' TYPE_NAME: # NONE YET UNION: 'union' UNSIGNED: 'unsigned' VOID: 'void' VOLATILE: 'volatile' WHILE: 'while' XOR_ASSIGN: '^=' Parse-RecDescent-1.967015/demo/demo_simpleXML.pl0000755000175000017500000000357711710167512021130 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w package XML2DS; use Parse::RecDescent; @ISA = qw( Parse::RecDescent ); sub allow_nested { my ($parser, $tag, @nestedtags) = @_; my $nestedtags = join '|', map { '^'.$_.'$' } @nestedtags; $parser->{allow}{$tag} = qr/$nestedtags/; } sub new { bless Parse::RecDescent->new(<<'EOGRAMMAR'), XML2DS; xml: unitag(?) | tag content[$item[1]](s) endtag[$item[1]] { bless $item[2], $item[1]} unitag: m{<([a-zA-Z]+)/>} { bless [], $1 } tag: m{<([a-zA-Z]+)>} { $return = $1 } endtag: m{} | m{(\S+)} but found $1 instead> content: content: rawtext check[$arg[0], $item[1]] | xml check[$arg[0], $item[1]] | block> rawtext: m{[^<]+} { bless \$item[1], 'rawtext' } check: { my ($outertag, $innertag) = ($arg[0], ref $arg[1]); $return = $arg[1] if !$thisparser->{allow}{$outertag} || $innertag =~ $thisparser->{allow}{$outertag}; $error = ($innertag eq 'rawtext') ? "Raw text not valid in <$outertag> block" : "<$innertag> tag not valid in <$outertag> block"; undef; } EOGRAMMAR } package main; use Data::Dumper; my $parser = XML2DS->new(); $parser->allow_nested( Test => qw(Example) ); $parser->allow_nested( Example => qw(Data rawtext) ); $parser->allow_nested( Data => qw(SubData rawtext) ); $parser->allow_nested( SubData => qw(Example rawtext) ); my $xml = join '', ; if (my $tree = $parser->xml($xml)) { print Data::Dumper->Dump([$tree]); } __DATA__ raw data raw subdata more raw data still more raw data nested example data last rawtext Parse-RecDescent-1.967015/demo/demo_street.pl0000755000175000017500000000077511710167512020561 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; my $street_type = join '|', qw { Street St\.? Road Rd Avenue Ave\.? Lane Way Highway Hwy }; sub Parse::RecDescent::street_name { print join('|', @_), "\n"; $_[1] =~ s/\A\s*(([A-Z]+\s+)+($street_type))//io; return $1; } my $parser = Parse::RecDescent->new(<<'EOGRAMMAR'); addr: /\d+[A-Z]?/i street_name { print "Number $item[1] in $item{street_name}\n" } EOGRAMMAR while (<>) { $parser->addr($_); } Parse-RecDescent-1.967015/demo/demo_bad.pl0000755000175000017500000000146011663610756020003 0ustar jtbraunjtbraunuse v5.10; use warnings; # SHOWCASE VARIOUS ERROR MESSAGES WITH A VERY UNWELL GRAMMAR use Parse::RecDescent; open (Parse::RecDescent::ERROR, ">-") or die; $grammar = q{ typedef : a ... ...! ... b typedef : a ...!...!...!...! b | /type/ ident /has/ field(s) /end type/ { $result = $item[2]; } | /type/ ident /is/ ident { $result = $item[2]; } | quasit(s) | quasit(-1..3) | quasit(..0) | field(?) field end | quasit "here" ...!/why/ Extend: extend Replace: replace $%^@# field : /field/ ident /is/ ident | field quasit | !quasit(s?) !| #NOTHING typedef : whatever package somewhereelse !ident : /[A-Za-z]\w*???/ { $result = $item[1]; } quasit : field | typedef }; new Parse::RecDescent ($grammar) || die "Bad grammar! No biscuit!\n"; Parse-RecDescent-1.967015/demo/demo_opreps.pl0000755000175000017500000000201511710167512020550 0ustar jtbraunjtbraun#! /usr/local/bin/perl -w use Parse::RecDescent; #$RD_TRACE=1; #$RD_HINT=1; my $parser = Parse::RecDescent->new(<<'EOG') or die; list1N: (s) list12: (..2) list2N: (2..) list01: (?) list: list0N: (s?) list23: (2..3) list03: (0..3) term: 't' EOG while () { print; print "\tlist:\t", @{$parser->list($_)||['undef']}, "\n"; print "\tlist01:\t", @{$parser->list01($_)||['undef']}, "\n"; print "\tlist0N:\t", @{$parser->list0N($_)||['undef']}, "\n"; print "\tlist1N:\t", @{$parser->list1N($_)||['undef']}, "\n"; print "\tlist2N:\t", @{$parser->list2N($_)||['undef']}, "\n"; print "\tlist23:\t", @{$parser->list23($_)||['undef']}, "\n"; print "\tlist12:\t", @{$parser->list12($_)||['undef']}, "\n"; print "\tlist03:\t", @{$parser->list03($_)||['undef']}, "\n"; print "-----\n"; } __DATA__ t t,t t,t,t t,t,t,t Parse-RecDescent-1.967015/demo/demo_arithmetic.pl0000755000175000017500000000137511710167512021401 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # RPN ARITHMETIC EXPRESSIONS use Parse::RecDescent; sub Parse::RecDescent::rpn { my @list = @_; for (my $i=1; $i<@list; $i+=2) { @list[$i,$i+1] = @list[$i+1,$i]; } join " ", @list; } # $RD_TRACE=1; $grammar = q{ expr : { rpn(@{$item[1]}) } conj : { rpn(@{$item[1]}) } addn : { rpn(@{$item[1]}) } mult : { rpn(@{$item[1]}) } expo : { rpn(@{$item[1]}) } unary : '(' expr ')' { $item[2] } | value value : /\d+(\.\d+)?/ }; $parse = new Parse::RecDescent ($grammar) or die "bad grammar"; while (<>) { print $parse->expr($_), "\n" if $_; } Parse-RecDescent-1.967015/demo/demo_restructure_painful.pl0000755000175000017500000000374311710167512023356 0ustar jtbraunjtbraun#!/usr/local/bin/perl -w # CONVERT FROM ONE EXTERNAL STRUCTURE TO A # SLIGHTLY DIFFERENT INTERNAL STRUCTURE # # RETURNING NESTED LISTS AND THEN MUNGING THEM IS PAINFUL # SEE demo_restructure_easy.pl FOR A BETTER TECHNIQUE use strict; use Parse::RecDescent; use Data::Dumper; my $grammar = q( { sub consolidate { my $aref = shift; my @a = @{$aref}; my %new = (); foreach my $a (@a) { my %h = %{$a}; foreach my $k (keys %h) { if (ref($h{$k})) { foreach my $k2 (keys %{$h{$k}}) { $new{$k}->{$k2} = $h{$k}->{$k2}; } } else { $new{$k} = $h{$k}; } } } return \%new; } } file: section(s) { $return = consolidate($item[1]) } section: header '{' body '}' { $return = { $item[1] => $item[3] } } header: 'Domain=' /.+/ { $return = $item[2] } body: line(s) { $return = consolidate($item[1]) } line: lineA | lineB { $return = $item[1] } lineA: /[^\W_]+/ '=' /.+/ { $return = { $item[1] => $item[3] } } lineB: /[^\W_]+/ '_' /[^\W_]+/ '=' /.+/ { $return = { $item[1] => { $item[3] => $item[5] } } } ); my $parser = Parse::RecDescent->new($grammar); my $text; my @text = ; foreach (@text) { next if /^\#/; # Strip comments $text .= $_; } my $f = $parser->file($text); print Dumper ($f); __DATA__ # # Domain=domain1 { P1_Name=n1 P1_Address=host1:port1 P2_Name=n2 P2_Address=host2:port2 } Domain=domain2 { f1=v1 f2=v2a v2b #comment } Parse-RecDescent-1.967015/demo/demo_itemhash.pl0000755000175000017500000000322211710167512021043 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; sub evalop { my (@list) = @{[@{$_[0]}]}; my $val = shift(@list)->(); while (@list) { my ($op, $arg2) = splice @list, 0, 2; $op->($val,$arg2->()); } return $val; } my $parse = Parse::RecDescent->new(<<'EndGrammar'); main: expr /\s*\Z/ { $item{expr}->() } | expr: /for(each)?/ lvar range expr { my ($vname,$expr) = @item{"lvar","expr"}; my ($from, $to) = @{$item{range}}; sub { my $val; no strict "refs"; for $$vname ($from->()..$to->()) { $val = $expr->() } return $val; } } | lvar '=' addition { my ($vname, $expr) = @item{"lvar","addition"}; sub { no strict 'refs'; $$vname = $expr->() } } | addition range: "(" expr ".." expr ")" { [ @item[2,4] ] } addition: { my $add = $item[1]; sub { ::evalop $add } } add_op: '+' { sub { $_[0] += $_[1] } } | '-' { sub { $_[0] -= $_[1] } } multiplication: { my $mult = $item[1]; sub { ::evalop $mult } } mult_op: '*' { sub { $_[0] *= $_[1] } } | '/' { sub { $_[0] /= $_[1] } } factor: number | rvar | '(' expr ')' { $item{expr} } number: /[-+]?\d+(\.\d+)?/ { sub { $item[1] } } lvar: /\$([a-z]\w*)/ { $1 } rvar: lvar { sub { no strict 'refs'; ${$item{lvar}} } } EndGrammar print "> "; while (<>) { # FOR DEMO CHANGE TO: while () print eval {$parse->main($_)}||"", "\n\n> "; } __DATA__ $x = 2 $y = 3 +1-1+1-1+1-1+1-1+1 7*7-6*8 121/(121/11)/121*11 1/(10-1/(1/(10-1))) $x * $y foreach $i (1..$y) $x = $x * 2 + $i $x Parse-RecDescent-1.967015/demo/demo_autorule.pl0000755000175000017500000000047411710167512021107 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws # use strict; use Parse::RecDescent; print 1 if Parse::RecDescent->new(<<'EOGRAMMAR')->file(join "", ); file: defn(s) defn: 'def' ident ':' block block: '{' item(s) '}' EOGRAMMAR __DATA__ def ident : { item item item } Parse-RecDescent-1.967015/demo/demo_calc.pl0000755000175000017500000000351411710167512020147 0ustar jtbraunjtbraunuse v5.10; use warnings; use Parse::RecDescent; sub evalop { my (@list) = @{$_[0]}; my $val = shift(@list)->(); while (@list) { my ($op, $arg2) = splice @list, 0, 2; $op->($val,$arg2->()); } return $val; } my $parse = Parse::RecDescent->new(<<'EndGrammar'); main: expr /\s*\Z/ { $item[1]->() } | expr: /for(each)?/ lvar range expr { my ($vname,$expr) = @item[2,4]; my ($from, $to) = @{$item[3]}; sub { my $val; no strict "refs"; for $$vname ($from->()..$to->()) { $val = $expr->() } return $val; } } | lvar '=' addition { my ($vname, $expr) = @item[1,3]; sub { no strict 'refs'; $$vname = $expr->() } } | addition range: "(" expr ".." expr ")" { [ @item[2,4] ] } addition: { my $add = $item[1]; sub { ::evalop $add } } add_op: '+' { sub { $_[0] += $_[1] } } | '-' { sub { $_[0] -= $_[1] } } multiplication: { my $mult = $item[1]; sub { ::evalop $mult } } mult_op: '*' { sub { $_[0] *= $_[1] } } | '/' { sub { $_[0] /= $_[1] } } factor: number | rvar | '(' expr ')' { $item[2] } number: /[-+]?\d+(\.\d+)?/ { sub { $item[1] } } lvar: /\$([a-z]\w*)/ { $1 } rvar: lvar { sub { no strict 'refs'; ${$item[1]} } } EndGrammar print "> "; while (<>) { # FOR DEMO CHANGE TO: while () print $parse->main($_), "\n\n> "; } __DATA__ $x = 2 $y = 3 +1-1+1-1+1-1+1-1+1 7*7-6*8 121/(121/11)/121*11 1/(10-1/(1/(10-1))) $x * $y foreach $i (1..$y) $x = $x * 2 + $i $x Parse-RecDescent-1.967015/demo/demo_scoredsep.pl0000755000175000017500000000114511663610757021245 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws use Parse::RecDescent; $RD_WARN = undef; my $parse = Parse::RecDescent->new(<<'EOGRAMMAR'); line: seplist[sep=>','] | seplist[sep=>':'] | seplist[sep=>" "] seplist: EOGRAMMAR while () { chomp; my $res = $parse->line($_); print '[', join('][', @$res), "]\n"; } __DATA__ c,o,m,m,a,s,e,p,a,r,a,t,e,d c:o:l:o:n:s:e:p:a:r:a:t:e:d s p a c e s e p a r a t e d m u:l t i,s:ep ar:a,ted m u:l,t i,s:ep ar:a,ted m:u:l,t i,s:ep ar:a,ted Parse-RecDescent-1.967015/demo/demo_operator.pl0000755000175000017500000000151211710167512021074 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # PARSE LOGICAL EXPRESSIONS use Parse::RecDescent; $grammar = q{ expr : '[' (?) ']' { $item[2] } | '[[' atomlist ']]' { $item[2] } | disj no_garbage { $item[1] } | atomlist: no_garbage: /^\s*$/ | disj : conj : and : 'and' unary : 'not' atom | '(' expr ')' { $item[2] } | atom atom : /(?!and|or)[a-z]/ num : /\d+/ }; $parse = new Parse::RecDescent ($grammar) or die "bad grammar"; $input = ''; use Data::Dumper; print "> "; while (<>) { if (/^\.$/) { print STDERR Data::Dumper->Dump([$parse->expr($input)]); $input = '' } else { $input .= $_ } print "> "; } Parse-RecDescent-1.967015/demo/demo_leftassoc.pl0000755000175000017500000000210411710167512021222 0ustar jtbraunjtbraun#! /usr/local/bin/perl -ws # THE COMMONEST REASON FOR WANTING LEFT RECURSION use strict; use Parse::RecDescent; $::RD_HINT = 1; my $parse = Parse::RecDescent->new(<<'EndGrammar'); main: expr /\Z/ { $item[1] } | expr: left_assoc[qw{term add_op term}] | term add_op: '+' { sub { $_[0] + $_[1] } } | '-' { sub { $_[0] - $_[1] } } term: left_assoc[qw{factor mult_op factor}] | factor mult_op: '*' { sub { $_[0] * $_[1] } } | '/' { sub { $_[0] / $_[1] } } factor: number | '(' expr ')' { $item[2] } number: /[-+]?\d+(\.\d+)?/ # THE BLACK MAGIC THAT MAKES IT WORK... left_assoc: left_assoc_left[@arg[0,1]](s) { my @terms = $item[1] ? ((map { @$_ } @{$item[1]}), $item[2]) : $item[1]; splice @terms, 0, 3, $terms[1]->(@terms[0,2]) while @terms>1; $terms[0]; } left_assoc_left: { [ @item[1..2] ] } EndGrammar while () { print $parse->main($_), "\n"; } __DATA__ +1-1+1-1+1-1+1-1+1 7*7-6*8 121/(121/11)/121*11 1/(10-1/(1/(10-1))) Parse-RecDescent-1.967015/demo/demo_template.pl0000755000175000017500000000126711663610757021076 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw use Parse::RecDescent; $grammar = q { list: /$arg{sep}/ > function: 'func' ident '(' list[rule=>'param',sep=>';'] ')' {{ name=>$item{ident}, param=>$item{list} }} param: list[rule=>'ident',sep=>','] ':' typename {{ vars=>$item{list}, type=>$item{typename} }} ident: /\w+/ typename: /\w+/ }; unless( $parser = new Parse::RecDescent( $grammar )) { die "bad grammar; bailing\n"; } while (defined($input = )) { use Data::Dumper; print Data::Dumper->Dump([$parser->function( $input )]); } __DATA__ func f (a,b,c:int; d:float; e,f:string) func g (x:int) func h (y;x) Parse-RecDescent-1.967015/demo/demo_randomsentence.pl0000755000175000017500000000562311663610757022270 0ustar jtbraunjtbraun#!/usr/bin/perl $|++; use Parse::RecDescent; my $parser = Parse::RecDescent->new(<<'END_OF_GRAMMAR'); grammar: rule(s) /\Z/ { [ map { $_->[0], $_->[1] } @{$item[1]} ] } rule: identifier ":" defn ";" { [ $item[1], $item[3] ] } | defn: choice (barchoice)(s?) { [ $item[1], @{$item[2]} ] } barchoice: "|" choice choice: item(s) item: quoted_string | identifier | quoted_string: /"(.*?)"/s { " ".substr($item[1],1,-1) } identifier: /[A-Za-z_]\w*/ END_OF_GRAMMAR (my $parsed = $parser->grammar(join '', )) or die "bad parse"; my $top = $parsed->[0]; my %defns = @$parsed; for (1..5) { show($top); print "\n----\n"; } sub show { my $defn = shift; die "missing defn for $defn" unless exists $defns{$defn}; my @choices = @{$defns{$defn}}; for (@{$choices[rand @choices]}) { ## should be a list of ids or defns die "huh $_ in $defn" if ref $defn; if (/^ (.*)/s) { print $1; } elsif (/^(\w+)$/) { show($1); } else { die "Can't show $_ in $defn\n"; } } } __END__ stanza: stanza exclaim stanza2 | stanza2; stanza2: sentence comparison question | sentence comparison | comparison comparison exclaim | address question question sentence; sentence: sentence sentence2 | sentence2; sentence2: "The " adjectiveNotHep personNotHep verbRelating "the " adjectiveHep personHep ". " | "The " personHep verbRelating "the " adjectiveNotHep ", " adjectiveNotHep personNotHep ". "; question: question question2 | question2; question2: ques_start adjectiveHep personNotHep "? " | ques_start adjectiveNotHep personHep "? "; comparison: comparison comparison2 | comparison2; comparison2: "One says '" compNotHep "' while the other says '" compHep "'. "| "One thinks '" compNotHep "' while the other thinks '" compHep "'. "| "They shout '" compNotHep "!' And we shout'" compHep "'. "| "It's " compNotHep " versus " compHep "! "; personNotHep: "capitalist" | "silk purse man" | "square" | "banker" | "Merchant King" | "pinstripe suit" ; personHep: "cat" | "beat soul" | "wordsmith" | "hep cat" | "free man" | "street poet" | "skin beater" | "reed man" ; adjectiveNotHep: "soul-sucking" | "commercial" | "cash-counting" | "bloody-handed" | "four-cornered" | "uncool" | "love-snuffing"; adjectiveHep: "love-drunk" | "cool, cool" | "happening" | "tuned-in" | "street wise" | "wise and learned"; verbRelating: "begrudges" | "fears" | "distresses" | "dodges" | "dislikes" | "evades" | "curses" | "belittles" | "avoids" | "battles"; compNotHep: "recreation" | "isolation" | "tranportation" | "sacred nation" | "complication" | "subordination"; compHep: "fornication" | "instigation" | "interpretation" | "elevation" | "animation" | "inebriation" | "true relation"; ques_start: "Could there ever be a "|"How could there be a "| "Can you picture a "; address: "Catch this: " | "Listen, cats, " | "Dig it: " | "I lay this on you: "; exclaim: "Heavy, man. "|"Heavy. " | "Yow! " | "Snap 'em for me. " | "Dig it. "; Parse-RecDescent-1.967015/demo/demo_lexer.pl0000755000175000017500000000111011663610757020365 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # IT'S A PARSER *AND* A LEXER... use Parse::RecDescent; use Data::Dumper; $lexer = new Parse::RecDescent q { lex: token(s) token: 'I\b' | 'see\b' | 'on\b' | 'by\b' | /the\b|a\b/i | /\w+/ }; $data = join '', ; print_tokens($lexer->lex(\$data)); print "left: [$data]\n"; sub print_tokens { foreach $token ( @{$_[0]} ) { print Dumper($token), "\n"; } } __DATA__ I see a cat on the windowsill by the door!!!!! Parse-RecDescent-1.967015/demo/demo_matchrule.pl0000755000175000017500000000141311663610757021240 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw use vars qw($animal); use Parse::RecDescent; $grammar = q { object: thing[article=>'a'](s) | thing[article=>'the'](s) thing: a: 'a' the: 'the' cat: 'cat' { print "meow\n"; $::animal = 'god' } dog: 'dog' { print "woof\n" } }; unless( $parser = new Parse::RecDescent( $grammar )) { die "bad grammar; bailing\n"; } $/ = ""; while (defined ($input = )) { $::animal = reverse 'cat'; print STDERR "parsing...\n"; unless( defined $parser->object( $input )) { die "error in input; bailing\n"; } print STDERR "...parsed\n"; } __DATA__ a cat a cat a cat a dog the cat the dog the dog the dog the dog a dog Parse-RecDescent-1.967015/demo/demo_OOparsetree.pl0000755000175000017500000000234111710167512021472 0ustar jtbraunjtbraun#! /usr/local/bin/perl -sw # PARSE AND EVALUATE LOGICAL EXPRESSIONS WITH A OO PARSE TREE $::RD_AUTOACTION = q{ bless [$item[-1]], $item[0] }; use Parse::RecDescent; my $parse = Parse::RecDescent->new(<<'EOG'); expr : set | clear | disj set : 'set' atom clear : 'clear' atom disj : { bless $item[-1], $item[0] } conj : { bless $item[-1], $item[0] } unary : neg | bracket | atom bracket : '(' expr ')' { $item[2] } neg : 'not' unary atom : /[a-z]+/i EOG while (<>) { my $tree = $parse->expr($_); print $tree->eval(), "\n" if $tree; } BEGIN {@var{qw(a c e)} = (1,1,1);} sub returning { # local $^W; # print +(caller(1))[3], " returning ($_[0])\n"; $_[0]; } sub expr::eval { returning $_[0][0]->eval() } sub disj::eval { returning join '', map {$_->eval()} @{$_[0]} } sub conj::eval { returning ! join '', map {! $_->eval()} @{$_[0]} } sub unary::eval { returning $_[0][0]->eval() } sub neg::eval { returning ! $_[0][0]->eval() } sub set::eval { returning $::var{$_[0][0]->name()} = 1 } sub clear::eval { returning $::var{$_[0][0]->name()} = 0 } sub atom::eval { returning $::var{$_[0][0]} } sub atom::name { returning $_[0][0] } Parse-RecDescent-1.967015/Makefile.PL0000755000175000017500000000235313070730450016730 0ustar jtbraunjtbraunuse strict; use warnings; use ExtUtils::MakeMaker 6.62; WriteMakefile( NAME => 'Parse::RecDescent', AUTHOR => [ 'Damian Conway ', 'Jeremy T. Braun ', ], VERSION_FROM => 'lib/Parse/RecDescent.pm', ABSTRACT_FROM => 'lib/Parse/RecDescent.pm', LICENSE => 'perl_5', PL_FILES => {}, BUILD_REQUIRES => { 'ExtUtils::MakeMaker' => 6.5702, }, CONFIGURE_REQUIRES => { 'ExtUtils::MakeMaker' => 6.5702, }, PREREQ_PM => { 'Text::Balanced' => 1.95, 'Test::More' => 0, }, dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', }, clean => { FILES => 'Parse-RecDescent-* RD_TRACE' }, META_MERGE => { "meta-spec" => { version => 2 }, resources => { repository => { type => 'git', url => 'git://github.com/jtbraun/Parse-RecDescent', web => 'https://github.com/jtbraun/Parse-RecDescent', }, bugtracker => { web => 'https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=Parse-RecDescent', }, }, }, ); Parse-RecDescent-1.967015/MANIFEST0000644000175000017500000000372513070731552016114 0ustar jtbraunjtbraunBuild.PL Changes MANIFEST Makefile.PL README ToDo demo/demo.c demo/demo.pl demo/demo_Cgrammar.pl demo/demo_Cgrammar_v2.pl demo/demo_LaTeXish.pl demo/demo_LaTeXish_autoact.pl demo/demo_NL2SQL.pl demo/demo_OOautoparsetree.pl demo/demo_OOparsetree.pl demo/demo_PerlCSV.pl demo/demo_another_Cgrammar.pl demo/demo_arithmetic.pl demo/demo_autorule.pl demo/demo_autoscoresep.pl demo/demo_autostub.pl demo/demo_bad.pl demo/demo_buildcalc.pl demo/demo_calc.pl demo/demo_codeblock.pl demo/demo_cpp.pl demo/demo_decomment.pl demo/demo_decomment_nonlocal.pl demo/demo_delete.pl demo/demo_derived.pl demo/demo_dot.pl demo/demo_embedding.pl demo/demo_errors.pl demo/demo_eval.pl demo/demo_implicit.pl demo/demo_itemhash.pl demo/demo_language.pl demo/demo_leftassoc.pl demo/demo_leftop.pl demo/demo_lexer.pl demo/demo_lisplike.pl demo/demo_logic.pl demo/demo_matchrule.pl demo/demo_matchrule2.pl demo/demo_mccoy.pl demo/demo_metaRD.pm demo/demo_methods.pl demo/demo_operator.pl demo/demo_opreps.pl demo/demo_parsetree.pl demo/demo_perlparsing.pl demo/demo_piecewise.pl demo/demo_precalc.pl demo/demo_quicklist.pl demo/demo_randomsentence.pl demo/demo_recipe.pl demo/demo_restructure_easy.pl demo/demo_restructure_painful.pl demo/demo_scoredsep.pl demo/demo_selfmod.pl demo/demo_separators.pl demo/demo_simple.pl demo/demo_simpleXML.pl demo/demo_simplequery.pl demo/demo_skipcomment.pl demo/demo_street.pl demo/demo_template.pl demo/demo_textgen.pl demo/demo_tokens.pl demo/demo_undumper.pl demo/demo_whoson.pl lib/Parse/RecDescent.pm t/00.load.t t/01.basics.t t/autotree.t t/leftop_cap.t t/pod.t t/precompile.t t/re_capture_return.t t/reentry.t t/reproducible.t t/separated_repetition.t t/skip.t t/skip_dynamic.t t/text.t t/util.pl tutorial/TPJ-PRD-proc.gif tutorial/TPJ-yacc-proc.gif tutorial/TPJ_maze.gif tutorial/TPJ_tree.gif tutorial/tutorial.html META.yml Module YAML meta-data (added by MakeMaker) META.json Module JSON meta-data (added by MakeMaker) Parse-RecDescent-1.967015/lib/0000755000175000017500000000000013070731552015522 5ustar jtbraunjtbraunParse-RecDescent-1.967015/lib/Parse/0000755000175000017500000000000013070731552016574 5ustar jtbraunjtbraunParse-RecDescent-1.967015/lib/Parse/RecDescent.pm0000644000175000017500000066153413070730576021175 0ustar jtbraunjtbraun# GENERATE RECURSIVE DESCENT PARSER OBJECTS FROM A GRAMMAR use 5.006; use strict; package Parse::RecDescent; use Text::Balanced qw ( extract_codeblock extract_bracketed extract_quotelike extract_delimited ); use vars qw ( $skip ); *defskip = \ '\s*'; # DEFAULT SEPARATOR IS OPTIONAL WHITESPACE $skip = '\s*'; # UNIVERSAL SEPARATOR IS OPTIONAL WHITESPACE my $MAXREP = 100_000_000; # REPETITIONS MATCH AT MOST 100,000,000 TIMES #ifndef RUNTIME sub import # IMPLEMENT PRECOMPILER BEHAVIOUR UNDER: # perl -MParse::RecDescent - [runtimeclassname] { local *_die = sub { print @_, "\n"; exit }; my ($package, $file, $line) = caller; if ($file eq '-' && $line == 0) { _die("Usage: perl -MLocalTest - ") unless @ARGV >= 2 and $ARGV <= 3; my ($sourcefile, $class, $runtime_class) = @ARGV; local *IN; open IN, $sourcefile or _die(qq{Can't open grammar file "$sourcefile"}); local $/; # my $grammar = ; close IN; Parse::RecDescent->Precompile({ -runtime_class => $runtime_class }, $grammar, $class, $sourcefile); exit; } } sub Save { my $self = shift; my %opt; if ('HASH' eq ref $_[0]) { %opt = (%opt, %{$_[0]}); shift; } my ($class) = @_; $self->{saving} = 1; $self->Precompile(undef,$class); $self->{saving} = 0; } sub PrecompiledRuntime { my ($self, $class) = @_; my $opt = { -standalone => 1, -runtime_class => $class, }; $self->Precompile($opt, '', $class); } sub Precompile { my $self = shift; my %opt = ( -standalone => 0, ); if ('HASH' eq ref $_[0]) { %opt = (%opt, %{$_[0]}); shift; } my ($grammar, $class, $sourcefile) = @_; $class =~ /^(\w+::)*\w+$/ or croak("Bad class name: $class"); my $modulefile = $class; $modulefile =~ s/.*:://; $modulefile .= ".pm"; my $code = ''; local *OUT; open OUT, ">", $modulefile or croak("Can't write to new module file '$modulefile'"); print OUT "#\n", "# This parser was generated with\n", "# Parse::RecDescent version $Parse::RecDescent::VERSION\n", "#\n\n"; print STDERR "precompiling grammar from file '$sourcefile'\n", "to class $class in module file '$modulefile'\n" if $grammar && $sourcefile; if ($grammar) { $self = Parse::RecDescent->new($grammar, # $grammar 1, # $compiling $class # $namespace ) || croak("Can't compile bad grammar") if $grammar; # Do not allow &DESTROY to remove the precompiled namespace delete $self->{_not_precompiled}; foreach ( keys %{$self->{rules}} ) { $self->{rules}{$_}{changed} = 1; } $code = $self->_code(); } # If a name for the runtime package was not provided, # generate one based on the module output name and the generated # code if (not defined($opt{-runtime_class})) { if ($opt{-standalone}) { my $basename = $class . '::_Runtime'; my $name = $basename; for (my $i = 0; $code =~ /$basename/; ++$i) { $name = sprintf("%s%06d", $basename, $i); } $opt{-runtime_class} = $name; } else { my $package = ref $self; local $::RD_HINT = defined $::RD_HINT ? $::RD_HINT : 1; _hint(<) { if ($_ =~ /^\s*#\s*ifndef\s+RUNTIME\s*$/) { ++$exclude; } if ($exclude) { if ($_ =~ /^\s*#\s*endif\s$/) { --$exclude; } } else { if ($_ =~ m/^__END__/) { last; } # Standalone parsers shouldn't trigger the CPAN # indexer to index the runtime, as it shouldn't be # exposed as a user-consumable package. # # Trick the indexer by including a newline in the package declarations s/^package /package # this should not be indexed by CPAN\n/gs; s/Parse::RecDescent/$opt{-runtime_class}/gs; print OUT $_; } } close IN; print OUT "}\n"; } if ($grammar) { print OUT "package $class;\n"; } if (not $opt{-standalone}) { print OUT "use $opt{-runtime_class};\n"; } if ($grammar) { print OUT "{ my \$ERRORS;\n\n"; print OUT $code; print OUT "}\npackage $class; sub new { "; print OUT "my "; $code = $self->_dump([$self], [qw(self)]); $code =~ s/Parse::RecDescent/$opt{-runtime_class}/gs; print OUT $code; print OUT "}"; } close OUT or croak("Can't write to new module file '$modulefile'"); } #endif package Parse::RecDescent::LineCounter; sub TIESCALAR # ($classname, \$text, $thisparser, $prevflag) { bless { text => $_[1], parser => $_[2], prev => $_[3]?1:0, }, $_[0]; } sub FETCH { my $parser = $_[0]->{parser}; my $cache = $parser->{linecounter_cache}; my $from = $parser->{fulltextlen}-length(${$_[0]->{text}})-$_[0]->{prev} ; unless (exists $cache->{$from}) { $parser->{lastlinenum} = $parser->{offsetlinenum} - Parse::RecDescent::_linecount(substr($parser->{fulltext},$from)) + 1; $cache->{$from} = $parser->{lastlinenum}; } return $cache->{$from}; } sub STORE { my $parser = $_[0]->{parser}; $parser->{offsetlinenum} -= $parser->{lastlinenum} - $_[1]; return undef; } sub resync # ($linecounter) { my $self = tied($_[0]); die "Tried to alter something other than a LineCounter\n" unless $self =~ /Parse::RecDescent::LineCounter/; my $parser = $self->{parser}; my $apparently = $parser->{offsetlinenum} - Parse::RecDescent::_linecount(${$self->{text}}) + 1; $parser->{offsetlinenum} += $parser->{lastlinenum} - $apparently; return 1; } package Parse::RecDescent::ColCounter; sub TIESCALAR # ($classname, \$text, $thisparser, $prevflag) { bless { text => $_[1], parser => $_[2], prev => $_[3]?1:0, }, $_[0]; } sub FETCH { my $parser = $_[0]->{parser}; my $missing = $parser->{fulltextlen}-length(${$_[0]->{text}})-$_[0]->{prev}+1; substr($parser->{fulltext},0,$missing) =~ m/^(.*)\Z/m; return length($1); } sub STORE { die "Can't set column number via \$thiscolumn\n"; } package Parse::RecDescent::OffsetCounter; sub TIESCALAR # ($classname, \$text, $thisparser, $prev) { bless { text => $_[1], parser => $_[2], prev => $_[3]?-1:0, }, $_[0]; } sub FETCH { my $parser = $_[0]->{parser}; return $parser->{fulltextlen}-length(${$_[0]->{text}})+$_[0]->{prev}; } sub STORE { die "Can't set current offset via \$thisoffset or \$prevoffset\n"; } package Parse::RecDescent::Rule; sub new ($$$$$) { my $class = ref($_[0]) || $_[0]; my $name = $_[1]; my $owner = $_[2]; my $line = $_[3]; my $replace = $_[4]; if (defined $owner->{"rules"}{$name}) { my $self = $owner->{"rules"}{$name}; if ($replace && !$self->{"changed"}) { $self->reset; } return $self; } else { return $owner->{"rules"}{$name} = bless { "name" => $name, "prods" => [], "calls" => [], "changed" => 0, "line" => $line, "impcount" => 0, "opcount" => 0, "vars" => "", }, $class; } } sub reset($) { @{$_[0]->{"prods"}} = (); @{$_[0]->{"calls"}} = (); $_[0]->{"changed"} = 0; $_[0]->{"impcount"} = 0; $_[0]->{"opcount"} = 0; $_[0]->{"vars"} = ""; } sub DESTROY {} sub hasleftmost($$) { my ($self, $ref) = @_; my $prod; foreach $prod ( @{$self->{"prods"}} ) { return 1 if $prod->hasleftmost($ref); } return 0; } sub leftmostsubrules($) { my $self = shift; my @subrules = (); my $prod; foreach $prod ( @{$self->{"prods"}} ) { push @subrules, $prod->leftmostsubrule(); } return @subrules; } sub expected($) { my $self = shift; my @expected = (); my $prod; foreach $prod ( @{$self->{"prods"}} ) { my $next = $prod->expected(); unless (! $next or _contains($next,@expected) ) { push @expected, $next; } } return join ', or ', @expected; } sub _contains($@) { my $target = shift; my $item; foreach $item ( @_ ) { return 1 if $target eq $item; } return 0; } sub addcall($$) { my ( $self, $subrule ) = @_; unless ( _contains($subrule, @{$self->{"calls"}}) ) { push @{$self->{"calls"}}, $subrule; } } sub addprod($$) { my ( $self, $prod ) = @_; push @{$self->{"prods"}}, $prod; $self->{"changed"} = 1; $self->{"impcount"} = 0; $self->{"opcount"} = 0; $prod->{"number"} = $#{$self->{"prods"}}; return $prod; } sub addvar { my ( $self, $var, $parser ) = @_; if ($var =~ /\A\s*local\s+([%@\$]\w+)/) { $parser->{localvars} .= " $1"; $self->{"vars"} .= "$var;\n" } else { $self->{"vars"} .= "my $var;\n" } $self->{"changed"} = 1; return 1; } sub addautoscore { my ( $self, $code ) = @_; $self->{"autoscore"} = $code; $self->{"changed"} = 1; return 1; } sub nextoperator($) { my $self = shift; my $prodcount = scalar @{$self->{"prods"}}; my $opcount = ++$self->{"opcount"}; return "_operator_${opcount}_of_production_${prodcount}_of_rule_$self->{name}"; } sub nextimplicit($) { my $self = shift; my $prodcount = scalar @{$self->{"prods"}}; my $impcount = ++$self->{"impcount"}; return "_alternation_${impcount}_of_production_${prodcount}_of_rule_$self->{name}"; } sub code { my ($self, $namespace, $parser, $check) = @_; eval 'undef &' . $namespace . '::' . $self->{"name"} unless $parser->{saving}; my $code = ' # ARGS ARE: ($parser, $text; $repeating, $_noactions, \@args, $_itempos) sub ' . $namespace . '::' . $self->{"name"} . ' { my $thisparser = $_[0]; use vars q{$tracelevel}; local $tracelevel = ($tracelevel||0)+1; $ERRORS = 0; my $thisrule = $thisparser->{"rules"}{"' . $self->{"name"} . '"}; Parse::RecDescent::_trace(q{Trying rule: [' . $self->{"name"} . ']}, Parse::RecDescent::_tracefirst($_[1]), q{' . $self->{"name"} . '}, $tracelevel) if defined $::RD_TRACE; ' . ($parser->{deferrable} ? 'my $def_at = @{$thisparser->{deferred}};' : '') . ' my $err_at = @{$thisparser->{errors}}; my $score; my $score_return; my $_tok; my $return = undef; my $_matched=0; my $commit=0; my @item = (); my %item = (); my $repeating = $_[2]; my $_noactions = $_[3]; my @arg = defined $_[4] ? @{ &{$_[4]} } : (); my $_itempos = $_[5]; my %arg = ($#arg & 01) ? @arg : (@arg, undef); my $text; my $lastsep; my $current_match; my $expectation = new Parse::RecDescent::Expectation(q{' . $self->expected() . '}); $expectation->at($_[1]); '. ($parser->{_check}{thisoffset}?' my $thisoffset; tie $thisoffset, q{Parse::RecDescent::OffsetCounter}, \$text, $thisparser; ':'') . ($parser->{_check}{prevoffset}?' my $prevoffset; tie $prevoffset, q{Parse::RecDescent::OffsetCounter}, \$text, $thisparser, 1; ':'') . ($parser->{_check}{thiscolumn}?' my $thiscolumn; tie $thiscolumn, q{Parse::RecDescent::ColCounter}, \$text, $thisparser; ':'') . ($parser->{_check}{prevcolumn}?' my $prevcolumn; tie $prevcolumn, q{Parse::RecDescent::ColCounter}, \$text, $thisparser, 1; ':'') . ($parser->{_check}{prevline}?' my $prevline; tie $prevline, q{Parse::RecDescent::LineCounter}, \$text, $thisparser, 1; ':'') . ' my $thisline; tie $thisline, q{Parse::RecDescent::LineCounter}, \$text, $thisparser; '. $self->{vars} .' '; my $prod; foreach $prod ( @{$self->{"prods"}} ) { $prod->addscore($self->{autoscore},0,0) if $self->{autoscore}; next unless $prod->checkleftmost(); $code .= $prod->code($namespace,$self,$parser); $code .= $parser->{deferrable} ? ' splice @{$thisparser->{deferred}}, $def_at unless $_matched; ' : ''; } $code .= ' unless ( $_matched || defined($score) ) { ' .($parser->{deferrable} ? ' splice @{$thisparser->{deferred}}, $def_at; ' : '') . ' $_[1] = $text; # NOT SURE THIS IS NEEDED Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' rule>>}, Parse::RecDescent::_tracefirst($_[1]), q{' . $self->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; return undef; } if (!defined($return) && defined($score)) { Parse::RecDescent::_trace(q{>>Accepted scored production<<}, "", q{' . $self->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; $return = $score_return; } splice @{$thisparser->{errors}}, $err_at; $return = $item[$#item] unless defined $return; if (defined $::RD_TRACE) { Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' rule<< (return value: [} . $return . q{])}, "", q{' . $self->{"name"} .'}, $tracelevel); Parse::RecDescent::_trace(q{(consumed: [} . Parse::RecDescent::_tracemax(substr($_[1],0,-length($text))) . q{])}, Parse::RecDescent::_tracefirst($text), , q{' . $self->{"name"} .'}, $tracelevel) } $_[1] = $text; return $return; } '; return $code; } my @left; sub isleftrec($$) { my ($self, $rules) = @_; my $root = $self->{"name"}; @left = $self->leftmostsubrules(); my $next; foreach $next ( @left ) { next unless defined $rules->{$next}; # SKIP NON-EXISTENT RULES return 1 if $next eq $root; my $child; foreach $child ( $rules->{$next}->leftmostsubrules() ) { push(@left, $child) if ! _contains($child, @left) ; } } return 0; } package Parse::RecDescent::Production; sub describe ($;$) { return join ' ', map { $_->describe($_[1]) or () } @{$_[0]->{items}}; } sub new ($$;$$) { my ($self, $line, $uncommit, $error) = @_; my $class = ref($self) || $self; bless { "items" => [], "uncommit" => $uncommit, "error" => $error, "line" => $line, strcount => 0, patcount => 0, dircount => 0, actcount => 0, }, $class; } sub expected ($) { my $itemcount = scalar @{$_[0]->{"items"}}; return ($itemcount) ? $_[0]->{"items"}[0]->describe(1) : ''; } sub hasleftmost ($$) { my ($self, $ref) = @_; return ${$self->{"items"}}[0] eq $ref if scalar @{$self->{"items"}}; return 0; } sub isempty($) { my $self = shift; return 0 == @{$self->{"items"}}; } sub leftmostsubrule($) { my $self = shift; if ( $#{$self->{"items"}} >= 0 ) { my $subrule = $self->{"items"}[0]->issubrule(); return $subrule if defined $subrule; } return (); } sub checkleftmost($) { my @items = @{$_[0]->{"items"}}; if (@items==1 && ref($items[0]) =~ /\AParse::RecDescent::Error/ && $items[0]->{commitonly} ) { Parse::RecDescent::_warn(2,"Lone in production treated as "); Parse::RecDescent::_hint("A production consisting of a single conditional directive would normally succeed (with the value zero) if the rule is not 'commited' when it is tried. Since you almost certainly wanted ' ' Parse::RecDescent supplied it for you."); push @{$_[0]->{items}}, Parse::RecDescent::UncondReject->new(0,0,''); } elsif (@items==1 && ($items[0]->describe||"") =~ /describe||"") =~ /describe ."]"); my $what = $items[0]->describe =~ / (which acts like an unconditional during parsing)" : $items[0]->describe =~ / (which acts like an unconditional during parsing)" : "an unconditional "; my $caveat = $items[0]->describe =~ / 1 ? "However, there were also other (useless) items after the leading " . $items[0]->describe . ", so you may have been expecting some other behaviour." : "You can safely ignore this message."; Parse::RecDescent::_hint("The production starts with $what. That means that the production can never successfully match, so it was optimized out of the final parser$caveat. $advice"); return 0; } return 1; } sub changesskip($) { my $item; foreach $item (@{$_[0]->{"items"}}) { if (ref($item) =~ /Parse::RecDescent::(Action|Directive)/) { return 1 if $item->{code} =~ /\$skip\s*=/; } } return 0; } sub adddirective { my ( $self, $whichop, $line, $name ) = @_; push @{$self->{op}}, { type=>$whichop, line=>$line, name=>$name, offset=> scalar(@{$self->{items}}) }; } sub addscore { my ( $self, $code, $lookahead, $line ) = @_; $self->additem(Parse::RecDescent::Directive->new( "local \$^W; my \$thisscore = do { $code } + 0; if (!defined(\$score) || \$thisscore>\$score) { \$score=\$thisscore; \$score_return=\$item[-1]; } undef;", $lookahead, $line,"") ) unless $self->{items}[-1]->describe =~ /{op}) { while (my $next = pop @{$self->{op}}) { Parse::RecDescent::_error("Incomplete <$next->{type}op:...>.", $line); Parse::RecDescent::_hint( "The current production ended without completing the <$next->{type}op:...> directive that started near line $next->{line}. Did you forget the closing '>'?"); } } return 1; } sub enddirective { my ( $self, $line, $minrep, $maxrep ) = @_; unless ($self->{op}) { Parse::RecDescent::_error("Unmatched > found.", $line); Parse::RecDescent::_hint( "A '>' angle bracket was encountered, which typically indicates the end of a directive. However no suitable preceding directive was encountered. Typically this indicates either a extra '>' in the grammar, or a problem inside the previous directive."); return; } my $op = pop @{$self->{op}}; my $span = @{$self->{items}} - $op->{offset}; if ($op->{type} =~ /left|right/) { if ($span != 3) { Parse::RecDescent::_error( "Incorrect <$op->{type}op:...> specification: expected 3 args, but found $span instead", $line); Parse::RecDescent::_hint( "The <$op->{type}op:...> directive requires a sequence of exactly three elements. For example: <$op->{type}op:leftarg /op/ rightarg>"); } else { push @{$self->{items}}, Parse::RecDescent::Operator->new( $op->{type}, $minrep, $maxrep, splice(@{$self->{"items"}}, -3)); $self->{items}[-1]->sethashname($self); $self->{items}[-1]{name} = $op->{name}; } } } sub prevwasreturn { my ( $self, $line ) = @_; unless (@{$self->{items}}) { Parse::RecDescent::_error( "Incorrect specification: expected item missing", $line); Parse::RecDescent::_hint( "The directive requires a sequence of at least one item. For example: "); return; } push @{$self->{items}}, Parse::RecDescent::Result->new(); } sub additem { my ( $self, $item ) = @_; $item->sethashname($self); push @{$self->{"items"}}, $item; return $item; } sub _duplicate_itempos { my ($src) = @_; my $dst = {}; foreach (keys %$src) { %{$dst->{$_}} = %{$src->{$_}}; } $dst; } sub _update_itempos { my ($dst, $src, $typekeys, $poskeys) = @_; my @typekeys = 'ARRAY' eq ref $typekeys ? @$typekeys : keys %$src; foreach my $k (keys %$src) { if ('ARRAY' eq ref $poskeys) { @{$dst->{$k}}{@$poskeys} = @{$src->{$k}}{@$poskeys}; } else { %{$dst->{$k}} = %{$src->{$k}}; } } } sub preitempos { return q { push @itempos, {'offset' => {'from'=>$thisoffset, 'to'=>undef}, 'line' => {'from'=>$thisline, 'to'=>undef}, 'column' => {'from'=>$thiscolumn, 'to'=>undef} }; } } sub incitempos { return q { $itempos[$#itempos]{'offset'}{'from'} += length($lastsep); $itempos[$#itempos]{'line'}{'from'} = $thisline; $itempos[$#itempos]{'column'}{'from'} = $thiscolumn; } } sub unincitempos { # the next incitempos will properly set these two fields, but # {'offset'}{'from'} needs to be decreased by length($lastsep) # $itempos[$#itempos]{'line'}{'from'} # $itempos[$#itempos]{'column'}{'from'} return q { $itempos[$#itempos]{'offset'}{'from'} -= length($lastsep) if defined $lastsep; } } sub postitempos { return q { $itempos[$#itempos]{'offset'}{'to'} = $prevoffset; $itempos[$#itempos]{'line'}{'to'} = $prevline; $itempos[$#itempos]{'column'}{'to'} = $prevcolumn; } } sub code($$$$) { my ($self,$namespace,$rule,$parser) = @_; my $code = ' while (!$_matched' . (defined $self->{"uncommit"} ? '' : ' && !$commit') . ') { ' . ($self->changesskip() ? 'local $skip = defined($skip) ? $skip : $Parse::RecDescent::skip;' : '') .' Parse::RecDescent::_trace(q{Trying production: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($_[1]), q{' . $rule ->{name}. '}, $tracelevel) if defined $::RD_TRACE; my $thisprod = $thisrule->{"prods"}[' . $self->{"number"} . ']; ' . (defined $self->{"error"} ? '' : '$text = $_[1];' ) . ' my $_savetext; @item = (q{' . $rule->{"name"} . '}); %item = (__RULE__ => q{' . $rule->{"name"} . '}); my $repcount = 0; '; $code .= ' my @itempos = ({}); ' if $parser->{_check}{itempos}; my $item; my $i; for ($i = 0; $i < @{$self->{"items"}}; $i++) { $item = ${$self->{items}}[$i]; $code .= preitempos() if $parser->{_check}{itempos}; $code .= $item->code($namespace,$rule,$parser->{_check}); $code .= postitempos() if $parser->{_check}{itempos}; } if ($parser->{_AUTOACTION} && defined($item) && !$item->isa("Parse::RecDescent::Action")) { $code .= $parser->{_AUTOACTION}->code($namespace,$rule); Parse::RecDescent::_warn(1,"Autogenerating action in rule \"$rule->{name}\": $parser->{_AUTOACTION}{code}") and Parse::RecDescent::_hint("The \$::RD_AUTOACTION was defined, so any production not ending in an explicit action has the specified \"auto-action\" automatically appended."); } elsif ($parser->{_AUTOTREE} && defined($item) && !$item->isa("Parse::RecDescent::Action")) { if ($i==1 && $item->isterminal) { $code .= $parser->{_AUTOTREE}{TERMINAL}->code($namespace,$rule); } else { $code .= $parser->{_AUTOTREE}{NODE}->code($namespace,$rule); } Parse::RecDescent::_warn(1,"Autogenerating tree-building action in rule \"$rule->{name}\"") and Parse::RecDescent::_hint("The directive was specified, so any production not ending in an explicit action has some parse-tree building code automatically appended."); } $code .= ' Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' production: [' . $self->describe . ']<<}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; ' . ( $parser->{_check}{itempos} ? ' if ( defined($_itempos) ) { Parse::RecDescent::Production::_update_itempos($_itempos, $itempos[ 1], undef, [qw(from)]); Parse::RecDescent::Production::_update_itempos($_itempos, $itempos[-1], undef, [qw(to)]); } ' : '' ) . ' $_matched = 1; last; } '; return $code; } 1; package Parse::RecDescent::Action; sub describe { undef } sub sethashname { $_[0]->{hashname} = '__ACTION' . ++$_[1]->{actcount} .'__'; } sub new { my $class = ref($_[0]) || $_[0]; bless { "code" => $_[1], "lookahead" => $_[2], "line" => $_[3], }, $class; } sub issubrule { undef } sub isterminal { 0 } sub code($$$$) { my ($self, $namespace, $rule) = @_; ' Parse::RecDescent::_trace(q{Trying action}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' $_tok = ($_noactions) ? 0 : do ' . $self->{"code"} . '; ' . ($self->{"lookahead"}<0?'if':'unless') . ' (defined $_tok) { Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' action>> (return value: [undef])}) if defined $::RD_TRACE; last; } Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' action<< (return value: [} . $_tok . q{])}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; push @item, $_tok; ' . ($self->{line}>=0 ? '$item{'. $self->{hashname} .'}=$_tok;' : '' ) .' ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' ' } 1; package Parse::RecDescent::Directive; sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } sub issubrule { undef } sub isterminal { 0 } sub describe { $_[1] ? '' : $_[0]->{name} } sub new ($$$$$) { my $class = ref($_[0]) || $_[0]; bless { "code" => $_[1], "lookahead" => $_[2], "line" => $_[3], "name" => $_[4], }, $class; } sub code($$$$) { my ($self, $namespace, $rule) = @_; ' ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' Parse::RecDescent::_trace(q{Trying directive: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; ' .' $_tok = do { ' . $self->{"code"} . ' }; if (defined($_tok)) { Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' directive<< (return value: [} . $_tok . q{])}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; } else { Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' directive>>}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; } ' . ($self->{"lookahead"} ? '$text = $_savetext and ' : '' ) .' last ' . ($self->{"lookahead"}<0?'if':'unless') . ' defined $_tok; push @item, $item{'.$self->{hashname}.'}=$_tok; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' ' } 1; package Parse::RecDescent::UncondReject; sub issubrule { undef } sub isterminal { 0 } sub describe { $_[1] ? '' : $_[0]->{name} } sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } sub new ($$$;$) { my $class = ref($_[0]) || $_[0]; bless { "lookahead" => $_[1], "line" => $_[2], "name" => $_[3], }, $class; } # MARK, YOU MAY WANT TO OPTIMIZE THIS. sub code($$$$) { my ($self, $namespace, $rule) = @_; ' Parse::RecDescent::_trace(q{>>Rejecting production<< (found ' . $self->describe . ')}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; undef $return; ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' $_tok = undef; ' . ($self->{"lookahead"} ? '$text = $_savetext and ' : '' ) .' last ' . ($self->{"lookahead"}<0?'if':'unless') . ' defined $_tok; ' } 1; package Parse::RecDescent::Error; sub issubrule { undef } sub isterminal { 0 } sub describe { $_[1] ? '' : $_[0]->{commitonly} ? '' : '' } sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } sub new ($$$$$) { my $class = ref($_[0]) || $_[0]; bless { "msg" => $_[1], "lookahead" => $_[2], "commitonly" => $_[3], "line" => $_[4], }, $class; } sub code($$$$) { my ($self, $namespace, $rule) = @_; my $action = ''; if ($self->{"msg"}) # ERROR MESSAGE SUPPLIED { #WAS: $action .= "Parse::RecDescent::_error(qq{$self->{msg}}" . ',$thisline);'; $action .= 'push @{$thisparser->{errors}}, [qq{'.$self->{msg}.'},$thisline];'; } else # GENERATE ERROR MESSAGE DURING PARSE { $action .= ' my $rule = $item[0]; $rule =~ s/_/ /g; #WAS: Parse::RecDescent::_error("Invalid $rule: " . $expectation->message() ,$thisline); push @{$thisparser->{errors}}, ["Invalid $rule: " . $expectation->message() ,$thisline]; '; } my $dir = new Parse::RecDescent::Directive('if (' . ($self->{"commitonly"} ? '$commit' : '1') . ") { do {$action} unless ".' $_noactions; undef } else {0}', $self->{"lookahead"},0,$self->describe); $dir->{hashname} = $self->{hashname}; return $dir->code($namespace, $rule, 0); } 1; package Parse::RecDescent::Token; sub sethashname { $_[0]->{hashname} = '__PATTERN' . ++$_[1]->{patcount} . '__'; } sub issubrule { undef } sub isterminal { 1 } sub describe ($) { shift->{'description'}} # ARGS ARE: $self, $pattern, $left_delim, $modifiers, $lookahead, $linenum sub new ($$$$$$) { my $class = ref($_[0]) || $_[0]; my $pattern = $_[1]; my $pat = $_[1]; my $ldel = $_[2]; my $rdel = $ldel; $rdel =~ tr/{[(/; my $mod = $_[3]; my $desc; if ($ldel eq '/') { $desc = "$ldel$pattern$rdel$mod" } else { $desc = "m$ldel$pattern$rdel$mod" } $desc =~ s/\\/\\\\/g; $desc =~ s/\$$/\\\$/g; $desc =~ s/}/\\}/g; $desc =~ s/{/\\{/g; if (!eval "no strict; local \$SIG{__WARN__} = sub {0}; '' =~ m$ldel$pattern$rdel$mod" and $@) { Parse::RecDescent::_warn(3, "Token pattern \"m$ldel$pattern$rdel$mod\" may not be a valid regular expression", $_[5]); $@ =~ s/ at \(eval.*/./; Parse::RecDescent::_hint($@); } # QUIETLY PREVENT (WELL-INTENTIONED) CALAMITY $mod =~ s/[gc]//g; $pattern =~ s/(\A|[^\\])\\G/$1/g; bless { "pattern" => $pattern, "ldelim" => $ldel, "rdelim" => $rdel, "mod" => $mod, "lookahead" => $_[4], "line" => $_[5], "description" => $desc, }, $class; } sub code($$$$$) { my ($self, $namespace, $rule, $check) = @_; my $ldel = $self->{"ldelim"}; my $rdel = $self->{"rdelim"}; my $sdel = $ldel; my $mod = $self->{"mod"}; $sdel =~ s/[[{(<]/{}/; my $code = ' Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; undef $lastsep; $expectation->is(q{' . ($rule->hasleftmost($self) ? '' : $self->describe ) . '})->at($text); ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' ' . ($self->{"lookahead"}<0?'if':'unless') . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') . ' $text =~ m' . $ldel . '\A(?:' . $self->{"pattern"} . ')' . $rdel . $mod . ') { '.($self->{"lookahead"} ? '$text = $_savetext;' : '$text = $lastsep . $text if defined $lastsep;') . ($check->{itempos} ? Parse::RecDescent::Production::unincitempos() : '') . ' $expectation->failed(); Parse::RecDescent::_trace(q{<>}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; last; } $current_match = substr($text, $-[0], $+[0] - $-[0]); substr($text,0,length($current_match),q{}); Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} . $current_match . q{])}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; push @item, $item{'.$self->{hashname}.'}=$current_match; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' '; return $code; } 1; package Parse::RecDescent::Literal; sub sethashname { $_[0]->{hashname} = '__STRING' . ++$_[1]->{strcount} . '__'; } sub issubrule { undef } sub isterminal { 1 } sub describe ($) { shift->{'description'} } sub new ($$$$) { my $class = ref($_[0]) || $_[0]; my $pattern = $_[1]; my $desc = $pattern; $desc=~s/\\/\\\\/g; $desc=~s/}/\\}/g; $desc=~s/{/\\{/g; bless { "pattern" => $pattern, "lookahead" => $_[2], "line" => $_[3], "description" => "'$desc'", }, $class; } sub code($$$$) { my ($self, $namespace, $rule, $check) = @_; my $code = ' Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; undef $lastsep; $expectation->is(q{' . ($rule->hasleftmost($self) ? '' : $self->describe ) . '})->at($text); ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' ' . ($self->{"lookahead"}<0?'if':'unless') . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') . ' $text =~ m/\A' . quotemeta($self->{"pattern"}) . '/) { '.($self->{"lookahead"} ? '$text = $_savetext;' : '$text = $lastsep . $text if defined $lastsep;').' '. ($check->{itempos} ? Parse::RecDescent::Production::unincitempos() : '') . ' $expectation->failed(); Parse::RecDescent::_trace(qq{<>}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; last; } $current_match = substr($text, $-[0], $+[0] - $-[0]); substr($text,0,length($current_match),q{}); Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} . $current_match . q{])}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; push @item, $item{'.$self->{hashname}.'}=$current_match; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' '; return $code; } 1; package Parse::RecDescent::InterpLit; sub sethashname { $_[0]->{hashname} = '__STRING' . ++$_[1]->{strcount} . '__'; } sub issubrule { undef } sub isterminal { 1 } sub describe ($) { shift->{'description'} } sub new ($$$$) { my $class = ref($_[0]) || $_[0]; my $pattern = $_[1]; $pattern =~ s#/#\\/#g; my $desc = $pattern; $desc=~s/\\/\\\\/g; $desc=~s/}/\\}/g; $desc=~s/{/\\{/g; bless { "pattern" => $pattern, "lookahead" => $_[2], "line" => $_[3], "description" => "'$desc'", }, $class; } sub code($$$$) { my ($self, $namespace, $rule, $check) = @_; my $code = ' Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{name} . '}, $tracelevel) if defined $::RD_TRACE; undef $lastsep; $expectation->is(q{' . ($rule->hasleftmost($self) ? '' : $self->describe ) . '})->at($text); ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' ' . ($self->{"lookahead"}<0?'if':'unless') . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') . ' do { $_tok = "' . $self->{"pattern"} . '"; 1 } and substr($text,0,length($_tok)) eq $_tok and do { substr($text,0,length($_tok)) = ""; 1; } ) { '.($self->{"lookahead"} ? '$text = $_savetext;' : '$text = $lastsep . $text if defined $lastsep;').' '. ($check->{itempos} ? Parse::RecDescent::Production::unincitempos() : '') . ' $expectation->failed(); Parse::RecDescent::_trace(q{<>}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; last; } Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} . $_tok . q{])}, Parse::RecDescent::_tracefirst($text)) if defined $::RD_TRACE; push @item, $item{'.$self->{hashname}.'}=$_tok; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' '; return $code; } 1; package Parse::RecDescent::Subrule; sub issubrule ($) { return $_[0]->{"subrule"} } sub isterminal { 0 } sub sethashname {} sub describe ($) { my $desc = $_[0]->{"implicit"} || $_[0]->{"subrule"}; $desc = "" if $_[0]->{"matchrule"}; return $desc; } sub callsyntax($$) { if ($_[0]->{"matchrule"}) { return "&{'$_[1]'.qq{$_[0]->{subrule}}}"; } else { return $_[1].$_[0]->{"subrule"}; } } sub new ($$$$;$$$) { my $class = ref($_[0]) || $_[0]; bless { "subrule" => $_[1], "lookahead" => $_[2], "line" => $_[3], "implicit" => $_[4] || undef, "matchrule" => $_[5], "argcode" => $_[6] || undef, }, $class; } sub code($$$$) { my ($self, $namespace, $rule, $check) = @_; ' Parse::RecDescent::_trace(q{Trying subrule: [' . $self->{"subrule"} . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} . '}, $tracelevel) if defined $::RD_TRACE; if (1) { no strict qw{refs}; $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); : 'q{'.$self->describe.'}' ) . ')->at($text); ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ($self->{"lookahead"}<0?'if':'unless') . ' (defined ($_tok = ' . $self->callsyntax($namespace.'::') . '($thisparser,$text,$repeating,' . ($self->{"lookahead"}?'1':'$_noactions') . ($self->{argcode} ? ",sub { return $self->{argcode} }" : ',sub { \\@arg }') . ($check->{"itempos"}?',$itempos[$#itempos]':',undef') . '))) { '.($self->{"lookahead"} ? '$text = $_savetext;' : '').' Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' subrule: [' . $self->{subrule} . ']>>}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; $expectation->failed(); last; } Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' subrule: [' . $self->{subrule} . ']<< (return value: [} . $_tok . q{]}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; $item{q{' . $self->{subrule} . '}} = $_tok; push @item, $_tok; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' } ' } package Parse::RecDescent::Repetition; sub issubrule ($) { return $_[0]->{"subrule"} } sub isterminal { 0 } sub sethashname { } sub describe ($) { my $desc = $_[0]->{"expected"} || $_[0]->{"subrule"}; $desc = "" if $_[0]->{"matchrule"}; return $desc; } sub callsyntax($$) { if ($_[0]->{matchrule}) { return "sub { goto &{''.qq{$_[1]$_[0]->{subrule}}} }"; } else { return "\\&$_[1]$_[0]->{subrule}"; } } sub new ($$$$$$$$$$) { my ($self, $subrule, $repspec, $min, $max, $lookahead, $line, $parser, $matchrule, $argcode) = @_; my $class = ref($self) || $self; ($max, $min) = ( $min, $max) if ($max<$min); my $desc; if ($subrule=~/\A_alternation_\d+_of_production_\d+_of_rule/) { $desc = $parser->{"rules"}{$subrule}->expected } if ($lookahead) { if ($min>0) { return new Parse::RecDescent::Subrule($subrule,$lookahead,$line,$desc,$matchrule,$argcode); } else { Parse::RecDescent::_error("Not symbol (\"!\") before \"$subrule\" doesn't make sense.",$line); Parse::RecDescent::_hint("Lookahead for negated optional repetitions (such as \"!$subrule($repspec)\" can never succeed, since optional items always match (zero times at worst). Did you mean a single \"!$subrule\", instead?"); } } bless { "subrule" => $subrule, "repspec" => $repspec, "min" => $min, "max" => $max, "lookahead" => $lookahead, "line" => $line, "expected" => $desc, "argcode" => $argcode || undef, "matchrule" => $matchrule, }, $class; } sub code($$$$) { my ($self, $namespace, $rule, $check) = @_; my ($subrule, $repspec, $min, $max, $lookahead) = @{$self}{ qw{subrule repspec min max lookahead} }; ' Parse::RecDescent::_trace(q{Trying repeated subrule: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} . '}, $tracelevel) if defined $::RD_TRACE; $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); : 'q{'.$self->describe.'}' ) . ')->at($text); ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' unless (defined ($_tok = $thisparser->_parserepeat($text, ' . $self->callsyntax($namespace.'::') . ', ' . $min . ', ' . $max . ', ' . ($self->{"lookahead"}?'1':'$_noactions') . ',$expectation,' . ($self->{argcode} ? "sub { return $self->{argcode} }" : 'sub { \\@arg }') . ($check->{"itempos"}?',$itempos[$#itempos]':',undef') . '))) { Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' repeated subrule: [' . $self->describe . ']>>}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; last; } Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' repeated subrule: [' . $self->{subrule} . ']<< (} . @$_tok . q{ times)}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; $item{q{' . "$self->{subrule}($self->{repspec})" . '}} = $_tok; push @item, $_tok; ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' ' } package Parse::RecDescent::Result; sub issubrule { 0 } sub isterminal { 0 } sub describe { '' } sub new { my ($class, $pos) = @_; bless {}, $class; } sub code($$$$) { my ($self, $namespace, $rule) = @_; ' $return = $item[-1]; '; } package Parse::RecDescent::Operator; my @opertype = ( " non-optional", "n optional" ); sub issubrule { 0 } sub isterminal { 0 } sub describe { $_[0]->{"expected"} } sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } sub new { my ($class, $type, $minrep, $maxrep, $leftarg, $op, $rightarg) = @_; bless { "type" => "${type}op", "leftarg" => $leftarg, "op" => $op, "min" => $minrep, "max" => $maxrep, "rightarg" => $rightarg, "expected" => "<${type}op: ".$leftarg->describe." ".$op->describe." ".$rightarg->describe.">", }, $class; } sub code($$$$) { my ($self, $namespace, $rule, $check) = @_; my @codeargs = @_[1..$#_]; my ($leftarg, $op, $rightarg) = @{$self}{ qw{leftarg op rightarg} }; my $code = ' Parse::RecDescent::_trace(q{Trying operator: [' . $self->describe . ']}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} . '}, $tracelevel) if defined $::RD_TRACE; $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); : 'q{'.$self->describe.'}' ) . ')->at($text); $_tok = undef; OPLOOP: while (1) { $repcount = 0; my @item; my %item; '; $code .= ' my $_itempos = $itempos[-1]; my $itemposfirst; ' if $check->{itempos}; if ($self->{type} eq "leftop" ) { $code .= ' # MATCH LEFTARG ' . $leftarg->code(@codeargs) . ' '; $code .= ' if (defined($_itempos) and !defined($itemposfirst)) { $itemposfirst = Parse::RecDescent::Production::_duplicate_itempos($_itempos); } ' if $check->{itempos}; $code .= ' $repcount++; my $savetext = $text; my $backtrack; # MATCH (OP RIGHTARG)(s) while ($repcount < ' . $self->{max} . ') { $backtrack = 0; ' . $op->code(@codeargs) . ' ' . ($op->isterminal() ? 'pop @item;' : '$backtrack=1;' ) . ' ' . (ref($op) eq 'Parse::RecDescent::Token' ? 'if (defined $1) {push @item, $item{'.($self->{name}||$self->{hashname}).'}=$1; $backtrack=1;}' : "" ) . ' ' . $rightarg->code(@codeargs) . ' $savetext = $text; $repcount++; } $text = $savetext; pop @item if $backtrack; '; } else { $code .= ' my $savetext = $text; my $backtrack; # MATCH (LEFTARG OP)(s) while ($repcount < ' . $self->{max} . ') { $backtrack = 0; ' . $leftarg->code(@codeargs) . ' '; $code .= ' if (defined($_itempos) and !defined($itemposfirst)) { $itemposfirst = Parse::RecDescent::Production::_duplicate_itempos($_itempos); } ' if $check->{itempos}; $code .= ' $repcount++; $backtrack = 1; ' . $op->code(@codeargs) . ' $savetext = $text; ' . ($op->isterminal() ? 'pop @item;' : "" ) . ' ' . (ref($op) eq 'Parse::RecDescent::Token' ? 'do { push @item, $item{'.($self->{name}||$self->{hashname}).'}=$1; } if defined $1;' : "" ) . ' } $text = $savetext; pop @item if $backtrack; # MATCH RIGHTARG ' . $rightarg->code(@codeargs) . ' $repcount++; '; } $code .= 'unless (@item) { undef $_tok; last }' unless $self->{min}==0; $code .= ' $_tok = [ @item ]; '; $code .= ' if (defined $itemposfirst) { Parse::RecDescent::Production::_update_itempos( $_itempos, $itemposfirst, undef, [qw(from)]); } ' if $check->{itempos}; $code .= ' last; } # end of OPLOOP '; $code .= ' unless ($repcount>='.$self->{min}.') { Parse::RecDescent::_trace(q{<<'.Parse::RecDescent::_matchtracemessage($self,1).' operator: [' . $self->describe . ']>>}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; $expectation->failed(); last; } Parse::RecDescent::_trace(q{>>'.Parse::RecDescent::_matchtracemessage($self).' operator: [' . $self->describe . ']<< (return value: [} . qq{@{$_tok||[]}} . q{]}, Parse::RecDescent::_tracefirst($text), q{' . $rule->{"name"} .'}, $tracelevel) if defined $::RD_TRACE; push @item, $item{'.($self->{name}||$self->{hashname}).'}=$_tok||[]; '; return $code; } package Parse::RecDescent::Expectation; sub new ($) { bless { "failed" => 0, "expected" => "", "unexpected" => "", "lastexpected" => "", "lastunexpected" => "", "defexpected" => $_[1], }; } sub is ($$) { $_[0]->{lastexpected} = $_[1]; return $_[0]; } sub at ($$) { $_[0]->{lastunexpected} = $_[1]; return $_[0]; } sub failed ($) { return unless $_[0]->{lastexpected}; $_[0]->{expected} = $_[0]->{lastexpected} unless $_[0]->{failed}; $_[0]->{unexpected} = $_[0]->{lastunexpected} unless $_[0]->{failed}; $_[0]->{failed} = 1; } sub message ($) { my ($self) = @_; $self->{expected} = $self->{defexpected} unless $self->{expected}; $self->{expected} =~ s/_/ /g; if (!$self->{unexpected} || $self->{unexpected} =~ /\A\s*\Z/s) { return "Was expecting $self->{expected}"; } else { $self->{unexpected} =~ /\s*(.*)/; return "Was expecting $self->{expected} but found \"$1\" instead"; } } 1; package Parse::RecDescent; use Carp; use vars qw ( $AUTOLOAD $VERSION $_FILENAME); my $ERRORS = 0; our $VERSION = '1.967015'; $VERSION = eval $VERSION; $_FILENAME=__FILE__; # BUILDING A PARSER my $nextnamespace = "namespace000001"; sub _nextnamespace() { return "Parse::RecDescent::" . $nextnamespace++; } # ARGS ARE: $class, $grammar, $compiling, $namespace sub new ($$$$) { my $class = ref($_[0]) || $_[0]; local $Parse::RecDescent::compiling = $_[2]; my $name_space_name = defined $_[3] ? "Parse::RecDescent::".$_[3] : _nextnamespace(); my $self = { "rules" => {}, "namespace" => $name_space_name, "startcode" => '', "localvars" => '', "_AUTOACTION" => undef, "_AUTOTREE" => undef, # Precompiled parsers used to set _precompiled, but that # wasn't present in some versions of Parse::RecDescent used to # build precompiled parsers. Instead, set a new # _not_precompiled flag, which is remove from future # Precompiled parsers at build time. "_not_precompiled" => 1, }; if ($::RD_AUTOACTION) { my $sourcecode = $::RD_AUTOACTION; $sourcecode = "{ $sourcecode }" unless $sourcecode =~ /\A\s*\{.*\}\s*\Z/; $self->{_check}{itempos} = $sourcecode =~ /\@itempos\b|\$itempos\s*\[/; $self->{_AUTOACTION} = new Parse::RecDescent::Action($sourcecode,0,-1) } bless $self, $class; return $self->Replace($_[1]) } sub Compile($$$$) { die "Compilation of Parse::RecDescent grammars not yet implemented\n"; } sub DESTROY { my ($self) = @_; my $namespace = $self->{namespace}; $namespace =~ s/Parse::RecDescent:://; if ($self->{_not_precompiled}) { # BEGIN WORKAROUND # Perl has a bug that creates a circular reference between # @ISA and that variable's stash: # https://rt.perl.org/rt3/Ticket/Display.html?id=92708 # Emptying the array before deleting the stash seems to # prevent the leak. Once the ticket above has been resolved, # these two lines can be removed. no strict 'refs'; @{$self->{namespace} . '::ISA'} = (); # END WORKAROUND # Some grammars may contain circular references between rules, # such as: # a: 'ID' | b # b: '(' a ')' # Unless these references are broken, the subs stay around on # stash deletion below. Iterate through the stash entries and # for each defined code reference, set it to reference sub {} # instead. { local $^W; # avoid 'sub redefined' warnings. my $blank_sub = sub {}; while (my ($name, $glob) = each %{"Parse::RecDescent::$namespace\::"}) { *$glob = $blank_sub if defined &$glob; } } # Delete the namespace's stash delete $Parse::RecDescent::{$namespace.'::'}; } } # BUILDING A GRAMMAR.... # ARGS ARE: $self, $grammar, $isimplicit, $isleftop sub Replace ($$) { # set $replace = 1 for _generate splice(@_, 2, 0, 1); return _generate(@_); } # ARGS ARE: $self, $grammar, $isimplicit, $isleftop sub Extend ($$) { # set $replace = 0 for _generate splice(@_, 2, 0, 0); return _generate(@_); } sub _no_rule ($$;$) { _error("Ruleless $_[0] at start of grammar.",$_[1]); my $desc = $_[2] ? "\"$_[2]\"" : ""; _hint("You need to define a rule for the $_[0] $desc to be part of."); } my $NEGLOOKAHEAD = '\G(\s*\.\.\.\!)'; my $POSLOOKAHEAD = '\G(\s*\.\.\.)'; my $RULE = '\G\s*(\w+)[ \t]*:'; my $PROD = '\G\s*([|])'; my $TOKEN = q{\G\s*/((\\\\/|\\\\\\\\|[^/])*)/([cgimsox]*)}; my $MTOKEN = q{\G\s*(m\s*[^\w\s])}; my $LITERAL = q{\G\s*'((\\\\['\\\\]|[^'])*)'}; my $INTERPLIT = q{\G\s*"((\\\\["\\\\]|[^"])*)"}; my $SUBRULE = '\G\s*(\w+)'; my $MATCHRULE = '\G(\s*{_check}{itempos} = ($grammar =~ /\@itempos\b|\$itempos\s*\[/) unless $self->{_check}{itempos}; for (qw(thisoffset thiscolumn prevline prevoffset prevcolumn)) { $self->{_check}{$_} = ($grammar =~ /\$$_/) || $self->{_check}{itempos} unless $self->{_check}{$_}; } my $line; my $rule = undef; my $prod = undef; my $item = undef; my $lastgreedy = ''; pos $grammar = 0; study $grammar; local $::RD_HINT = $::RD_HINT; local $::RD_WARN = $::RD_WARN; local $::RD_TRACE = $::RD_TRACE; local $::RD_CHECK = $::RD_CHECK; while (pos $grammar < length $grammar) { $line = $lines[-1] - _linecount($grammar) + 1; my $commitonly; my $code = ""; my @components = (); if ($grammar =~ m/$COMMENT/gco) { _parse("a comment",0,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); next; } elsif ($grammar =~ m/$NEGLOOKAHEAD/gco) { _parse("a negative lookahead",$aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $lookahead = $lookahead ? -$lookahead : -1; $lookaheadspec .= $1; next; # SKIP LOOKAHEAD RESET AT END OF while LOOP } elsif ($grammar =~ m/$POSLOOKAHEAD/gco) { _parse("a positive lookahead",$aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $lookahead = $lookahead ? $lookahead : 1; $lookaheadspec .= $1; next; # SKIP LOOKAHEAD RESET AT END OF while LOOP } elsif ($grammar =~ m/(?=$ACTION)/gco and do { ($code) = extract_codeblock($grammar); $code }) { _parse("an action", $aftererror, $line, $code); $item = new Parse::RecDescent::Action($code,$lookahead,$line); $prod and $prod->additem($item) or $self->_addstartcode($code); } elsif ($grammar =~ m/(?=$IMPLICITSUBRULE)/gco and do { ($code) = extract_codeblock($grammar,'{([',undef,'(',1); $code }) { $code =~ s/\A\s*\(|\)\Z//g; _parse("an implicit subrule", $aftererror, $line, "( $code )"); my $implicit = $rule->nextimplicit; return undef if !$self->_generate("$implicit : $code",$replace,1); my $pos = pos $grammar; substr($grammar,$pos,0,$implicit); pos $grammar = $pos;; } elsif ($grammar =~ m/$ENDDIRECTIVEMK/gco) { # EXTRACT TRAILING REPETITION SPECIFIER (IF ANY) my ($minrep,$maxrep) = (1,$MAXREP); if ($grammar =~ m/\G[(]/gc) { pos($grammar)--; if ($grammar =~ m/$OPTIONAL/gco) { ($minrep, $maxrep) = (0,1) } elsif ($grammar =~ m/$ANY/gco) { $minrep = 0 } elsif ($grammar =~ m/$EXACTLY/gco) { ($minrep, $maxrep) = ($1,$1) } elsif ($grammar =~ m/$BETWEEN/gco) { ($minrep, $maxrep) = ($1,$2) } elsif ($grammar =~ m/$ATLEAST/gco) { $minrep = $1 } elsif ($grammar =~ m/$ATMOST/gco) { $maxrep = $1 } elsif ($grammar =~ m/$MANY/gco) { } elsif ($grammar =~ m/$BADREP/gco) { _parse("an invalid repetition specifier", 0,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); _error("Incorrect specification of a repeated directive", $line); _hint("Repeated directives cannot have a maximum repetition of zero, nor can they have negative components in their ranges."); } } $prod && $prod->enddirective($line,$minrep,$maxrep); } elsif ($grammar =~ m/\G\s*<[^m]/gc) { pos($grammar)-=2; if ($grammar =~ m/$OPMK/gco) { # $DB::single=1; _parse("a $1-associative operator directive", $aftererror, $line, "<$1op:...>"); $prod->adddirective($1, $line,$2||''); } elsif ($grammar =~ m/$UNCOMMITMK/gco) { _parse("an uncommit marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive('$commit=0;1', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/$QUOTELIKEMK/gco) { _parse("an perl quotelike marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive( 'my ($match,@res); ($match,$text,undef,@res) = Text::Balanced::extract_quotelike($text,$skip); $match ? \@res : undef; ', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/$CODEBLOCKMK/gco) { my $outer = $1||"{}"; _parse("an perl codeblock marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive( 'Text::Balanced::extract_codeblock($text,undef,$skip,\''.$outer.'\'); ', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/$VARIABLEMK/gco) { _parse("an perl variable marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive( 'Text::Balanced::extract_variable($text,$skip); ', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/$NOCHECKMK/gco) { _parse("a disable checking marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); if ($rule) { _error(" directive not at start of grammar", $line); _hint("The directive can only be specified at the start of a grammar (before the first rule is defined."); } else { local $::RD_CHECK = 1; } } elsif ($grammar =~ m/$AUTOSTUBMK/gco) { _parse("an autostub marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_AUTOSTUB = ""; } elsif ($grammar =~ m/$AUTORULEMK/gco) { _parse("an autorule marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_AUTOSTUB = $1; } elsif ($grammar =~ m/$AUTOTREEMK/gco) { my $base = defined($1) ? $1 : ""; my $current_match = substr($grammar, $-[0], $+[0] - $-[0]); $base .= "::" if $base && $base !~ /::$/; _parse("an autotree marker", $aftererror,$line, $current_match); if ($rule) { _error(" directive not at start of grammar", $line); _hint("The directive can only be specified at the start of a grammar (before the first rule is defined."); } else { undef $self->{_AUTOACTION}; $self->{_AUTOTREE}{NODE} = new Parse::RecDescent::Action(q({bless \%item, ').$base.q('.$item[0]}),0,-1); $self->{_AUTOTREE}{TERMINAL} = new Parse::RecDescent::Action(q({bless {__VALUE__=>$item[1]}, ').$base.q('.$item[0]}),0,-1); } } elsif ($grammar =~ m/$REJECTMK/gco) { _parse("an reject marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::UncondReject($lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/(?=$CONDREJECTMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code }) { _parse("a (conditional) reject marker", $aftererror,$line, $code ); $code =~ /\A\s*\Z/s; my $cond = $1; $item = new Parse::RecDescent::Directive( "($1) ? undef : 1", $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/(?=$SCOREMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code }) { _parse("a score marker", $aftererror,$line, $code ); $code =~ /\A\s*\Z/s; $prod and $prod->addscore($1, $lookahead, $line) or _no_rule($code,$line); } elsif ($grammar =~ m/(?=$AUTOSCOREMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code; } ) { _parse("an autoscore specifier", $aftererror,$line,$code); $code =~ /\A\s*\Z/s; $rule and $rule->addautoscore($1,$self) or _no_rule($code,$line); $item = new Parse::RecDescent::UncondReject($lookahead,$line,$code); $prod and $prod->additem($item) or _no_rule($code,$line); } elsif ($grammar =~ m/$RESYNCMK/gco) { _parse("a resync to newline marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive( 'if ($text =~ s/(\A[^\n]*\n)//) { $return = 0; $1; } else { undef }', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/(?=$RESYNCPATMK)/gco and do { ($code) = extract_bracketed($grammar,'<'); $code }) { _parse("a resync with pattern marker", $aftererror,$line, $code ); $code =~ /\A\s*\Z/s; $item = new Parse::RecDescent::Directive( 'if ($text =~ s/(\A'.$1.')//) { $return = 0; $1; } else { undef }', $lookahead,$line,$code); $prod and $prod->additem($item) or _no_rule($code,$line); } elsif ($grammar =~ m/(?=$SKIPMK)/gco and do { ($code) = extract_codeblock($grammar,'<'); $code }) { _parse("a skip marker", $aftererror,$line, $code ); $code =~ /\A\s*\Z/s; if ($rule) { $item = new Parse::RecDescent::Directive( 'my $oldskip = $skip; $skip='.$1.'; $oldskip', $lookahead,$line,$code); $prod and $prod->additem($item) or _no_rule($code,$line); } else { #global directive $self->{skip} = $1; } } elsif ($grammar =~ m/(?=$RULEVARPATMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code; } ) { _parse("a rule variable specifier", $aftererror,$line,$code); $code =~ /\A\s*\Z/s; $rule and $rule->addvar($1,$self) or _no_rule($code,$line); $item = new Parse::RecDescent::UncondReject($lookahead,$line,$code); $prod and $prod->additem($item) or _no_rule($code,$line); } elsif ($grammar =~ m/(?=$AUTOACTIONPATMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code; } ) { _parse("an autoaction specifier", $aftererror,$line,$code); $code =~ s/\A\s*\Z/$1/s; if ($code =~ /\A\s*[^{]|[^}]\s*\Z/) { $code = "{ $code }" } $self->{_check}{itempos} = $code =~ /\@itempos\b|\$itempos\s*\[/; $self->{_AUTOACTION} = new Parse::RecDescent::Action($code,0,-$line) } elsif ($grammar =~ m/(?=$DEFERPATMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code; } ) { _parse("a deferred action specifier", $aftererror,$line,$code); $code =~ s/\A\s*\Z/$1/s; if ($code =~ /\A\s*[^{]|[^}]\s*\Z/) { $code = "{ $code }" } $item = new Parse::RecDescent::Directive( "push \@{\$thisparser->{deferred}}, sub $code;", $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); $self->{deferrable} = 1; } elsif ($grammar =~ m/(?=$TOKENPATMK)/gco and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); $code; } ) { _parse("a token constructor", $aftererror,$line,$code); $code =~ s/\A\s*\Z/$1/s; my $types = eval 'no strict; local $SIG{__WARN__} = sub {0}; my @arr=('.$code.'); @arr' || (); if (!$types) { _error("Incorrect token specification: \"$@\"", $line); _hint("The directive requires a list of one or more strings representing possible types of the specified token. For example: "); } else { $item = new Parse::RecDescent::Directive( 'no strict; $return = { text => $item[-1] }; @{$return->{type}}{'.$code.'} = (1..'.$types.');', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } } elsif ($grammar =~ m/$COMMITMK/gco) { _parse("an commit marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Directive('$commit = 1', $lookahead,$line,""); $prod and $prod->additem($item) or _no_rule("",$line); } elsif ($grammar =~ m/$NOCHECKMK/gco) { _parse("an hint request", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_CHECK = 0; } elsif ($grammar =~ m/$HINTMK/gco) { _parse("an hint request", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_HINT = $self->{__HINT__} = 1; } elsif ($grammar =~ m/$WARNMK/gco) { _parse("an warning request", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_WARN = $self->{__WARN__} = $1 ? $2+0 : 1; } elsif ($grammar =~ m/$TRACEBUILDMK/gco) { _parse("an grammar build trace request", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $::RD_TRACE = $1 ? $2+0 : 1; } elsif ($grammar =~ m/$TRACEPARSEMK/gco) { _parse("an parse trace request", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $self->{__TRACE__} = $1 ? $2+0 : 1; } elsif ($grammar =~ m/$AUTOERRORMK/gco) { $commitonly = $1; _parse("an error marker", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Error('',$lookahead,$1,$line); $prod and $prod->additem($item) or _no_rule("",$line); $aftererror = !$commitonly; } elsif ($grammar =~ m/(?=$MSGERRORMK)/gco and do { $commitonly = $1; ($code) = extract_bracketed($grammar,'<'); $code }) { _parse("an error marker", $aftererror,$line,$code); $code =~ /\A\s*\Z/s; $item = new Parse::RecDescent::Error($1,$lookahead,$commitonly,$line); $prod and $prod->additem($item) or _no_rule("$code",$line); $aftererror = !$commitonly; } elsif (do { $commitonly = $1; ($code) = extract_bracketed($grammar,'<'); $code }) { if ($code =~ /^<[A-Z_]+>$/) { _error("Token items are not yet supported: \"$code\"", $line); _hint("Items like $code that consist of angle brackets enclosing a sequence of uppercase characters will eventually be used to specify pre-lexed tokens in a grammar. That functionality is not yet implemented. Or did you misspell \"$code\"?"); } else { _error("Untranslatable item encountered: \"$code\"", $line); _hint("Did you misspell \"$code\" or forget to comment it out?"); } } } elsif ($grammar =~ m/$RULE/gco) { _parseunneg("a rule declaration", 0, $lookahead,$line, substr($grammar, $-[0], $+[0] - $-[0]) ) or next; my $rulename = $1; if ($rulename =~ /Replace|Extend|Precompile|PrecompiledRuntime|Save/ ) { _warn(2,"Rule \"$rulename\" hidden by method Parse::RecDescent::$rulename",$line) and _hint("The rule named \"$rulename\" cannot be directly called through the Parse::RecDescent object for this grammar (although it may still be used as a subrule of other rules). It can't be directly called because Parse::RecDescent::$rulename is already defined (it is the standard method of all parsers)."); } $rule = new Parse::RecDescent::Rule($rulename,$self,$line,$replace); $prod->check_pending($line) if $prod; $prod = $rule->addprod( new Parse::RecDescent::Production ); $aftererror = 0; } elsif ($grammar =~ m/$UNCOMMITPROD/gco) { pos($grammar)-=9; _parseunneg("a new (uncommitted) production", 0, $lookahead, $line, substr($grammar, $-[0], $+[0] - $-[0]) ) or next; $prod->check_pending($line) if $prod; $prod = new Parse::RecDescent::Production($line,1); $rule and $rule->addprod($prod) or _no_rule("",$line); $aftererror = 0; } elsif ($grammar =~ m/$ERRORPROD/gco) { pos($grammar)-=6; _parseunneg("a new (error) production", $aftererror, $lookahead,$line, substr($grammar, $-[0], $+[0] - $-[0]) ) or next; $prod->check_pending($line) if $prod; $prod = new Parse::RecDescent::Production($line,0,1); $rule and $rule->addprod($prod) or _no_rule("",$line); $aftererror = 0; } elsif ($grammar =~ m/$PROD/gco) { _parseunneg("a new production", 0, $lookahead,$line, substr($grammar, $-[0], $+[0] - $-[0]) ) or next; $rule and (!$prod || $prod->check_pending($line)) and $prod = $rule->addprod(new Parse::RecDescent::Production($line)) or _no_rule("production",$line); $aftererror = 0; } elsif ($grammar =~ m/$LITERAL/gco) { my $literal = $1; ($code = $literal) =~ s/\\\\/\\/g; _parse("a literal terminal", $aftererror,$line,$literal); $item = new Parse::RecDescent::Literal($code,$lookahead,$line); $prod and $prod->additem($item) or _no_rule("literal terminal",$line,"'$literal'"); } elsif ($grammar =~ m/$INTERPLIT/gco) { _parse("an interpolated literal terminal", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::InterpLit($1,$lookahead,$line); $prod and $prod->additem($item) or _no_rule("interpolated literal terminal",$line,"'$1'"); } elsif ($grammar =~ m/$TOKEN/gco) { _parse("a /../ pattern terminal", $aftererror,$line, substr($grammar, $-[0], $+[0] - $-[0]) ); $item = new Parse::RecDescent::Token($1,'/',$3?$3:'',$lookahead,$line); $prod and $prod->additem($item) or _no_rule("pattern terminal",$line,"/$1/"); } elsif ($grammar =~ m/(?=$MTOKEN)/gco and do { ($code, undef, @components) = extract_quotelike($grammar); $code } ) { _parse("an m/../ pattern terminal", $aftererror,$line,$code); $item = new Parse::RecDescent::Token(@components[3,2,8], $lookahead,$line); $prod and $prod->additem($item) or _no_rule("pattern terminal",$line,$code); } elsif ($grammar =~ m/(?=$MATCHRULE)/gco and do { ($code) = extract_bracketed($grammar,'<'); $code } or $grammar =~ m/$SUBRULE/gco and $code = $1) { my $name = $code; my $matchrule = 0; if (substr($name,0,1) eq '<') { $name =~ s/$MATCHRULE\s*//; $name =~ s/\s*>\Z//; $matchrule = 1; } # EXTRACT TRAILING ARG LIST (IF ANY) my ($argcode) = extract_codeblock($grammar, "[]",'') || ''; # EXTRACT TRAILING REPETITION SPECIFIER (IF ANY) if ($grammar =~ m/\G[(]/gc) { pos($grammar)--; if ($grammar =~ m/$OPTIONAL/gco) { _parse("an zero-or-one subrule match", $aftererror,$line,"$code$argcode($1)"); $item = new Parse::RecDescent::Repetition($name,$1,0,1, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1)"); !$matchrule and $rule and $rule->addcall($name); } elsif ($grammar =~ m/$ANY/gco) { _parse("a zero-or-more subrule match", $aftererror,$line,"$code$argcode($1)"); if ($2) { my $pos = pos $grammar; substr($grammar,$pos,0, "(s?) "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,$1,0,$MAXREP, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1)"); !$matchrule and $rule and $rule->addcall($name); _check_insatiable($name,$1,$grammar,$line) if $::RD_CHECK; } } elsif ($grammar =~ m/$MANY/gco) { _parse("a one-or-more subrule match", $aftererror,$line,"$code$argcode($1)"); if ($2) { # $DB::single=1; my $pos = pos $grammar; substr($grammar,$pos,0, " "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,$1,1,$MAXREP, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1)"); !$matchrule and $rule and $rule->addcall($name); _check_insatiable($name,$1,$grammar,$line) if $::RD_CHECK; } } elsif ($grammar =~ m/$EXACTLY/gco) { _parse("an exactly-$1-times subrule match", $aftererror,$line,"$code$argcode($1)"); if ($2) { my $pos = pos $grammar; substr($grammar,$pos,0, "($1) "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,$1,$1,$1, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1)"); !$matchrule and $rule and $rule->addcall($name); } } elsif ($grammar =~ m/$BETWEEN/gco) { _parse("a $1-to-$2 subrule match", $aftererror,$line,"$code$argcode($1..$2)"); if ($3) { my $pos = pos $grammar; substr($grammar,$pos,0, "($1..$2) "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,"$1..$2",$1,$2, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1..$2)"); !$matchrule and $rule and $rule->addcall($name); } } elsif ($grammar =~ m/$ATLEAST/gco) { _parse("a $1-or-more subrule match", $aftererror,$line,"$code$argcode($1..)"); if ($2) { my $pos = pos $grammar; substr($grammar,$pos,0, "($1..) "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,"$1..",$1,$MAXREP, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode($1..)"); !$matchrule and $rule and $rule->addcall($name); _check_insatiable($name,"$1..",$grammar,$line) if $::RD_CHECK; } } elsif ($grammar =~ m/$ATMOST/gco) { _parse("a one-to-$1 subrule match", $aftererror,$line,"$code$argcode(..$1)"); if ($2) { my $pos = pos $grammar; substr($grammar,$pos,0, "(..$1) "); pos $grammar = $pos; } else { $item = new Parse::RecDescent::Repetition($name,"..$1",1,$1, $lookahead,$line, $self, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("repetition",$line,"$code$argcode(..$1)"); !$matchrule and $rule and $rule->addcall($name); } } elsif ($grammar =~ m/$BADREP/gco) { my $current_match = substr($grammar, $-[0], $+[0] - $-[0]); _parse("an subrule match with invalid repetition specifier", 0,$line, $current_match); _error("Incorrect specification of a repeated subrule", $line); _hint("Repeated subrules like \"$code$argcode$current_match\" cannot have a maximum repetition of zero, nor can they have negative components in their ranges."); } } else { _parse("a subrule match", $aftererror,$line,$code); my $desc; if ($name=~/\A_alternation_\d+_of_production_\d+_of_rule/) { $desc = $self->{"rules"}{$name}->expected } $item = new Parse::RecDescent::Subrule($name, $lookahead, $line, $desc, $matchrule, $argcode); $prod and $prod->additem($item) or _no_rule("(sub)rule",$line,$name); !$matchrule and $rule and $rule->addcall($name); } } elsif ($grammar =~ m/$LONECOLON/gco ) { _error("Unexpected colon encountered", $line); _hint("Did you mean \"|\" (to start a new production)? Or perhaps you forgot that the colon in a rule definition must be on the same line as the rule name?"); } elsif ($grammar =~ m/$ACTION/gco ) # BAD ACTION, ALREADY FAILED { _error("Malformed action encountered", $line); _hint("Did you forget the closing curly bracket or is there a syntax error in the action?"); } elsif ($grammar =~ m/$OTHER/gco ) { _error("Untranslatable item encountered: \"$1\"", $line); _hint("Did you misspell \"$1\" or forget to comment it out?"); } if ($lookaheadspec =~ tr /././ > 3) { $lookaheadspec =~ s/\A\s+//; $lookahead = $lookahead<0 ? 'a negative lookahead ("...!")' : 'a positive lookahead ("...")' ; _warn(1,"Found two or more lookahead specifiers in a row.",$line) and _hint("Multiple positive and/or negative lookaheads are simply multiplied together to produce a single positive or negative lookahead specification. In this case the sequence \"$lookaheadspec\" was reduced to $lookahead. Was this your intention?"); } $lookahead = 0; $lookaheadspec = ""; $grammar =~ m/\G\s+/gc; } if ($must_pop_lines) { pop @lines; } unless ($ERRORS or $isimplicit or !$::RD_CHECK) { $self->_check_grammar(); } unless ($ERRORS or $isimplicit or $Parse::RecDescent::compiling) { my $code = $self->_code(); if (defined $::RD_TRACE) { my $mode = ($nextnamespace eq "namespace000002") ? '>' : '>>'; print STDERR "printing code (", length($code),") to RD_TRACE\n"; local *TRACE_FILE; open TRACE_FILE, $mode, "RD_TRACE" and print TRACE_FILE "my \$ERRORS;\n$code" and close TRACE_FILE; } unless ( eval "$code 1" ) { _error("Internal error in generated parser code!"); $@ =~ s/at grammar/in grammar at/; _hint($@); } } if ($ERRORS and !_verbosity("HINT")) { local $::RD_HINT = defined $::RD_HINT ? $::RD_HINT : 1; _hint('Set $::RD_HINT (or -RD_HINT if you\'re using "perl -s") for hints on fixing these problems. Use $::RD_HINT = 0 to disable this message.'); } if ($ERRORS) { $ERRORS=0; return } return $self; } sub _addstartcode($$) { my ($self, $code) = @_; $code =~ s/\A\s*\{(.*)\}\Z/$1/s; $self->{"startcode"} .= "$code;\n"; } # CHECK FOR GRAMMAR PROBLEMS.... sub _check_insatiable($$$$) { my ($subrule,$repspec,$grammar,$line) = @_; pos($grammar)=pos($_[2]); return if $grammar =~ m/$OPTIONAL/gco || $grammar =~ m/$ANY/gco; my $min = 1; if ( $grammar =~ m/$MANY/gco || $grammar =~ m/$EXACTLY/gco || $grammar =~ m/$ATMOST/gco || $grammar =~ m/$BETWEEN/gco && do { $min=$2; 1 } || $grammar =~ m/$ATLEAST/gco && do { $min=$2; 1 } || $grammar =~ m/$SUBRULE(?!\s*:)/gco ) { return unless $1 eq $subrule && $min > 0; my $current_match = substr($grammar, $-[0], $+[0] - $-[0]); _warn(3,"Subrule sequence \"$subrule($repspec) $current_match\" will (almost certainly) fail.",$line) and _hint("Unless subrule \"$subrule\" performs some cunning lookahead, the repetition \"$subrule($repspec)\" will insatiably consume as many matches of \"$subrule\" as it can, leaving none to match the \"$current_match\" that follows."); } } sub _check_grammar ($) { my $self = shift; my $rules = $self->{"rules"}; my $rule; foreach $rule ( values %$rules ) { next if ! $rule->{"changed"}; # CHECK FOR UNDEFINED RULES my $call; foreach $call ( @{$rule->{"calls"}} ) { if (!defined ${$rules}{$call} &&!defined &{"Parse::RecDescent::$call"}) { if (!defined $::RD_AUTOSTUB) { _warn(3,"Undefined (sub)rule \"$call\" used in a production.") and _hint("Will you be providing this rule later, or did you perhaps misspell \"$call\"? Otherwise it will be treated as an immediate ."); eval "sub $self->{namespace}::$call {undef}"; } else # EXPERIMENTAL { my $rule = qq{'$call'}; if ($::RD_AUTOSTUB and $::RD_AUTOSTUB ne "1") { $rule = $::RD_AUTOSTUB; } _warn(1,"Autogenerating rule: $call") and _hint("A call was made to a subrule named \"$call\", but no such rule was specified. However, since \$::RD_AUTOSTUB was defined, a rule stub ($call : $rule) was automatically created."); $self->_generate("$call: $rule",0,1); } } } # CHECK FOR LEFT RECURSION if ($rule->isleftrec($rules)) { _error("Rule \"$rule->{name}\" is left-recursive."); _hint("Redesign the grammar so it's not left-recursive. That will probably mean you need to re-implement repetitions using the '(s)' notation. For example: \"$rule->{name}(s)\"."); next; } # CHECK FOR PRODUCTIONS FOLLOWING EMPTY PRODUCTIONS { my $hasempty; my $prod; foreach $prod ( @{$rule->{"prods"}} ) { if ($hasempty) { _error("Production " . $prod->describe . " for \"$rule->{name}\" will never be reached (preceding empty production will always match first)."); _hint("Reorder the grammar so that the empty production is last in the list or productions."); last; } $hasempty ||= $prod->isempty(); } } } } # GENERATE ACTUAL PARSER CODE sub _code($) { my $self = shift; my $initial_skip = defined($self->{skip}) ? '$skip = ' . $self->{skip} . ';' : $self->_dump([$skip],[qw(skip)]); my $code = qq! package $self->{namespace}; use strict; use vars qw(\$skip \$AUTOLOAD $self->{localvars} ); \@$self->{namespace}\::ISA = (); $initial_skip $self->{startcode} { local \$SIG{__WARN__} = sub {0}; # PRETEND TO BE IN Parse::RecDescent NAMESPACE *$self->{namespace}::AUTOLOAD = sub { no strict 'refs'; ! # This generated code uses ${"AUTOLOAD"} rather than $AUTOLOAD in # order to avoid the circular reference documented here: # https://rt.perl.org/rt3/Public/Bug/Display.html?id=110248 # As a result of the investigation of # https://rt.cpan.org/Ticket/Display.html?id=53710 . qq! \${"AUTOLOAD"} =~ s/^$self->{namespace}/Parse::RecDescent/; goto &{\${"AUTOLOAD"}}; } } !; $code .= "push \@$self->{namespace}\::ISA, 'Parse::RecDescent';"; $self->{"startcode"} = ''; my $rule; # sort the rules to ensure the output is reproducible foreach $rule ( sort { $a->{name} cmp $b->{name} } values %{$self->{"rules"}} ) { if ($rule->{"changed"}) { $code .= $rule->code($self->{"namespace"},$self); $rule->{"changed"} = 0; } } return $code; } # A wrapper for Data::Dumper->Dump, which localizes some variables to # keep the output in a form suitable for Parse::RecDescent. # # List of variables and their defaults taken from # $Data::Dumper::VERSION == 2.158 sub _dump { require Data::Dumper; # # Allow the user's settings to persist for some features in case # RD_TRACE is set. These shouldn't affect the eval()-ability of # the resulting parser. # #local $Data::Dumper::Indent = 2; #local $Data::Dumper::Useqq = 0; #local $Data::Dumper::Quotekeys = 1; #local $Data::Dumper::Useperl = 0; # # These may affect whether the output is valid perl code for # eval(), and must be controlled. Set them to their default # values. # local $Data::Dumper::Purity = 0; local $Data::Dumper::Pad = ""; local $Data::Dumper::Varname = "VAR"; local $Data::Dumper::Terse = 0; local $Data::Dumper::Freezer = ""; local $Data::Dumper::Toaster = ""; local $Data::Dumper::Deepcopy = 0; local $Data::Dumper::Bless = "bless"; local $Data::Dumper::Maxdepth = 0; local $Data::Dumper::Pair = ' => '; local $Data::Dumper::Deparse = 0; local $Data::Dumper::Sparseseen = 0; # # Modify the below options from their defaults. # # Sort the keys to ensure the output is reproducible local $Data::Dumper::Sortkeys = 1; # Don't stop recursing local $Data::Dumper::Maxrecurse = 0; return Data::Dumper->Dump(@_[1..$#_]); } # EXECUTING A PARSE.... sub AUTOLOAD # ($parser, $text; $linenum, @args) { croak "Could not find method: $AUTOLOAD\n" unless ref $_[0]; my $class = ref($_[0]) || $_[0]; my $text = ref($_[1]) eq 'SCALAR' ? ${$_[1]} : "$_[1]"; $_[0]->{lastlinenum} = _linecount($text); $_[0]->{lastlinenum} += ($_[2]||0) if @_ > 2; $_[0]->{offsetlinenum} = $_[0]->{lastlinenum}; $_[0]->{fulltext} = $text; $_[0]->{fulltextlen} = length $text; $_[0]->{linecounter_cache} = {}; $_[0]->{deferred} = []; $_[0]->{errors} = []; my @args = @_[3..$#_]; my $args = sub { [ @args ] }; $AUTOLOAD =~ s/$class/$_[0]->{namespace}/; no strict "refs"; local $::RD_WARN = $::RD_WARN || $_[0]->{__WARN__}; local $::RD_HINT = $::RD_HINT || $_[0]->{__HINT__}; local $::RD_TRACE = $::RD_TRACE || $_[0]->{__TRACE__}; croak "Unknown starting rule ($AUTOLOAD) called\n" unless defined &$AUTOLOAD; my $retval = &{$AUTOLOAD}( $_[0], # $parser $text, # $text undef, # $repeating undef, # $_noactions $args, # \@args undef, # $_itempos ); if (defined $retval) { foreach ( @{$_[0]->{deferred}} ) { &$_; } } else { foreach ( @{$_[0]->{errors}} ) { _error(@$_); } } if (ref $_[1] eq 'SCALAR') { ${$_[1]} = $text } $ERRORS = 0; return $retval; } sub _parserepeat($$$$$$$$$) # RETURNS A REF TO AN ARRAY OF MATCHES { my ($parser, $text, $prod, $min, $max, $_noactions, $expectation, $argcode, $_itempos) = @_; my @tokens = (); my $itemposfirst; my $reps; for ($reps=0; $reps<$max;) { $expectation->at($text); my $_savetext = $text; my $prevtextlen = length $text; my $_tok; if (! defined ($_tok = &$prod($parser,$text,1,$_noactions,$argcode,$_itempos))) { $text = $_savetext; last; } if (defined($_itempos) and !defined($itemposfirst)) { $itemposfirst = Parse::RecDescent::Production::_duplicate_itempos($_itempos); } push @tokens, $_tok if defined $_tok; last if ++$reps >= $min and $prevtextlen == length $text; } do { $expectation->failed(); return undef} if $reps<$min; if (defined $itemposfirst) { Parse::RecDescent::Production::_update_itempos($_itempos, $itemposfirst, undef, [qw(from)]); } $_[1] = $text; return [@tokens]; } sub set_autoflush { my $orig_selected = select $_[0]; $| = 1; select $orig_selected; return; } # ERROR REPORTING.... sub _write_ERROR { my ($errorprefix, $errortext) = @_; return if $errortext !~ /\S/; $errorprefix =~ s/\s+\Z//; local $^A = q{}; formline(<<'END_FORMAT', $errorprefix, $errortext); @>>>>>>>>>>>>>>>>>>>>: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< END_FORMAT formline(<<'END_FORMAT', $errortext); ~~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< END_FORMAT print {*STDERR} $^A; } # TRACING my $TRACE_FORMAT = <<'END_FORMAT'; @>|@|||||||||@^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<| | ~~ |^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<| END_FORMAT my $TRACECONTEXT_FORMAT = <<'END_FORMAT'; @>|@|||||||||@ |^<<<<<<<<<<<<<<<<<<<<<<<<<<< | ~~ | |^<<<<<<<<<<<<<<<<<<<<<<<<<<< END_FORMAT sub _write_TRACE { my ($tracelevel, $tracerulename, $tracemsg) = @_; return if $tracemsg !~ /\S/; $tracemsg =~ s/\s*\Z//; local $^A = q{}; my $bar = '|'; formline($TRACE_FORMAT, $tracelevel, $tracerulename, $bar, $tracemsg, $tracemsg); print {*STDERR} $^A; } sub _write_TRACECONTEXT { my ($tracelevel, $tracerulename, $tracecontext) = @_; return if $tracecontext !~ /\S/; $tracecontext =~ s/\s*\Z//; local $^A = q{}; my $bar = '|'; formline($TRACECONTEXT_FORMAT, $tracelevel, $tracerulename, $bar, $tracecontext, $tracecontext); print {*STDERR} $^A; } sub _verbosity($) { defined $::RD_TRACE or defined $::RD_HINT and $::RD_HINT and $_[0] =~ /ERRORS|WARN|HINT/ or defined $::RD_WARN and $::RD_WARN and $_[0] =~ /ERRORS|WARN/ or defined $::RD_ERRORS and $::RD_ERRORS and $_[0] =~ /ERRORS/ } sub _error($;$) { $ERRORS++; return 0 if ! _verbosity("ERRORS"); my $errortext = $_[0]; my $errorprefix = "ERROR" . ($_[1] ? " (line $_[1])" : ""); $errortext =~ s/\s+/ /g; print {*STDERR} "\n" if _verbosity("WARN"); _write_ERROR($errorprefix, $errortext); return 1; } sub _warn($$;$) { return 0 unless _verbosity("WARN") && ($::RD_HINT || $_[0] >= ($::RD_WARN||1)); my $errortext = $_[1]; my $errorprefix = "Warning" . ($_[2] ? " (line $_[2])" : ""); print {*STDERR} "\n" if _verbosity("HINT"); $errortext =~ s/\s+/ /g; _write_ERROR($errorprefix, $errortext); return 1; } sub _hint($) { return 0 unless $::RD_HINT; my $errortext = $_[0]; my $errorprefix = "Hint" . ($_[1] ? " (line $_[1])" : ""); $errortext =~ s/\s+/ /g; _write_ERROR($errorprefix, $errortext); return 1; } sub _tracemax($) { if (defined $::RD_TRACE && $::RD_TRACE =~ /\d+/ && $::RD_TRACE>1 && $::RD_TRACE+10..." . substr($_[0],-$::RD_TRACE/2); } else { return substr($_[0],0,500); } } sub _tracefirst($) { if (defined $::RD_TRACE && $::RD_TRACE =~ /\d+/ && $::RD_TRACE>1 && $::RD_TRACE+10"; } else { return substr($_[0],0,500); } } my $lastcontext = ''; my $lastrulename = ''; my $lastlevel = ''; sub _trace($;$$$) { my $tracemsg = $_[0]; my $tracecontext = $_[1]||$lastcontext; my $tracerulename = $_[2]||$lastrulename; my $tracelevel = $_[3]||$lastlevel; if ($tracerulename) { $lastrulename = $tracerulename } if ($tracelevel) { $lastlevel = $tracelevel } $tracecontext =~ s/\n/\\n/g; $tracecontext =~ s/\s+/ /g; $tracerulename = qq{$tracerulename}; _write_TRACE($tracelevel, $tracerulename, $tracemsg); if ($tracecontext ne $lastcontext) { if ($tracecontext) { $lastcontext = _tracefirst($tracecontext); $tracecontext = qq{"$tracecontext"}; } else { $tracecontext = qq{}; } _write_TRACECONTEXT($tracelevel, $tracerulename, $tracecontext); } } sub _matchtracemessage { my ($self, $reject) = @_; my $prefix = ''; my $postfix = ''; my $matched = not $reject; my @t = ("Matched", "Didn't match"); if (exists $self->{lookahead} and $self->{lookahead}) { $postfix = $reject ? "(reject)" : "(keep)"; $prefix = "..."; if ($self->{lookahead} < 0) { $prefix .= '!'; $matched = not $matched; } } $prefix . ($matched ? $t[0] : $t[1]) . $postfix; } sub _parseunneg($$$$$) { _parse($_[0],$_[1],$_[3],$_[4]); if ($_[2]<0) { _error("Can't negate \"$_[4]\".",$_[3]); _hint("You can't negate $_[0]. Remove the \"...!\" before \"$_[4]\"."); return 0; } return 1; } sub _parse($$$$) { my $what = $_[3]; $what =~ s/^\s+//; if ($_[1]) { _warn(3,"Found $_[0] ($what) after an unconditional ",$_[2]) and _hint("An unconditional always causes the production containing it to immediately fail. \u$_[0] that follows an will never be reached. Did you mean to use instead?"); } return if ! _verbosity("TRACE"); my $errortext = "Treating \"$what\" as $_[0]"; my $errorprefix = "Parse::RecDescent"; $errortext =~ s/\s+/ /g; _write_ERROR($errorprefix, $errortext); } sub _linecount($) { scalar substr($_[0], pos $_[0]||0) =~ tr/\n// } package main; use vars qw ( $RD_ERRORS $RD_WARN $RD_HINT $RD_TRACE $RD_CHECK ); $::RD_CHECK = 1; $::RD_ERRORS = 1; $::RD_WARN = 3; 1; __END__ =head1 NAME Parse::RecDescent - Generate Recursive-Descent Parsers =head1 VERSION This document describes version 1.967015 of Parse::RecDescent released April 4th, 2017. =head1 SYNOPSIS use Parse::RecDescent; # Generate a parser from the specification in $grammar: $parser = new Parse::RecDescent ($grammar); # Generate a parser from the specification in $othergrammar $anotherparser = new Parse::RecDescent ($othergrammar); # Parse $text using rule 'startrule' (which must be # defined in $grammar): $parser->startrule($text); # Parse $text using rule 'otherrule' (which must also # be defined in $grammar): $parser->otherrule($text); # Change the universal token prefix pattern # before building a grammar # (the default is: '\s*'): $Parse::RecDescent::skip = '[ \t]+'; # Replace productions of existing rules (or create new ones) # with the productions defined in $newgrammar: $parser->Replace($newgrammar); # Extend existing rules (or create new ones) # by adding extra productions defined in $moregrammar: $parser->Extend($moregrammar); # Global flags (useful as command line arguments under -s): $::RD_ERRORS # unless undefined, report fatal errors $::RD_WARN # unless undefined, also report non-fatal problems $::RD_HINT # if defined, also suggestion remedies $::RD_TRACE # if defined, also trace parsers' behaviour $::RD_AUTOSTUB # if defined, generates "stubs" for undefined rules $::RD_AUTOACTION # if defined, appends specified action to productions =head1 DESCRIPTION =head2 Overview Parse::RecDescent incrementally generates top-down recursive-descent text parsers from simple I-like grammar specifications. It provides: =over 4 =item * Regular expressions or literal strings as terminals (tokens), =item * Multiple (non-contiguous) productions for any rule, =item * Repeated and optional subrules within productions, =item * Full access to Perl within actions specified as part of the grammar, =item * Simple automated error reporting during parser generation and parsing, =item * The ability to commit to, uncommit to, or reject particular productions during a parse, =item * The ability to pass data up and down the parse tree ("down" via subrule argument lists, "up" via subrule return values) =item * Incremental extension of the parsing grammar (even during a parse), =item * Precompilation of parser objects, =item * User-definable reduce-reduce conflict resolution via "scoring" of matching productions. =back =head2 Using C Parser objects are created by calling C, passing in a grammar specification (see the following subsections). If the grammar is correct, C returns a blessed reference which can then be used to initiate parsing through any rule specified in the original grammar. A typical sequence looks like this: $grammar = q { # GRAMMAR SPECIFICATION HERE }; $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n"; # acquire $text defined $parser->startrule($text) or print "Bad text!\n"; The rule through which parsing is initiated must be explicitly defined in the grammar (i.e. for the above example, the grammar must include a rule of the form: "startrule: ". If the starting rule succeeds, its value (see below) is returned. Failure to generate the original parser or failure to match a text is indicated by returning C. Note that it's easy to set up grammars that can succeed, but which return a value of 0, "0", or "". So don't be tempted to write: $parser->startrule($text) or print "Bad text!\n"; Normally, the parser has no effect on the original text. So in the previous example the value of $text would be unchanged after having been parsed. If, however, the text to be matched is passed by reference: $parser->startrule(\$text) then any text which was consumed during the match will be removed from the start of $text. =head2 Rules In the grammar from which the parser is built, rules are specified by giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a colon I, followed by one or more productions, separated by single vertical bars. The layout of the productions is entirely free-format: rule1: production1 | production2 | production3 | production4 At any point in the grammar previously defined rules may be extended with additional productions. This is achieved by redeclaring the rule with the new productions. Thus: rule1: a | b | c rule2: d | e | f rule1: g | h is exactly equivalent to: rule1: a | b | c | g | h rule2: d | e | f Each production in a rule consists of zero or more items, each of which may be either: the name of another rule to be matched (a "subrule"), a pattern or string literal to be matched directly (a "token"), a block of Perl code to be executed (an "action"), a special instruction to the parser (a "directive"), or a standard Perl comment (which is ignored). A rule matches a text if one of its productions matches. A production matches if each of its items match consecutive substrings of the text. The productions of a rule being matched are tried in the same order that they appear in the original grammar, and the first matching production terminates the match attempt (successfully). If all productions are tried and none matches, the match attempt fails. Note that this behaviour is quite different from the "prefer the longer match" behaviour of I. For example, if I were parsing the rule: seq : 'A' 'B' | 'A' 'B' 'C' upon matching "AB" it would look ahead to see if a 'C' is next and, if so, will match the second production in preference to the first. In other words, I effectively tries all the productions of a rule breadth-first in parallel, and selects the "best" match, where "best" means longest (note that this is a gross simplification of the true behaviour of I but it will do for our purposes). In contrast, C tries each production depth-first in sequence, and selects the "best" match, where "best" means first. This is the fundamental difference between "bottom-up" and "recursive descent" parsing. Each successfully matched item in a production is assigned a value, which can be accessed in subsequent actions within the same production (or, in some cases, as the return value of a successful subrule call). Unsuccessful items don't have an associated value, since the failure of an item causes the entire surrounding production to immediately fail. The following sections describe the various types of items and their success values. =head2 Subrules A subrule which appears in a production is an instruction to the parser to attempt to match the named rule at that point in the text being parsed. If the named subrule is not defined when requested the production containing it immediately fails (unless it was "autostubbed" - see L). A rule may (recursively) call itself as a subrule, but I as the left-most item in any of its productions (since such recursions are usually non-terminating). The value associated with a subrule is the value associated with its C<$return> variable (see L<"Actions"> below), or with the last successfully matched item in the subrule match. Subrules may also be specified with a trailing repetition specifier, indicating that they are to be (greedily) matched the specified number of times. The available specifiers are: subrule(?) # Match one-or-zero times subrule(s) # Match one-or-more times subrule(s?) # Match zero-or-more times subrule(N) # Match exactly N times for integer N > 0 subrule(N..M) # Match between N and M times subrule(..M) # Match between 1 and M times subrule(N..) # Match at least N times Repeated subrules keep matching until either the subrule fails to match, or it has matched the minimal number of times but fails to consume any of the parsed text (this second condition prevents the subrule matching forever in some cases). Since a repeated subrule may match many instances of the subrule itself, the value associated with it is not a simple scalar, but rather a reference to a list of scalars, each of which is the value associated with one of the individual subrule matches. In other words in the rule: program: statement(s) the value associated with the repeated subrule "statement(s)" is a reference to an array containing the values matched by each call to the individual subrule "statement". Repetition modifiers may include a separator pattern: program: statement(s /;/) specifying some sequence of characters to be skipped between each repetition. This is really just a shorthand for the Eleftop:...E directive (see below). =head2 Tokens If a quote-delimited string or a Perl regex appears in a production, the parser attempts to match that string or pattern at that point in the text. For example: typedef: "typedef" typename identifier ';' identifier: /[A-Za-z_][A-Za-z0-9_]*/ As in regular Perl, a single quoted string is uninterpolated, whilst a double-quoted string or a pattern is interpolated (at the time of matching, I when the parser is constructed). Hence, it is possible to define rules in which tokens can be set at run-time: typedef: "$::typedefkeyword" typename identifier ';' identifier: /$::identpat/ Note that, since each rule is implemented inside a special namespace belonging to its parser, it is necessary to explicitly quantify variables from the main package. Regex tokens can be specified using just slashes as delimiters or with the explicit CdelimiterE......EdelimiterE> syntax: typedef: "typedef" typename identifier ';' typename: /[A-Za-z_][A-Za-z0-9_]*/ identifier: m{[A-Za-z_][A-Za-z0-9_]*} A regex of either type can also have any valid trailing parameter(s) (that is, any of [cgimsox]): typedef: "typedef" typename identifier ';' identifier: / [a-z_] # LEADING ALPHA OR UNDERSCORE [a-z0-9_]* # THEN DIGITS ALSO ALLOWED /ix # CASE/SPACE/COMMENT INSENSITIVE The value associated with any successfully matched token is a string containing the actual text which was matched by the token. It is important to remember that, since each grammar is specified in a Perl string, all instances of the universal escape character '\' within a grammar must be "doubled", so that they interpolate to single '\'s when the string is compiled. For example, to use the grammar: word: /\S+/ | backslash line: prefix word(s) "\n" backslash: '\\' the following code is required: $parser = new Parse::RecDescent (q{ word: /\\S+/ | backslash line: prefix word(s) "\\n" backslash: '\\\\' }); =head2 Anonymous subrules Parentheses introduce a nested scope that is very like a call to an anonymous subrule. Hence they are useful for "in-lining" subroutine calls, and other kinds of grouping behaviour. For example, instead of: word: /\S+/ | backslash line: prefix word(s) "\n" you could write: line: prefix ( /\S+/ | backslash )(s) "\n" and get exactly the same effects. Parentheses are also use for collecting unrepeated alternations within a single production. secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq." =head2 Terminal Separators For the purpose of matching, each terminal in a production is considered to be preceded by a "prefix" - a pattern which must be matched before a token match is attempted. By default, the prefix is optional whitespace (which always matches, at least trivially), but this default may be reset in any production. The variable C<$Parse::RecDescent::skip> stores the universal prefix, which is the default for all terminal matches in all parsers built with C. If you want to change the universal prefix using C<$Parse::RecDescent::skip>, be careful to set it I creating the grammar object, because it is applied statically (when a grammar is built) rather than dynamically (when the grammar is used). Alternatively you can provide a global Cskip:...E> directive in your grammar before any rules (described later). The prefix for an individual production can be altered by using the Cskip:...E> directive (described later). Setting this directive in the top-level rule is an alternative approach to setting C<$Parse::RecDescent::skip> before creating the object, but in this case you don't get the intended skipping behaviour if you directly invoke methods different from the top-level rule. =head2 Actions An action is a block of Perl code which is to be executed (as the block of a C statement) when the parser reaches that point in a production. The action executes within a special namespace belonging to the active parser, so care must be taken in correctly qualifying variable names (see also L below). The action is considered to succeed if the final value of the block is defined (that is, if the implied C statement evaluates to a defined value - I). Note that the value associated with a successful action is also the final value in the block. An action will I if its last evaluated value is C. This is surprisingly easy to accomplish by accident. For instance, here's an infuriating case of an action that makes its production fail, but only when debugging I activated: description: name rank serial_number { print "Got $item[2] $item[1] ($item[3])\n" if $::debugging } If C<$debugging> is false, no statement in the block is executed, so the final value is C, and the entire production fails. The solution is: description: name rank serial_number { print "Got $item[2] $item[1] ($item[3])\n" if $::debugging; 1; } Within an action, a number of useful parse-time variables are available in the special parser namespace (there are other variables also accessible, but meddling with them will probably just break your parser. As a general rule, if you avoid referring to unqualified variables - especially those starting with an underscore - inside an action, things should be okay): =over 4 =item C<@item> and C<%item> The array slice C<@item[1..$#item]> stores the value associated with each item (that is, each subrule, token, or action) in the current production. The analogy is to C<$1>, C<$2>, etc. in a I grammar. Note that, for obvious reasons, C<@item> only contains the values of items I the current point in the production. The first element (C<$item[0]>) stores the name of the current rule being matched. C<@item> is a standard Perl array, so it can also be indexed with negative numbers, representing the number of items I from the current position in the parse: stuff: /various/ bits 'and' pieces "then" data 'end' { print $item[-2] } # PRINTS data # (EASIER THAN: $item[6]) The C<%item> hash complements the <@item> array, providing named access to the same item values: stuff: /various/ bits 'and' pieces "then" data 'end' { print $item{data} # PRINTS data # (EVEN EASIER THAN USING @item) The results of named subrules are stored in the hash under each subrule's name (including the repetition specifier, if any), whilst all other items are stored under a "named positional" key that indicates their ordinal position within their item type: __STRINGI__, __PATTERNI__, __DIRECTIVEI__, __ACTIONI__: stuff: /various/ bits 'and' pieces "then" data 'end' { save } { print $item{__PATTERN1__}, # PRINTS 'various' $item{__STRING2__}, # PRINTS 'then' $item{__ACTION1__}, # PRINTS RETURN # VALUE OF save } If you want proper I access to patterns or literals, you need to turn them into separate rules: stuff: various bits 'and' pieces "then" data 'end' { print $item{various} # PRINTS various } various: /various/ The special entry C<$item{__RULE__}> stores the name of the current rule (i.e. the same value as C<$item[0]>. The advantage of using C<%item>, instead of C<@items> is that it removes the need to track items positions that may change as a grammar evolves. For example, adding an interim CskipE> directive of action can silently ruin a trailing action, by moving an C<@item> element "down" the array one place. In contrast, the named entry of C<%item> is unaffected by such an insertion. A limitation of the C<%item> hash is that it only records the I value of a particular subrule. For example: range: '(' number '..' number )' { $return = $item{number} } will return only the value corresponding to the I match of the C subrule. In other words, successive calls to a subrule overwrite the corresponding entry in C<%item>. Once again, the solution is to rename each subrule in its own rule: range: '(' from_num '..' to_num ')' { $return = $item{from_num} } from_num: number to_num: number =item C<@arg> and C<%arg> The array C<@arg> and the hash C<%arg> store any arguments passed to the rule from some other rule (see L). Changes to the elements of either variable do not propagate back to the calling rule (data can be passed back from a subrule via the C<$return> variable - see next item). =item C<$return> If a value is assigned to C<$return> within an action, that value is returned if the production containing the action eventually matches successfully. Note that setting C<$return> I cause the current production to succeed. It merely tells it what to return if it I succeed. Hence C<$return> is analogous to C<$$> in a I grammar. If C<$return> is not assigned within a production, the value of the last component of the production (namely: C<$item[$#item]>) is returned if the production succeeds. =item C<$commit> The current state of commitment to the current production (see L<"Directives"> below). =item C<$skip> The current terminal prefix (see L<"Directives"> below). =item C<$text> The remaining (unparsed) text. Changes to C<$text> I out of unsuccessful productions, but I survive successful productions. Hence it is possible to dynamically alter the text being parsed - for example, to provide a C<#include>-like facility: hash_include: '#include' filename { $text = ::loadfile($item[2]) . $text } filename: '<' /[a-z0-9._-]+/i '>' { $return = $item[2] } | '"' /[a-z0-9._-]+/i '"' { $return = $item[2] } =item C<$thisline> and C<$prevline> C<$thisline> stores the current line number within the current parse (starting from 1). C<$prevline> stores the line number for the last character which was already successfully parsed (this will be different from C<$thisline> at the end of each line). For efficiency, C<$thisline> and C<$prevline> are actually tied hashes, and only recompute the required line number when the variable's value is used. Assignment to C<$thisline> adjusts the line number calculator, so that it believes that the current line number is the value being assigned. Note that this adjustment will be reflected in all subsequent line numbers calculations. Modifying the value of the variable C<$text> (as in the previous C example, for instance) will confuse the line counting mechanism. To prevent this, you should call C I after any assignment to the variable C<$text> (or, at least, before the next attempt to use C<$thisline>). Note that if a production fails after assigning to or resync'ing C<$thisline>, the parser's line counter mechanism will usually be corrupted. Also see the entry for C<@itempos>. The line number can be set to values other than 1, by calling the start rule with a second argument. For example: $parser = new Parse::RecDescent ($grammar); $parser->input($text, 10); # START LINE NUMBERS AT 10 =item C<$thiscolumn> and C<$prevcolumn> C<$thiscolumn> stores the current column number within the current line being parsed (starting from 1). C<$prevcolumn> stores the column number of the last character which was actually successfully parsed. Usually C<$prevcolumn == $thiscolumn-1>, but not at the end of lines. For efficiency, C<$thiscolumn> and C<$prevcolumn> are actually tied hashes, and only recompute the required column number when the variable's value is used. Assignment to C<$thiscolumn> or C<$prevcolumn> is a fatal error. Modifying the value of the variable C<$text> (as in the previous C example, for instance) may confuse the column counting mechanism. Note that C<$thiscolumn> reports the column number I any whitespace that might be skipped before reading a token. Hence if you wish to know where a token started (and ended) use something like this: rule: token1 token2 startcol token3 endcol token4 { print "token3: columns $item[3] to $item[5]"; } startcol: '' { $thiscolumn } # NEED THE '' TO STEP PAST TOKEN SEP endcol: { $prevcolumn } Also see the entry for C<@itempos>. =item C<$thisoffset> and C<$prevoffset> C<$thisoffset> stores the offset of the current parsing position within the complete text being parsed (starting from 0). C<$prevoffset> stores the offset of the last character which was actually successfully parsed. In all cases C<$prevoffset == $thisoffset-1>. For efficiency, C<$thisoffset> and C<$prevoffset> are actually tied hashes, and only recompute the required offset when the variable's value is used. Assignment to C<$thisoffset> or <$prevoffset> is a fatal error. Modifying the value of the variable C<$text> will I affect the offset counting mechanism. Also see the entry for C<@itempos>. =item C<@itempos> The array C<@itempos> stores a hash reference corresponding to each element of C<@item>. The elements of the hash provide the following: $itempos[$n]{offset}{from} # VALUE OF $thisoffset BEFORE $item[$n] $itempos[$n]{offset}{to} # VALUE OF $prevoffset AFTER $item[$n] $itempos[$n]{line}{from} # VALUE OF $thisline BEFORE $item[$n] $itempos[$n]{line}{to} # VALUE OF $prevline AFTER $item[$n] $itempos[$n]{column}{from} # VALUE OF $thiscolumn BEFORE $item[$n] $itempos[$n]{column}{to} # VALUE OF $prevcolumn AFTER $item[$n] Note that the various C<$itempos[$n]...{from}> values record the appropriate value I any token prefix has been skipped. Hence, instead of the somewhat tedious and error-prone: rule: startcol token1 endcol startcol token2 endcol startcol token3 endcol { print "token1: columns $item[1] to $item[3] token2: columns $item[4] to $item[6] token3: columns $item[7] to $item[9]" } startcol: '' { $thiscolumn } # NEED THE '' TO STEP PAST TOKEN SEP endcol: { $prevcolumn } it is possible to write: rule: token1 token2 token3 { print "token1: columns $itempos[1]{column}{from} to $itempos[1]{column}{to} token2: columns $itempos[2]{column}{from} to $itempos[2]{column}{to} token3: columns $itempos[3]{column}{from} to $itempos[3]{column}{to}" } Note however that (in the current implementation) the use of C<@itempos> anywhere in a grammar implies that item positioning information is collected I during the parse. Depending on the grammar and the size of the text to be parsed, this may be prohibitively expensive and the explicit use of C<$thisline>, C<$thiscolumn>, etc. may be a better choice. =item C<$thisparser> A reference to the S> object through which parsing was initiated. The value of C<$thisparser> propagates down the subrules of a parse but not back up. Hence, you can invoke subrules from another parser for the scope of the current rule as follows: rule: subrule1 subrule2 | { $thisparser = $::otherparser } | subrule3 subrule4 | subrule5 The result is that the production calls "subrule1" and "subrule2" of the current parser, and the remaining productions call the named subrules from C<$::otherparser>. Note, however that "Bad Things" will happen if C<::otherparser> isn't a blessed reference and/or doesn't have methods with the same names as the required subrules! =item C<$thisrule> A reference to the S> object corresponding to the rule currently being matched. =item C<$thisprod> A reference to the S> object corresponding to the production currently being matched. =item C<$score> and C<$score_return> $score stores the best production score to date, as specified by an earlier Cscore:...E> directive. $score_return stores the corresponding return value for the successful production. See L. =back B the parser relies on the information in the various C objects in some non-obvious ways. Tinkering with the other members of these objects will probably cause Bad Things to happen, unless you I know what you're doing. The only exception to this advice is that the use of C<$this...-E{local}> is always safe. =head2 Start-up Actions Any actions which appear I the first rule definition in a grammar are treated as "start-up" actions. Each such action is stripped of its outermost brackets and then evaluated (in the parser's special namespace) just before the rules of the grammar are first compiled. The main use of start-up actions is to declare local variables within the parser's special namespace: { my $lastitem = '???'; } list: item(s) { $return = $lastitem } item: book { $lastitem = 'book'; } bell { $lastitem = 'bell'; } candle { $lastitem = 'candle'; } but start-up actions can be used to execute I valid Perl code within a parser's special namespace. Start-up actions can appear within a grammar extension or replacement (that is, a partial grammar installed via C or C - see L), and will be executed before the new grammar is installed. Note, however, that a particular start-up action is only ever executed once. =head2 Autoactions It is sometimes desirable to be able to specify a default action to be taken at the end of every production (for example, in order to easily build a parse tree). If the variable C<$::RD_AUTOACTION> is defined when C is called, the contents of that variable are treated as a specification of an action which is to appended to each production in the corresponding grammar. Alternatively, you can hard-code the autoaction within a grammar, using the C<< >> directive. So, for example, to construct a simple parse tree you could write: $::RD_AUTOACTION = q { [@item] }; parser = Parse::RecDescent->new(q{ expression: and_expr '||' expression | and_expr and_expr: not_expr '&&' and_expr | not_expr not_expr: '!' brack_expr | brack_expr brack_expr: '(' expression ')' | identifier identifier: /[a-z]+/i }); or: parser = Parse::RecDescent->new(q{ expression: and_expr '||' expression | and_expr and_expr: not_expr '&&' and_expr | not_expr not_expr: '!' brack_expr | brack_expr brack_expr: '(' expression ')' | identifier identifier: /[a-z]+/i }); Either of these is equivalent to: parser = new Parse::RecDescent (q{ expression: and_expr '||' expression { [@item] } | and_expr { [@item] } and_expr: not_expr '&&' and_expr { [@item] } | not_expr { [@item] } not_expr: '!' brack_expr { [@item] } | brack_expr { [@item] } brack_expr: '(' expression ')' { [@item] } | identifier { [@item] } identifier: /[a-z]+/i { [@item] } }); Alternatively, we could take an object-oriented approach, use different classes for each node (and also eliminating redundant intermediate nodes): $::RD_AUTOACTION = q { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) }; parser = Parse::RecDescent->new(q{ expression: and_expr '||' expression | and_expr and_expr: not_expr '&&' and_expr | not_expr not_expr: '!' brack_expr | brack_expr brack_expr: '(' expression ')' | identifier identifier: /[a-z]+/i }); or: parser = Parse::RecDescent->new(q{ new(@item[1..$#item]) > expression: and_expr '||' expression | and_expr and_expr: not_expr '&&' and_expr | not_expr not_expr: '!' brack_expr | brack_expr brack_expr: '(' expression ')' | identifier identifier: /[a-z]+/i }); which are equivalent to: parser = Parse::RecDescent->new(q{ expression: and_expr '||' expression { "expression_node"->new(@item[1..3]) } | and_expr and_expr: not_expr '&&' and_expr { "and_expr_node"->new(@item[1..3]) } | not_expr not_expr: '!' brack_expr { "not_expr_node"->new(@item[1..2]) } | brack_expr brack_expr: '(' expression ')' { "brack_expr_node"->new(@item[1..3]) } | identifier identifier: /[a-z]+/i { "identifer_node"->new(@item[1]) } }); Note that, if a production already ends in an action, no autoaction is appended to it. For example, in this version: $::RD_AUTOACTION = q { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) }; parser = Parse::RecDescent->new(q{ expression: and_expr '&&' expression | and_expr and_expr: not_expr '&&' and_expr | not_expr not_expr: '!' brack_expr | brack_expr brack_expr: '(' expression ')' | identifier identifier: /[a-z]+/i { 'terminal_node'->new($item[1]) } }); each C match produces a C object, I an C object. A level 1 warning is issued each time an "autoaction" is added to some production. =head2 Autotrees A commonly needed autoaction is one that builds a parse-tree. It is moderately tricky to set up such an action (which must treat terminals differently from non-terminals), so Parse::RecDescent simplifies the process by providing the CautotreeE> directive. If this directive appears at the start of grammar, it causes Parse::RecDescent to insert autoactions at the end of any rule except those which already end in an action. The action inserted depends on whether the production is an intermediate rule (two or more items), or a terminal of the grammar (i.e. a single pattern or string item). So, for example, the following grammar: file : command(s) command : get | set | vet get : 'get' ident ';' set : 'set' ident 'to' value ';' vet : 'check' ident 'is' value ';' ident : /\w+/ value : /\d+/ is equivalent to: file : command(s) { bless \%item, $item[0] } command : get { bless \%item, $item[0] } | set { bless \%item, $item[0] } | vet { bless \%item, $item[0] } get : 'get' ident ';' { bless \%item, $item[0] } set : 'set' ident 'to' value ';' { bless \%item, $item[0] } vet : 'check' ident 'is' value ';' { bless \%item, $item[0] } ident : /\w+/ { bless {__VALUE__=>$item[1]}, $item[0] } value : /\d+/ { bless {__VALUE__=>$item[1]}, $item[0] } Note that each node in the tree is blessed into a class of the same name as the rule itself. This makes it easy to build object-oriented processors for the parse-trees that the grammar produces. Note too that the last two rules produce special objects with the single attribute '__VALUE__'. This is because they consist solely of a single terminal. This autoaction-ed grammar would then produce a parse tree in a data structure like this: { file => { command => { [ get => { identifier => { __VALUE__ => 'a' }, }, set => { identifier => { __VALUE__ => 'b' }, value => { __VALUE__ => '7' }, }, vet => { identifier => { __VALUE__ => 'b' }, value => { __VALUE__ => '7' }, }, ], }, } } (except, of course, that each nested hash would also be blessed into the appropriate class). You can also specify a base class for the CautotreeE> directive. The supplied prefix will be prepended to the rule names when creating tree nodes. The following are equivalent: And will produce a root node blessed into the C package in the example above. =head2 Autostubbing Normally, if a subrule appears in some production, but no rule of that name is ever defined in the grammar, the production which refers to the non-existent subrule fails immediately. This typically occurs as a result of misspellings, and is a sufficiently common occurrence that a warning is generated for such situations. However, when prototyping a grammar it is sometimes useful to be able to use subrules before a proper specification of them is really possible. For example, a grammar might include a section like: function_call: identifier '(' arg(s?) ')' identifier: /[a-z]\w*/i where the possible format of an argument is sufficiently complex that it is not worth specifying in full until the general function call syntax has been debugged. In this situation it is convenient to leave the real rule C undefined and just slip in a placeholder (or "stub"): arg: 'arg' so that the function call syntax can be tested with dummy input such as: f0() f1(arg) f2(arg arg) f3(arg arg arg) et cetera. Early in prototyping, many such "stubs" may be required, so C provides a means of automating their definition. If the variable C<$::RD_AUTOSTUB> is defined when a parser is built, a subrule reference to any non-existent rule (say, C), will cause a "stub" rule to be automatically defined in the generated parser. If C<$::RD_AUTOSTUB eq '1'> or is false, a stub rule of the form: subrule: 'subrule' will be generated. The special-case for a value of C<'1'> is to allow the use of the B with B<-RD_AUTOSTUB> without generating C per below. If C<$::RD_AUTOSTUB> is true, a stub rule of the form: subrule: $::RD_AUTOSTUB will be generated. C<$::RD_AUTOSTUB> must contain a valid production item, no checking is performed. No lazy evaluation of C<$::RD_AUTOSTUB> is performed, it is evaluated at the time the Parser is generated. Hence, with C<$::RD_AUTOSTUB> defined, it is possible to only partially specify a grammar, and then "fake" matches of the unspecified (sub)rules by just typing in their name, or a literal value that was assigned to C<$::RD_AUTOSTUB>. =head2 Look-ahead If a subrule, token, or action is prefixed by "...", then it is treated as a "look-ahead" request. That means that the current production can (as usual) only succeed if the specified item is matched, but that the matching I. This is very similar to the C look-ahead construct in Perl patterns. Thus, the rule: inner_word: word ...word will match whatever the subrule "word" matches, provided that match is followed by some more text which subrule "word" would also match (although this second substring is not actually consumed by "inner_word") Likewise, a "...!" prefix, causes the following item to succeed (without consuming any text) if and only if it would normally fail. Hence, a rule such as: identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/ matches a string of characters which satisfies the pattern C, but only if the same sequence of characters would not match either subrule "keyword" or the literal token '_'. Sequences of look-ahead prefixes accumulate, multiplying their positive and/or negative senses. Hence: inner_word: word ...!......!word is exactly equivalent to the original example above (a warning is issued in cases like these, since they often indicate something left out, or misunderstood). Note that actions can also be treated as look-aheads. In such cases, the state of the parser text (in the local variable C<$text>) I the look-ahead action is guaranteed to be identical to its state I the action, regardless of how it's changed I the action (unless you actually undefine C<$text>, in which case you get the disaster you deserve :-). =head2 Directives Directives are special pre-defined actions which may be used to alter the behaviour of the parser. There are currently twenty-three directives: CcommitE>, CuncommitE>, CrejectE>, CscoreE>, CautoscoreE>, CskipE>, CresyncE>, CerrorE>, CwarnE>, ChintE>, Ctrace_buildE>, Ctrace_parseE>, CnocheckE>, CrulevarE>, CmatchruleE>, CleftopE>, CrightopE>, CdeferE>, CnocheckE>, Cperl_quotelikeE>, Cperl_codeblockE>, Cperl_variableE>, and CtokenE>. =over 4 =item Committing and uncommitting The CcommitE> and CuncommitE> directives permit the recursive descent of the parse tree to be pruned (or "cut") for efficiency. Within a rule, a CcommitE> directive instructs the rule to ignore subsequent productions if the current production fails. For example: command: 'find' filename | 'open' filename | 'move' filename filename Clearly, if the leading token 'find' is matched in the first production but that production fails for some other reason, then the remaining productions cannot possibly match. The presence of the CcommitE> causes the "command" rule to fail immediately if an invalid "find" command is found, and likewise if an invalid "open" command is encountered. It is also possible to revoke a previous commitment. For example: if_statement: 'if' condition 'then' block 'else' block | 'if' condition 'then' block In this case, a failure to find an "else" block in the first production shouldn't preclude trying the second production, but a failure to find a "condition" certainly should. As a special case, any production in which the I item is an CuncommitE> immediately revokes a preceding CcommitE> (even though the production would not otherwise have been tried). For example, in the rule: request: 'explain' expression | 'explain' keyword | 'save' | 'quit' | term '?' if the text being matched was "explain?", and the first two productions failed, then the CcommitE> in production two would cause productions three and four to be skipped, but the leading CuncommitE> in the production five would allow that production to attempt a match. Note in the preceding example, that the CcommitE> was only placed in production two. If production one had been: request: 'explain' expression then production two would be (inappropriately) skipped if a leading "explain..." was encountered. Both CcommitE> and CuncommitE> directives always succeed, and their value is always 1. =item Rejecting a production The CrejectE> directive immediately causes the current production to fail (it is exactly equivalent to, but more obvious than, the action C<{undef}>). A CrejectE> is useful when it is desirable to get the side effects of the actions in one production, without prejudicing a match by some other production later in the rule. For example, to insert tracing code into the parse: complex_rule: { print "In complex rule...\n"; } complex_rule: simple_rule '+' 'i' '*' simple_rule | 'i' '*' simple_rule | simple_rule It is also possible to specify a conditional rejection, using the form Creject:IE>, which only rejects if the specified condition is true. This form of rejection is exactly equivalent to the action C<{(I)?undef:1}E>. For example: command: save_command | restore_command | { exit } | A CrejectE> directive never succeeds (and hence has no associated value). A conditional rejection may succeed (if its condition is not satisfied), in which case its value is 1. As an extra optimization, C ignores any production which I with an unconditional CrejectE> directive, since any such production can never successfully match or have any useful side-effects. A level 1 warning is issued in all such cases. Note that productions beginning with conditional Creject:...E> directives are I "optimized away" in this manner, even if they are always guaranteed to fail (for example: Creject:1E>) Due to the way grammars are parsed, there is a minor restriction on the condition of a conditional Creject:...E>: it cannot contain any raw '<' or '>' characters. For example: line: cmd max> data results in an error when a parser is built from this grammar (since the grammar parser has no way of knowing whether the first > is a "less than" or the end of the Creject:...E>. To overcome this problem, put the condition inside a do{} block: line: cmd max}> data Note that the same problem may occur in other directives that take arguments. The same solution will work in all cases. =item Skipping between terminals The CskipE> directive enables the terminal prefix used in a production to be changed. For example: OneLiner: Command Arg(s) /;/ causes only blanks and tabs to be skipped before terminals in the C subrule (and any of I subrules>, and also before the final C terminal. Once the production is complete, the previous terminal prefix is reinstated. Note that this implies that distinct productions of a rule must reset their terminal prefixes individually. The CskipE> directive evaluates to the I terminal prefix, so it's easy to reinstate a prefix later in a production: Command: CSV(s) Modifier The value specified after the colon is interpolated into a pattern, so all of the following are equivalent (though their efficiency increases down the list): # ASSUMING THE VARS HOLD THE OBVIOUS VALUES There is no way of directly setting the prefix for an entire rule, except as follows: Rule: Prod1 | Prod2a Prod2b | Prod3 or, better: Rule: ( Prod1 | Prod2a Prod2b | Prod3 ) The skip pattern is passed down to subrules, so setting the skip for the top-level rule as described above actually sets the prefix for the entire grammar (provided that you only call the method corresponding to the top-level rule itself). Alternatively, or if you have more than one top-level rule in your grammar, you can provide a global CskipE> directive prior to defining any rules in the grammar. These are the preferred alternatives to setting C<$Parse::RecDescent::skip>. Additionally, using CskipE> actually allows you to have a completely dynamic skipping behaviour. For example: Rule_with_dynamic_skip: Rule Then you can set C<$::skip_pattern> before invoking C and have it skip whatever you specified. B BskipE> directive added in 1.967_004 did not interpolate the pattern argument, instead the pattern was placed inside of single quotes and then interpolated. This behavior was changed in 1.967_010 so that all CskipE> directives behavior similarly.> =item Resynchronization The CresyncE> directive provides a visually distinctive means of consuming some of the text being parsed, usually to skip an erroneous input. In its simplest form CresyncE> simply consumes text up to and including the next newline (C<"\n">) character, succeeding only if the newline is found, in which case it causes its surrounding rule to return zero on success. In other words, a CresyncE> is exactly equivalent to the token C followed by the action S> (except that productions beginning with a CresyncE> are ignored when generating error messages). A typical use might be: script : command(s) command: save_command | restore_command | # TRY NEXT LINE, IF POSSIBLE It is also possible to explicitly specify a resynchronization pattern, using the Cresync:IE> variant. This version succeeds only if the specified pattern matches (and consumes) the parsed text. In other words, Cresync:IE> is exactly equivalent to the token C/> (followed by a S> action). For example, if commands were terminated by newlines or semi-colons: command: save_command | restore_command | The value of a successfully matched CresyncE> directive (of either type) is the text that it consumed. Note, however, that since the directive also sets C<$return>, a production consisting of a lone CresyncE> succeeds but returns the value zero (which a calling rule may find useful to distinguish between "true" matches and "tolerant" matches). Remember that returning a zero value indicates that the rule I (since only an C denotes failure within C parsers. =item Error handling The CerrorE> directive provides automatic or user-defined generation of error messages during a parse. In its simplest form CerrorE> prepares an error message based on the mismatch between the last item expected and the text which cause it to fail. For example, given the rule: McCoy: curse ',' name ', I'm a doctor, not a' a_profession '!' | pronoun 'dead,' name '!' | the following strings would produce the following messages: =over 4 =item "Amen, Jim!" ERROR (line 1): Invalid McCoy: Expected curse or pronoun not found =item "Dammit, Jim, I'm a doctor!" ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a" but found ", I'm a doctor!" instead =item "He's dead,\n" ERROR (line 2): Invalid McCoy: Expected name not found =item "He's alive!" ERROR (line 1): Invalid McCoy: Expected 'dead,' but found "alive!" instead =item "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!" ERROR (line 1): Invalid McCoy: Expected a profession but found "pointy-eared Vulcan!" instead =back Note that, when autogenerating error messages, all underscores in any rule name used in a message are replaced by single spaces (for example "a_production" becomes "a production"). Judicious choice of rule names can therefore considerably improve the readability of automatic error messages (as well as the maintainability of the original grammar). If the automatically generated error is not sufficient, it is possible to provide an explicit message as part of the error directive. For example: Spock: "Fascinating ',' (name | 'Captain') '.' | "Highly illogical, doctor." | which would result in I failures to parse a "Spock" subrule printing the following message: ERROR (line ): Invalid Spock: He never said that! The error message is treated as a "qq{...}" string and interpolated when the error is generated (I when the directive is specified!). Hence: would correctly insert the ambient text string which caused the error. There are two other forms of error directive: Cerror?E> and Serror?: msgE>>. These behave just like CerrorE> and Serror: msgE>> respectively, except that they are only triggered if the rule is "committed" at the time they are encountered. For example: Scotty: "Ya kenna change the Laws of Phusics," name | name ',' 'she's goanta blaw!' | will only generate an error for a string beginning with "Ya kenna change the Laws o' Phusics," or a valid name, but which still fails to match the corresponding production. That is, C<$parser-EScotty("Aye, Cap'ain")> will fail silently (since neither production will "commit" the rule on that input), whereas SScotty("Mr Spock, ah jest kenna do'ut!")>> will fail with the error message: ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!' but found 'I jest kenna do'ut!' instead. since in that case the second production would commit after matching the leading name. Note that to allow this behaviour, all CerrorE> directives which are the first item in a production automatically uncommit the rule just long enough to allow their production to be attempted (that is, when their production fails, the commitment is reinstated so that subsequent productions are skipped). In order to I uncommit the rule before an error message, it is necessary to put an explicit CuncommitE> before the CerrorE>. For example: line: 'Kirk:' Kirk | 'Spock:' Spock | 'McCoy:' McCoy | | Error messages generated by the various Cerror...E> directives are not displayed immediately. Instead, they are "queued" in a buffer and are only displayed once parsing ultimately fails. Moreover, Cerror...E> directives that cause one production of a rule to fail are automatically removed from the message queue if another production subsequently causes the entire rule to succeed. This means that you can put Cerror...E> directives wherever useful diagnosis can be done, and only those associated with actual parser failure will ever be displayed. Also see L<"GOTCHAS">. As a general rule, the most useful diagnostics are usually generated either at the very lowest level within the grammar, or at the very highest. A good rule of thumb is to identify those subrules which consist mainly (or entirely) of terminals, and then put an Cerror...E> directive at the end of any other rule which calls one or more of those subrules. There is one other situation in which the output of the various types of error directive is suppressed; namely, when the rule containing them is being parsed as part of a "look-ahead" (see L<"Look-ahead">). In this case, the error directive will still cause the rule to fail, but will do so silently. An unconditional CerrorE> directive always fails (and hence has no associated value). This means that encountering such a directive always causes the production containing it to fail. Hence an CerrorE> directive will inevitably be the last (useful) item of a rule (a level 3 warning is issued if a production contains items after an unconditional CerrorE> directive). An Cerror?E> directive will I (that is: fail to fail :-), if the current rule is uncommitted when the directive is encountered. In that case the directive's associated value is zero. Hence, this type of error directive I be used before the end of a production. For example: command: 'do' something | 'report' something | B The Cerror?E> directive does I mean "always fail (but do so silently unless committed)". It actually means "only fail (and report) if committed, otherwise I". To achieve the "fail silently if uncommitted" semantics, it is necessary to use: rule: item item(s) | # FAIL SILENTLY UNLESS COMMITTED However, because people seem to expect a lone Cerror?E> directive to work like this: rule: item item(s) | | Parse::RecDescent automatically appends a CrejectE> directive if the Cerror?E> directive is the only item in a production. A level 2 warning (see below) is issued when this happens. The level of error reporting during both parser construction and parsing is controlled by the presence or absence of four global variables: C<$::RD_ERRORS>, C<$::RD_WARN>, C<$::RD_HINT>, and <$::RD_TRACE>. If C<$::RD_ERRORS> is defined (and, by default, it is) then fatal errors are reported. Whenever C<$::RD_WARN> is defined, certain non-fatal problems are also reported. Warnings have an associated "level": 1, 2, or 3. The higher the level, the more serious the warning. The value of the corresponding global variable (C<$::RD_WARN>) determines the I level of warning to be displayed. Hence, to see I warnings, set C<$::RD_WARN> to 1. To see only the most serious warnings set C<$::RD_WARN> to 3. By default C<$::RD_WARN> is initialized to 3, ensuring that serious but non-fatal errors are automatically reported. There is also a grammar directive to turn on warnings from within the grammar: C<< >>. It takes an optional argument, which specifies the warning level: C<< >>. See F<"DIAGNOSTICS"> for a list of the various error and warning messages that Parse::RecDescent generates when these two variables are defined. Defining any of the remaining variables (which are not defined by default) further increases the amount of information reported. Defining C<$::RD_HINT> causes the parser generator to offer more detailed analyses and hints on both errors and warnings. Note that setting C<$::RD_HINT> at any point automagically sets C<$::RD_WARN> to 1. There is also a C<< >> directive, which can be hard-coded into a grammar. Defining C<$::RD_TRACE> causes the parser generator and the parser to report their progress to STDERR in excruciating detail (although, without hints unless $::RD_HINT is separately defined). This detail can be moderated in only one respect: if C<$::RD_TRACE> has an integer value (I) greater than 1, only the I characters of the "current parsing context" (that is, where in the input string we are at any point in the parse) is reported at any time. C<$::RD_TRACE> is mainly useful for debugging a grammar that isn't behaving as you expected it to. To this end, if C<$::RD_TRACE> is defined when a parser is built, any actual parser code which is generated is also written to a file named "RD_TRACE" in the local directory. There are two directives associated with the C<$::RD_TRACE> variable. If a grammar contains a C<< >> directive anywhere in its specification, C<$::RD_TRACE> is turned on during the parser construction phase. If a grammar contains a C<< >> directive anywhere in its specification, C<$::RD_TRACE> is turned on during any parse the parser performs. Note that the four variables belong to the "main" package, which makes them easier to refer to in the code controlling the parser, and also makes it easy to turn them into command line flags ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under B. The corresponding directives are useful to "hardwire" the various debugging features into a particular grammar (rather than having to set and reset external variables). =item Redirecting diagnostics The diagnostics provided by the tracing mechanism always go to STDERR. If you need them to go elsewhere, localize and reopen STDERR prior to the parse. For example: { local *STDERR = IO::File->new(">$filename") or die $!; my $result = $parser->startrule($text); } =item Consistency checks Whenever a parser is build, Parse::RecDescent carries out a number of (potentially expensive) consistency checks. These include: verifying that the grammar is not left-recursive and that no rules have been left undefined. These checks are important safeguards during development, but unnecessary overheads when the grammar is stable and ready to be deployed. So Parse::RecDescent provides a directive to disable them: C<< >>. If a grammar contains a C<< >> directive anywhere in its specification, the extra compile-time checks are by-passed. =item Specifying local variables It is occasionally convenient to specify variables which are local to a single rule. This may be achieved by including a Crulevar:...E> directive anywhere in the rule. For example: markup: markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag] The example Crulevar: $tagE> directive causes a "my" variable named C<$tag> to be declared at the start of the subroutine implementing the C rule (that is, I the first production, regardless of where in the rule it is specified). Specifically, any directive of the form: Crulevar:IE> causes a line of the form C;> to be added at the beginning of the rule subroutine, immediately after the definitions of the following local variables: $thisparser $commit $thisrule @item $thisline @arg $text %arg This means that the following CrulevarE> directives work as expected: If a variable that is also visible to subrules is required, it needs to be C'd, not C'd. C defaults to C, but if C is explicitly specified: then a C-ized variable is declared instead, and will be available within subrules. Note however that, because all such variables are "my" variables, their values I between match attempts on a given rule. To preserve values between match attempts, values can be stored within the "local" member of the C<$thisrule> object: countedrule: { $thisrule->{"local"}{"count"}++ } | subrule1 | subrule2 | {"local"}{"count"} == 1> subrule3 When matching a rule, each CrulevarE> directive is matched as if it were an unconditional CrejectE> directive (that is, it causes any production in which it appears to immediately fail to match). For this reason (and to improve readability) it is usual to specify any CrulevarE> directive in a separate production at the start of the rule (this has the added advantage that it enables C to optimize away such productions, just as it does for the CrejectE> directive). =item Dynamically matched rules Because regexes and double-quoted strings are interpolated, it is relatively easy to specify productions with "context sensitive" tokens. For example: command: keyword body "end $item[1]" which ensures that a command block is bounded by a "IkeywordE>...end Isame keywordE>" pair. Building productions in which subrules are context sensitive is also possible, via the Cmatchrule:...E> directive. This directive behaves identically to a subrule item, except that the rule which is invoked to match it is determined by the string specified after the colon. For example, we could rewrite the C rule like this: command: keyword "end $item[1]" Whatever appears after the colon in the directive is treated as an interpolated string (that is, as if it appeared in C operator) and the value of that interpolated string is the name of the subrule to be matched. Of course, just putting a constant string like C in a Cmatchrule:...E> directive is of little interest or benefit. The power of directive is seen when we use a string that interpolates to something interesting. For example: command: keyword "end $item[1]" keyword: 'while' | 'if' | 'function' while_body: condition block if_body: condition block ('else' block)(?) function_body: arglist block Now the C rule selects how to proceed on the basis of the keyword that is found. It is as if C were declared: command: 'while' while_body "end while" | 'if' if_body "end if" | 'function' function_body "end function" When a Cmatchrule:...E> directive is used as a repeated subrule, the rule name expression is "late-bound". That is, the name of the rule to be called is re-evaluated I a match attempt is made. Hence, the following grammar: { $::species = 'dogs' } pair: 'two' (s) dogs: /dogs/ { $::species = 'cats' } cats: /cats/ will match the string "two dogs cats cats" completely, whereas it will only match the string "two dogs dogs dogs" up to the eighth letter. If the rule name were "early bound" (that is, evaluated only the first time the directive is encountered in a production), the reverse behaviour would be expected. Note that the C directive takes a string that is to be treated as a rule name, I as a rule invocation. That is, it's like a Perl symbolic reference, not an C. Just as you can say: $subname = 'foo'; # and later... &{$foo}(@args); but not: $subname = 'foo(@args)'; # and later... &{$foo}; likewise you can say: $rulename = 'foo'; # and in the grammar... [@args] but not: $rulename = 'foo[@args]'; # and in the grammar... =item Deferred actions The Cdefer:...E> directive is used to specify an action to be performed when (and only if!) the current production ultimately succeeds. Whenever a Cdefer:...E> directive appears, the code it specifies is converted to a closure (an anonymous subroutine reference) which is queued within the active parser object. Note that, because the deferred code is converted to a closure, the values of any "local" variable (such as C<$text>, <@item>, etc.) are preserved until the deferred code is actually executed. If the parse ultimately succeeds I the production in which the Cdefer:...E> directive was evaluated formed part of the successful parse, then the deferred code is executed immediately before the parse returns. If however the production which queued a deferred action fails, or one of the higher-level rules which called that production fails, then the deferred action is removed from the queue, and hence is never executed. For example, given the grammar: sentence: noun trans noun | noun intrans noun: 'the dog' { print "$item[1]\t(noun)\n" } | 'the meat' { print "$item[1]\t(noun)\n" } trans: 'ate' { print "$item[1]\t(transitive)\n" } intrans: 'ate' { print "$item[1]\t(intransitive)\n" } | 'barked' { print "$item[1]\t(intransitive)\n" } then parsing the sentence C<"the dog ate"> would produce the output: the dog (noun) ate (transitive) the dog (noun) ate (intransitive) This is because, even though the first production of C ultimately fails, its initial subrules C and C do match, and hence they execute their associated actions. Then the second production of C succeeds, causing the actions of the subrules C and C to be executed as well. On the other hand, if the actions were replaced by Cdefer:...E> directives: sentence: noun trans noun | noun intrans noun: 'the dog' | 'the meat' trans: 'ate' intrans: 'ate' | 'barked' the output would be: the dog (noun) ate (intransitive) since deferred actions are only executed if they were evaluated in a production which ultimately contributes to the successful parse. In this case, even though the first production of C caused the subrules C and C to match, that production ultimately failed and so the deferred actions queued by those subrules were subsequently discarded. The second production then succeeded, causing the entire parse to succeed, and so the deferred actions queued by the (second) match of the C subrule and the subsequent match of C I preserved and eventually executed. Deferred actions provide a means of improving the performance of a parser, by only executing those actions which are part of the final parse-tree for the input data. Alternatively, deferred actions can be viewed as a mechanism for building (and executing) a customized subroutine corresponding to the given input data, much in the same way that autoactions (see L<"Autoactions">) can be used to build a customized data structure for specific input. Whether or not the action it specifies is ever executed, a Cdefer:...E> directive always succeeds, returning the number of deferred actions currently queued at that point. =item Parsing Perl Parse::RecDescent provides limited support for parsing subsets of Perl, namely: quote-like operators, Perl variables, and complete code blocks. The Cperl_quotelikeE> directive can be used to parse any Perl quote-like operator: C<'a string'>, C, C, etc. It does this by calling Text::Balanced::quotelike(). If a quote-like operator is found, a reference to an array of eight elements is returned. Those elements are identical to the last eight elements returned by Text::Balanced::extract_quotelike() in an array context, namely: =over 4 =item [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr' -- if the operator was named; otherwise C, =item [1] the left delimiter of the first block of the operation, =item [2] the text of the first block of the operation (that is, the contents of a quote, the regex of a match, or substitution or the target list of a translation), =item [3] the right delimiter of the first block of the operation, =item [4] the left delimiter of the second block of the operation if there is one (that is, if it is a C, C, or C); otherwise C, =item [5] the text of the second block of the operation if there is one (that is, the replacement of a substitution or the translation list of a translation); otherwise C, =item [6] the right delimiter of the second block of the operation (if any); otherwise C, =item [7] the trailing modifiers on the operation (if any); otherwise C. =back If a quote-like expression is not found, the directive fails with the usual C value. The Cperl_variableE> directive can be used to parse any Perl variable: $scalar, @array, %hash, $ref->{field}[$index], etc. It does this by calling Text::Balanced::extract_variable(). If the directive matches text representing a valid Perl variable specification, it returns that text. Otherwise it fails with the usual C value. The Cperl_codeblockE> directive can be used to parse curly-brace-delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }. It does this by calling Text::Balanced::extract_codeblock(). If the directive matches text representing a valid Perl code block, it returns that text. Otherwise it fails with the usual C value. You can also tell it what kind of brackets to use as the outermost delimiters. For example: arglist: causes an arglist to match a perl code block whose outermost delimiters are C<(...)> (rather than the default C<{...}>). =item Constructing tokens Eventually, Parse::RecDescent will be able to parse tokenized input, as well as ordinary strings. In preparation for this joyous day, the Ctoken:...E> directive has been provided. This directive creates a token which will be suitable for input to a Parse::RecDescent parser (when it eventually supports tokenized input). The text of the token is the value of the immediately preceding item in the production. A Ctoken:...E> directive always succeeds with a return value which is the hash reference that is the new token. It also sets the return value for the production to that hash ref. The Ctoken:...E> directive makes it easy to build a Parse::RecDescent-compatible lexer in Parse::RecDescent: my $lexer = new Parse::RecDescent q { lex: token(s) token: /a\b/ | /the\b/ | /fly\b/ | /[a-z]+/i { lc $item[1] } | }; which will eventually be able to be used with a regular Parse::RecDescent grammar: my $parser = new Parse::RecDescent q { startrule: subrule1 subrule 2 # ETC... }; either with a pre-lexing phase: $parser->startrule( $lexer->lex($data) ); or with a lex-on-demand approach: $parser->startrule( sub{$lexer->token(\$data)} ); But at present, only the Ctoken:...E> directive is actually implemented. The rest is vapourware. =item Specifying operations One of the commonest requirements when building a parser is to specify binary operators. Unfortunately, in a normal grammar, the rules for such things are awkward: disjunction: conjunction ('or' conjunction)(s?) { $return = [ $item[1], @{$item[2]} ] } conjunction: atom ('and' atom)(s?) { $return = [ $item[1], @{$item[2]} ] } or inefficient: disjunction: conjunction 'or' disjunction { $return = [ $item[1], @{$item[2]} ] } | conjunction { $return = [ $item[1] ] } conjunction: atom 'and' conjunction { $return = [ $item[1], @{$item[2]} ] } | atom { $return = [ $item[1] ] } and either way is ugly and hard to get right. The Cleftop:...E> and Crightop:...E> directives provide an easier way of specifying such operations. Using Cleftop:...E> the above examples become: disjunction: conjunction: The Cleftop:...E> directive specifies a left-associative binary operator. It is specified around three other grammar elements (typically subrules or terminals), which match the left operand, the operator itself, and the right operand respectively. A Cleftop:...E> directive such as: disjunction: is converted to the following: disjunction: ( conjunction ('or' conjunction)(s?) { $return = [ $item[1], @{$item[2]} ] } ) In other words, a Cleftop:...E> directive matches the left operand followed by zero or more repetitions of both the operator and the right operand. It then flattens the matched items into an anonymous array which becomes the (single) value of the entire Cleftop:...E> directive. For example, an Cleftop:...E> directive such as: output: when given a string such as: cout << var << "str" << 3 would match, and C<$item[1]> would be set to: [ 'cout', 'var', '"str"', '3' ] In other words: output: is equivalent to a left-associative operator: output: ident { $return = [$item[1]] } | ident '<<' expr { $return = [@item[1,3]] } | ident '<<' expr '<<' expr { $return = [@item[1,3,5]] } | ident '<<' expr '<<' expr '<<' expr { $return = [@item[1,3,5,7]] } # ...etc... Similarly, the Crightop:...E> directive takes a left operand, an operator, and a right operand: assign: leftop:...E> and Crightop:...E> directives, the directive does not normally return the operator itself, just a list of the operands involved. This is particularly handy for specifying lists: list: '(' ')' { $return = $item[2] } There is, however, a problem: sometimes the operator is itself significant. For example, in a Perl list a comma and a C<=E> are both valid separators, but the C<=E> has additional stringification semantics. Hence it's important to know which was used in each case. To solve this problem the Cleftop:...E> and Crightop:...E> directives I return the operator(s) as well, under two circumstances. The first case is where the operator is specified as a subrule. In that instance, whatever the operator matches is returned (on the assumption that if the operator is important enough to have its own subrule, then it's important enough to return). The second case is where the operator is specified as a regular expression. In that case, if the first bracketed subpattern of the regular expression matches, that matching value is returned (this is analogous to the behaviour of the Perl C function, except that only the first subpattern is returned). In other words, given the input: ( a=>1, b=>2 ) the specifications: list: '(' ')' separator: ',' | '=>' or: list: '(' )/ list_item> ')' cause the list separators to be interleaved with the operands in the anonymous array in C<$item[2]>: [ 'a', '=>', '1', ',', 'b', '=>', '2' ] But the following version: list: '(' / list_item> ')' returns only the operators: [ 'a', '1', 'b', '2' ] Of course, none of the above specifications handle the case of an empty list, since the Cleftop:...E> and Crightop:...E> directives require at least a single right or left operand to match. To specify that the operator can match "trivially", it's necessary to add a C<(s?)> qualifier to the directive: list: '(' )/ list_item>(s?) ')' Note that in almost all the above examples, the first and third arguments of the C<> directive were the same subrule. That is because C<>'s are frequently used to specify "separated" lists of the same type of item. To make such lists easier to specify, the following syntax: list: element(s /,/) is exactly equivalent to: list: Note that the separator must be specified as a raw pattern (i.e. not a string or subrule). =item Scored productions By default, Parse::RecDescent grammar rules always accept the first production that matches the input. But if two or more productions may potentially match the same input, choosing the first that does so may not be optimal. For example, if you were parsing the sentence "time flies like an arrow", you might use a rule like this: sentence: verb noun preposition article noun { [@item] } | adjective noun verb article noun { [@item] } | noun verb preposition article noun { [@item] } Each of these productions matches the sentence, but the third one is the most likely interpretation. However, if the sentence had been "fruit flies like a banana", then the second production is probably the right match. To cater for such situations, the Cscore:...E> can be used. The directive is equivalent to an unconditional CrejectE>, except that it allows you to specify a "score" for the current production. If that score is numerically greater than the best score of any preceding production, the current production is cached for later consideration. If no later production matches, then the cached production is treated as having matched, and the value of the item immediately before its Cscore:...E> directive is returned as the result. In other words, by putting a Cscore:...E> directive at the end of each production, you can select which production matches using criteria other than specification order. For example: sentence: verb noun preposition article noun { [@item] } | adjective noun verb article noun { [@item] } | noun verb preposition article noun { [@item] } Now, when each production reaches its respective Cscore:...E> directive, the subroutine C will be called to evaluate the matched items (somehow). Once all productions have been tried, the one which C scored most highly will be the one that is accepted as a match for the rule. The variable $score always holds the current best score of any production, and the variable $score_return holds the corresponding return value. As another example, the following grammar matches lines that may be separated by commas, colons, or semi-colons. This can be tricky if a colon-separated line also contains commas, or vice versa. The grammar resolves the ambiguity by selecting the rule that results in the fewest fields: line: seplist[sep=>','] | seplist[sep=>':'] | seplist[sep=>" "] seplist: Note the use of negation within the Cscore:...E> directive to ensure that the seplist with the most items gets the lowest score. As the above examples indicate, it is often the case that all productions in a rule use exactly the same Cscore:...E> directive. It is tedious to have to repeat this identical directive in every production, so Parse::RecDescent also provides the Cautoscore:...E> directive. If an Cautoscore:...E> directive appears in any production of a rule, the code it specifies is used as the scoring code for every production of that rule, except productions that already end with an explicit Cscore:...E> directive. Thus the rules above could be rewritten: line: line: seplist[sep=>','] | seplist[sep=>':'] | seplist[sep=>" "] sentence: | verb noun preposition article noun { [@item] } | adjective noun verb article noun { [@item] } | noun verb preposition article noun { [@item] } Note that the Cautoscore:...E> directive itself acts as an unconditional CrejectE>, and (like the Crulevar:...E> directive) is pruned at compile-time wherever possible. =item Dispensing with grammar checks During the compilation phase of parser construction, Parse::RecDescent performs a small number of checks on the grammar it's given. Specifically it checks that the grammar is not left-recursive, that there are no "insatiable" constructs of the form: rule: subrule(s) subrule and that there are no rules missing (i.e. referred to, but never defined). These checks are important during development, but can slow down parser construction in stable code. So Parse::RecDescent provides the EnocheckE directive to turn them off. The directive can only appear before the first rule definition, and switches off checking throughout the rest of the current grammar. Typically, this directive would be added when a parser has been thoroughly tested and is ready for release. =back =head2 Subrule argument lists It is occasionally useful to pass data to a subrule which is being invoked. For example, consider the following grammar fragment: classdecl: keyword decl keyword: 'struct' | 'class'; decl: # WHATEVER The C rule might wish to know which of the two keywords was used (since it may affect some aspect of the way the subsequent declaration is interpreted). C allows the grammar designer to pass data into a rule, by placing that data in an I (that is, in square brackets) immediately after any subrule item in a production. Hence, we could pass the keyword to C as follows: classdecl: keyword decl[ $item[1] ] keyword: 'struct' | 'class'; decl: # WHATEVER The argument list can consist of any number (including zero!) of comma-separated Perl expressions. In other words, it looks exactly like a Perl anonymous array reference. For example, we could pass the keyword, the name of the surrounding rule, and the literal 'keyword' to C like so: classdecl: keyword decl[$item[1],$item[0],'keyword'] keyword: 'struct' | 'class'; decl: # WHATEVER Within the rule to which the data is passed (C in the above examples) that data is available as the elements of a local variable C<@arg>. Hence C might report its intentions as follows: classdecl: keyword decl[$item[1],$item[0],'keyword'] keyword: 'struct' | 'class'; decl: { print "Declaring $arg[0] (a $arg[2])\n"; print "(this rule called by $arg[1])" } Subrule argument lists can also be interpreted as hashes, simply by using the local variable C<%arg> instead of C<@arg>. Hence we could rewrite the previous example: classdecl: keyword decl[keyword => $item[1], caller => $item[0], type => 'keyword'] keyword: 'struct' | 'class'; decl: { print "Declaring $arg{keyword} (a $arg{type})\n"; print "(this rule called by $arg{caller})" } Both C<@arg> and C<%arg> are always available, so the grammar designer may choose whichever convention (or combination of conventions) suits best. Subrule argument lists are also useful for creating "rule templates" (especially when used in conjunction with the Cmatchrule:...E> directive). For example, the subrule: list: /$arg{sep}/ list[%arg] { $return = [ $item[1], @{$item[3]} ] } | { $return = [ $item[1]] } is a handy template for the common problem of matching a separated list. For example: function: 'func' name '(' list[rule=>'param',sep=>';'] ')' param: list[rule=>'name',sep=>','] ':' typename name: /\w+/ typename: name When a subrule argument list is used with a repeated subrule, the argument list goes I the repetition specifier: list: /some|many/ thing[ $item[1] ](s) The argument list is "late bound". That is, it is re-evaluated for every repetition of the repeated subrule. This means that each repeated attempt to match the subrule may be passed a completely different set of arguments if the value of the expression in the argument list changes between attempts. So, for example, the grammar: { $::species = 'dogs' } pair: 'two' animal[$::species](s) animal: /$arg[0]/ { $::species = 'cats' } will match the string "two dogs cats cats" completely, whereas it will only match the string "two dogs dogs dogs" up to the eighth letter. If the value of the argument list were "early bound" (that is, evaluated only the first time a repeated subrule match is attempted), one would expect the matching behaviours to be reversed. Of course, it is possible to effectively "early bind" such argument lists by passing them a value which does not change on each repetition. For example: { $::species = 'dogs' } pair: 'two' { $::species } animal[$item[2]](s) animal: /$arg[0]/ { $::species = 'cats' } Arguments can also be passed to the start rule, simply by appending them to the argument list with which the start rule is called (I the "line number" parameter). For example, given: $parser = new Parse::RecDescent ( $grammar ); $parser->data($text, 1, "str", 2, \@arr); # ^^^^^ ^ ^^^^^^^^^^^^^^^ # | | | # TEXT TO BE PARSED | | # STARTING LINE NUMBER | # ELEMENTS OF @arg WHICH IS PASSED TO RULE data then within the productions of the rule C, the array C<@arg> will contain C<("str", 2, \@arr)>. =head2 Alternations Alternations are implicit (unnamed) rules defined as part of a production. An alternation is defined as a series of '|'-separated productions inside a pair of round brackets. For example: character: 'the' ( good | bad | ugly ) /dude/ Every alternation implicitly defines a new subrule, whose automatically-generated name indicates its origin: "_alternation__of_production_

_of_rule" for the appropriate values of ,

, and . A call to this implicit subrule is then inserted in place of the brackets. Hence the above example is merely a convenient short-hand for: character: 'the' _alternation_1_of_production_1_of_rule_character /dude/ _alternation_1_of_production_1_of_rule_character: good | bad | ugly Since alternations are parsed by recursively calling the parser generator, any type(s) of item can appear in an alternation. For example: character: 'the' ( 'high' "plains" # Silent, with poncho | /no[- ]name/ # Silent, no poncho | vengeance_seeking # Poncho-optional | ) drifter In this case, if an error occurred, the automatically generated message would be: ERROR (line ): Invalid implicit subrule: Expected 'high' or /no[- ]name/ or generic, but found "pacifist" instead Since every alternation actually has a name, it's even possible to extend or replace them: parser->Replace( "_alternation_1_of_production_1_of_rule_character: 'generic Eastwood'" ); More importantly, since alternations are a form of subrule, they can be given repetition specifiers: character: 'the' ( good | bad | ugly )(?) /dude/ =head2 Incremental Parsing C provides two methods - C and C - which can be used to alter the grammar matched by a parser. Both methods take the same argument as C, namely a grammar specification string C interprets the grammar specification and adds any productions it finds to the end of the rules for which they are specified. For example: $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/"; parser->Extend($add); adds two productions to the rule "name" (creating it if necessary) and one production to the rule "desc". C is identical, except that it first resets are rule specified in the additional grammar, removing any existing productions. Hence after: $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/"; parser->Replace($add); there are I valid "name"s and the one possible description. A more interesting use of the C and C methods is to call them inside the action of an executing parser. For example: typedef: 'typedef' type_name identifier ';' { $thisparser->Extend("type_name: '$item[3]'") } | identifier: ...!type_name /[A-Za-z_]w*/ which automatically prevents type names from being typedef'd, or: command: 'map' key_name 'to' abort_key { $thisparser->Replace("abort_key: '$item[2]'") } | 'map' key_name 'to' key_name { map_key($item[2],$item[4]) } | abort_key { exit if confirm("abort?") } abort_key: 'q' key_name: ...!abort_key /[A-Za-z]/ which allows the user to change the abort key binding, but not to unbind it. The careful use of such constructs makes it possible to reconfigure a a running parser, eliminating the need for semantic feedback by providing syntactic feedback instead. However, as currently implemented, C and C have to regenerate and re-C the entire parser whenever they are called. This makes them quite slow for large grammars. In such cases, the judicious use of an interpolated regex is likely to be far more efficient: typedef: 'typedef' type_name/ identifier ';' { $thisparser->{local}{type_name} .= "|$item[3]" } | identifier: ...!type_name /[A-Za-z_]w*/ type_name: /$thisparser->{local}{type_name}/ =head2 Precompiling parsers Normally Parse::RecDescent builds a parser from a grammar at run-time. That approach simplifies the design and implementation of parsing code, but has the disadvantage that it slows the parsing process down - you have to wait for Parse::RecDescent to build the parser every time the program runs. Long or complex grammars can be particularly slow to build, leading to unacceptable delays at start-up. To overcome this, the module provides a way of "pre-building" a parser object and saving it in a separate module. That module can then be used to create clones of the original parser. A grammar may be precompiled using the C class method. For example, to precompile a grammar stored in the scalar $grammar, and produce a class named PreGrammar in a module file named PreGrammar.pm, you could use: use Parse::RecDescent; Parse::RecDescent->Precompile([$options_hashref], $grammar, "PreGrammar", ["RuntimeClass"]); The first required argument is the grammar string, the second is the name of the class to be built. The name of the module file is generated automatically by appending ".pm" to the last element of the class name. Thus Parse::RecDescent->Precompile($grammar, "My::New::Parser"); would produce a module file named Parser.pm. After the class name, you may specify the name of the runtime_class called by the Precompiled parser. See L for more details. An optional hash reference may be supplied as the first argument to C. This argument is currently EXPERIMENTAL, and may change in a future release of Parse::RecDescent. The only supported option is currently C<-standalone>, see L. It is somewhat tedious to have to write a small Perl program just to generate a precompiled grammar class, so Parse::RecDescent has some special magic that allows you to do the job directly from the command-line. If your grammar is specified in a file named F, you can generate a class named Yet::Another::Grammar like so: > perl -MParse::RecDescent - grammar Yet::Another::Grammar [Runtime::Class] This would produce a file named F containing the full definition of a class called Yet::Another::Grammar. Of course, to use that class, you would need to put the F file in a directory named F, somewhere in your Perl include path. Having created the new class, it's very easy to use it to build a parser. You simply C the new module, and then call its C method to create a parser object. For example: use Yet::Another::Grammar; my $parser = Yet::Another::Grammar->new(); The effect of these two lines is exactly the same as: use Parse::RecDescent; open GRAMMAR_FILE, "grammar" or die; local $/; my $grammar = ; my $parser = Parse::RecDescent->new($grammar); only considerably faster. Note however that the parsers produced by either approach are exactly the same, so whilst precompilation has an effect on I speed, it has no effect on I speed. RecDescent 2.0 will address that problem. =head3 Standalone precompiled parsers Until version 1.967003 of Parse::RecDescent, parser modules built with C were dependent on Parse::RecDescent. Future Parse::RecDescent releases with different internal implementations would break pre-existing precompiled parsers. Version 1.967_005 added the ability for Parse::RecDescent to include itself in the resulting .pm file if you pass the boolean option C<-standalone> to C: Parse::RecDescent->Precompile({ -standalone => 1, }, $grammar, "My::New::Parser"); Parse::RecDescent is included as C<$class::_Runtime> in order to avoid conflicts between an installed version of Parse::RecDescent and other precompiled, standalone parser made with Parse::RecDescent. The name of this class may be changed with the C<-runtime_class> option to Precompile. This renaming is experimental, and is subject to change in future versions. Precompiled parsers remain dependent on Parse::RecDescent by default, as this feature is still considered experimental. In the future, standalone parsers will become the default. =head3 Precompiled runtimes Standalone precompiled parsers each include a copy of Parse::RecDescent. For users who have a family of related precompiled parsers, this is very inefficient. C now supports an experimental C<-runtime_class> option. To build a precompiled parser with a different runtime name, call: Parse::RecDescent->Precompile({ -standalone => 1, -runtime_class => "My::Runtime", }, $grammar, "My::New::Parser"); The resulting standalone parser will contain a copy of Parse::RecDescent, renamed to "My::Runtime". To build a set of parsers that C a custom-named runtime, without including that runtime in the output, simply build those parsers with C<-runtime_class> and without C<-standalone>: Parse::RecDescent->Precompile({ -runtime_class => "My::Runtime", }, $grammar, "My::New::Parser"); The runtime itself must be generated as well, so that it may be Cd by My::New::Parser. To generate the runtime file, use one of the two folling calls: Parse::RecDescent->PrecompiledRuntime("My::Runtime"); Parse::RecDescent->Precompile({ -standalone => 1, -runtime_class => "My::Runtime", }, '', # empty grammar "My::Runtime"); =head1 GOTCHAS This section describes common mistakes that grammar writers seem to make on a regular basis. =head2 1. Expecting an error to always invalidate a parse A common mistake when using error messages is to write the grammar like this: file: line(s) line: line_type_1 | line_type_2 | line_type_3 | The expectation seems to be that any line that is not of type 1, 2 or 3 will invoke the CerrorE> directive and thereby cause the parse to fail. Unfortunately, that only happens if the error occurs in the very first line. The first rule states that a C is matched by one or more lines, so if even a single line succeeds, the first rule is completely satisfied and the parse as a whole succeeds. That means that any error messages generated by subsequent failures in the C rule are quietly ignored. Typically what's really needed is this: file: line(s) eofile { $return = $item[1] } line: line_type_1 | line_type_2 | line_type_3 | eofile: /^\Z/ The addition of the C subrule to the first production means that a file only matches a series of successful C matches I. If any input text remains after the lines are matched, there must have been an error in the last C. In that case the C rule will fail, causing the entire C rule to fail too. Note too that C must match C (end-of-text), I C or C (end-of-file). And don't forget the action at the end of the production. If you just write: file: line(s) eofile then the value returned by the C rule will be the value of its last item: C. Since C always returns an empty string on success, that will cause the C rule to return that empty string. Apart from returning the wrong value, returning an empty string will trip up code such as: $parser->file($filetext) || die; (since "" is false). Remember that Parse::RecDescent returns undef on failure, so the only safe test for failure is: defined($parser->file($filetext)) || die; =head2 2. Using a C in an action An action is like a C block inside the subroutine implementing the surrounding rule. So if you put a C statement in an action: range: '(' start '..' end )' { return $item{end} } /\s+/ that subroutine will immediately return, without checking the rest of the items in the current production (e.g. the C) and without setting up the necessary data structures to tell the parser that the rule has succeeded. The correct way to set a return value in an action is to set the C<$return> variable: range: '(' start '..' end )' { $return = $item{end} } /\s+/ =head2 2. Setting C<$Parse::RecDescent::skip> at parse time If you want to change the default skipping behaviour (see L and the Cskip:...E> directive) by setting C<$Parse::RecDescent::skip> you have to remember to set this variable I creating the grammar object. For example, you might want to skip all Perl-like comments with this regular expression: my $skip_spaces_and_comments = qr/ (?mxs: \s+ # either spaces | \# .*?$ # or a dash and whatever up to the end of line )* # repeated at will (in whatever order) /; And then: my $parser1 = Parse::RecDescent->new($grammar); $Parse::RecDescent::skip = $skip_spaces_and_comments; my $parser2 = Parse::RecDescent->new($grammar); $parser1->parse($text); # this does not cope with comments $parser2->parse($text); # this skips comments correctly The two parsers behave differently, because any skipping behaviour specified via C<$Parse::RecDescent::skip> is hard-coded when the grammar object is built, not at parse time. =head1 DIAGNOSTICS Diagnostics are intended to be self-explanatory (particularly if you use B<-RD_HINT> (under B) or define C<$::RD_HINT> inside the program). C currently diagnoses the following: =over 4 =item * Invalid regular expressions used as pattern terminals (fatal error). =item * Invalid Perl code in code blocks (fatal error). =item * Lookahead used in the wrong place or in a nonsensical way (fatal error). =item * "Obvious" cases of left-recursion (fatal error). =item * Missing or extra components in a CleftopE> or CrightopE> directive. =item * Unrecognisable components in the grammar specification (fatal error). =item * "Orphaned" rule components specified before the first rule (fatal error) or after an CerrorE> directive (level 3 warning). =item * Missing rule definitions (this only generates a level 3 warning, since you may be providing them later via C). =item * Instances where greedy repetition behaviour will almost certainly cause the failure of a production (a level 3 warning - see L<"ON-GOING ISSUES AND FUTURE DIRECTIONS"> below). =item * Attempts to define rules named 'Replace' or 'Extend', which cannot be called directly through the parser object because of the predefined meaning of C and C. (Only a level 2 warning is generated, since such rules I still be used as subrules). =item * Productions which consist of a single Cerror?E> directive, and which therefore may succeed unexpectedly (a level 2 warning, since this might conceivably be the desired effect). =item * Multiple consecutive lookahead specifiers (a level 1 warning only, since their effects simply accumulate). =item * Productions which start with a CrejectE> or Crulevar:...E> directive. Such productions are optimized away (a level 1 warning). =item * Rules which are autogenerated under C<$::AUTOSTUB> (a level 1 warning). =back =head1 AUTHOR Damian Conway (damian@conway.org) Jeremy T. Braun (JTBRAUN@CPAN.org) [current maintainer] =head1 BUGS AND IRRITATIONS There are undoubtedly serious bugs lurking somewhere in this much code :-) Bug reports, test cases and other feedback are most welcome. Ongoing annoyances include: =over 4 =item * There's no support for parsing directly from an input stream. If and when the Perl Gods give us regular expressions on streams, this should be trivial (ahem!) to implement. =item * The parser generator can get confused if actions aren't properly closed or if they contain particularly nasty Perl syntax errors (especially unmatched curly brackets). =item * The generator only detects the most obvious form of left recursion (potential recursion on the first subrule in a rule). More subtle forms of left recursion (for example, through the second item in a rule after a "zero" match of a preceding "zero-or-more" repetition, or after a match of a subrule with an empty production) are not found. =item * Instead of complaining about left-recursion, the generator should silently transform the grammar to remove it. Don't expect this feature any time soon as it would require a more sophisticated approach to parser generation than is currently used. =item * The generated parsers don't always run as fast as might be wished. =item * The meta-parser should be bootstrapped using C :-) =back =head1 ON-GOING ISSUES AND FUTURE DIRECTIONS =over 4 =item 1. Repetitions are "incorrigibly greedy" in that they will eat everything they can and won't backtrack if that behaviour causes a production to fail needlessly. So, for example: rule: subrule(s) subrule will I succeed, because the repetition will eat all the subrules it finds, leaving none to match the second item. Such constructions are relatively rare (and C generates a warning whenever they occur) so this may not be a problem, especially since the insatiable behaviour can be overcome "manually" by writing: rule: penultimate_subrule(s) subrule penultimate_subrule: subrule ...subrule The issue is that this construction is exactly twice as expensive as the original, whereas backtracking would add only 1/I to the cost (for matching I repetitions of C). I would welcome feedback on the need for backtracking; particularly on cases where the lack of it makes parsing performance problematical. =item 2. Having opened that can of worms, it's also necessary to consider whether there is a need for non-greedy repetition specifiers. Again, it's possible (at some cost) to manually provide the required functionality: rule: nongreedy_subrule(s) othersubrule nongreedy_subrule: subrule ...!othersubrule Overall, the issue is whether the benefit of this extra functionality outweighs the drawbacks of further complicating the (currently minimalist) grammar specification syntax, and (worse) introducing more overhead into the generated parsers. =item 3. An CautocommitE> directive would be nice. That is, it would be useful to be able to say: command: command: 'find' name | 'find' address | 'do' command 'at' time 'if' condition | 'do' command 'at' time | 'do' command | unusual_command and have the generator work out that this should be "pruned" thus: command: 'find' name | 'find' address | 'do' command 'at' time 'if' condition | 'do' command 'at' time | 'do' command | unusual_command There are several issues here. Firstly, should the CautocommitE> automatically install an CuncommitE> at the start of the last production (on the grounds that the "command" rule doesn't know whether an "unusual_command" might start with "find" or "do") or should the "unusual_command" subgraph be analysed (to see if it I be viable after a "find" or "do")? The second issue is how regular expressions should be treated. The simplest approach would be simply to uncommit before them (on the grounds that they I match). Better efficiency would be obtained by analyzing all preceding literal tokens to determine whether the pattern would match them. Overall, the issues are: can such automated "pruning" approach a hand-tuned version sufficiently closely to warrant the extra set-up expense, and (more importantly) is the problem important enough to even warrant the non-trivial effort of building an automated solution? =back =head1 SUPPORT =head2 Source Code Repository L =head2 Mailing List Visit L to sign up for the mailing list. L is also a good place to ask questions. Previous posts about Parse::RecDescent can typically be found with this search: L. =head2 FAQ Visit L for answers to frequently (and not so frequently) asked questions about Parse::RecDescent. =head2 View/Report Bugs To view the current bug list or report a new issue visit L. =head1 SEE ALSO L provides Parse::RecDescent style parsing using native Perl 5.10 regular expressions. =head1 LICENCE AND COPYRIGHT Copyright (c) 1997-2007, Damian Conway C<< >>. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See L. =head1 DISCLAIMER OF WARRANTY BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Parse-RecDescent-1.967015/README0000644000175000017500000000417113070730631015634 0ustar jtbraunjtbraunParse::RecDescent version 1.967015 NAME Parse::RecDescent - generate recursive-descent parsers DESCRIPTION RecDescent incrementally generates top-down recursive-descent text parsers from simple yacc-like grammar specifications. It provides: * Regular expressions or literal strings as terminals (tokens), * Multiple (non-contiguous) productions for any rule, * Repeated, optional and alternate subrules within productions, * Late-bound (run-time dispatched) subrules * Full access to Perl within actions specified as part of the grammar, * Simple automated error reporting during parser generation and parsing, * The ability to commit to, uncommit to, or reject particular productions during a parse, * Incremental extension of the parsing grammar (even during a parse), * Precompilation of parser objects, * User-definable reduce-reduce conflict resolution via "scoring" of matching productions. LIMITATIONS * There's no support for parsing directly from an input stream. * The generator doesn't handle left-recursion. PREREQUISITES Parse::RecDescent requires the Text::Balanced module, which is available from the CPAN. INSTALLATION It's all pure Perl, so just put the .pm files in their appropriate local Perl subdirectories. CHANGES AND FUTURE DEVELOPMENT This readme refers to versions 1.XXXXXX. For details of changes, refer to the file Changes. Version 2.00 will provide a cleaner interface and better parsing performance. AUTHOR Damian Conway (damian@conway.org) Jeremy T. Braun (JTBRAUN@CPAN.org) [current maintainer] INSTALLATION To install this module, run the following commands: perl Makefile.PL make make test make install Alternatively, to install with Module::Build, you can use the following commands: perl Makefile.PL make make test make install COPYRIGHT AND LICENCE Copyright (C) 1997-2007, Damian Conway This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Parse-RecDescent-1.967015/Build.PL0000755000175000017500000000173613070730500016252 0ustar jtbraunjtbraunuse strict; use warnings; use Module::Build; my $builder = Module::Build->new( module_name => 'Parse::RecDescent', license => 'perl', dist_author => [ 'Damian Conway ', 'Jeremy T. Braun ', ], dist_version_from => 'lib/Parse/RecDescent.pm', requires => { 'Text::Balanced' => 1.95, 'Test::More' => 0, }, add_to_cleanup => [ 'Parse-RecDescent-*' ], meta_merge => { "meta-spec" => { version => 2 }, resources => { repository => { type => 'git', url => 'git://github.com/jtbraun/Parse-RecDescent', web => 'https://github.com/jtbraun/Parse-RecDescent', }, bugtracker => { web => 'https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=Parse-RecDescent', }, }, }, ); $builder->create_build_script(); Parse-RecDescent-1.967015/META.json0000664000175000017500000000244313070731552016402 0ustar jtbraunjtbraun{ "abstract" : "Generate Recursive-Descent Parsers", "author" : [ "Damian Conway ", "Jeremy T. Braun " ], "dynamic_config" : 1, "generated_by" : "ExtUtils::MakeMaker version 7.24, CPAN::Meta::Converter version 2.143240", "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "Parse-RecDescent", "no_index" : { "directory" : [ "t", "inc" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "6.5702" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "6.5702" } }, "runtime" : { "requires" : { "Test::More" : "0", "Text::Balanced" : "1.95" } } }, "release_status" : "stable", "resources" : { "bugtracker" : { "web" : "https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=Parse-RecDescent" }, "repository" : { "type" : "git", "url" : "git://github.com/jtbraun/Parse-RecDescent", "web" : "https://github.com/jtbraun/Parse-RecDescent" } }, "version" : "1.967015" } Parse-RecDescent-1.967015/t/0000755000175000017500000000000013070731552015217 5ustar jtbraunjtbraunParse-RecDescent-1.967015/t/00.load.t0000755000175000017500000000020711710167512016541 0ustar jtbraunjtbraunuse Test::More tests => 1; BEGIN { use_ok( 'Parse::RecDescent' ); } diag( "Testing Parse::RecDescent $Parse::RecDescent::VERSION" ); Parse-RecDescent-1.967015/t/reproducible.t0000644000175000017500000002605712602025457020075 0ustar jtbraunjtbraun# Ensure that the output of Parse::RecDescent is the same, time after # time. This prevents automated builds with precompiled parsers from # registering changes, with no change input. use strict; use warnings; use Parse::RecDescent; use Test::More tests => 41; # Turn off the "build a -standalone parser" precompile warning our $RD_HINT = 0; # mask "subroutine element redefined" warnings local $^W; my $grammar = <<'EOGRAMMAR'; translation_unit: external_declaration(s) | external_declaration: function_definition | declaration | { if ($::opt_SKIPPEDLINES || (defined $::opt_VERBOSE and $::opt_VERBOSE >= 1 )) { print "Skipping line $thisline\n" # Try next line if possible... } } function_definition: declaration_specifiers(?) declarator declaration_list(?) compound_statement { if($::opt_FUNCTIONS) { $::functions_output .= ::flatten_list($item[1]); $::functions_output .= ::flatten_list($item[2]); $::functions_output .= ::flatten_list($item[3]) . ";\n"; } } declaration: declaration_specifiers init_declarator_list(?) ';' { if($::opt_DECLARATIONS) { $::declarations_output .= ::flatten_list($item[1]); $::declarations_output .= ::flatten_list($item[2]); $::declarations_output .= ::flatten_list($item[3]) . "\n"; } } declaration_list: declaration(s) declaration_specifiers: type_qualifier declaration_specifiers(?) | storage_class_specifier declaration_specifiers(?) | type_specifier declaration_specifiers(?) storage_class_specifier: 'auto' | 'register' | 'static' | 'extern' | 'typedef' type_specifier: 'int' | 'double' | 'void' | 'char' | 'long' | 'float' | 'signed' | 'unsigned' | 'short' | struct_or_union_specifier | enum_specifier | typedef_name ...typedef_name_lookahead { [$item[1] ] } typedef_name_lookahead: declarator # | pointer # | ',' ...parameter_type_list # | ')' type_qualifier: 'const' | 'volatile' struct_or_union_specifier: struct_or_union IDENTIFIER(?) '{' struct_declaration_list(?) '}' { if($::opt_STRUCTS){ $::structs_output .= ::flatten_list($item[1]) . " "; $::structs_output .= ::flatten_list($item[2]); $::structs_output .= ::flatten_list($item[3]) . "\n"; $::structs_output .= ::flatten_list_beautified($item[4]); $::structs_output .= ::flatten_list($item[5]) . ";\n\n"; } } | struct_or_union IDENTIFIER struct_or_union: 'struct' | 'union' struct_declaration_list: struct_declaration(s) init_declarator_list: init_declarator(s /(,)/) init_declarator: declarator '=' initializer | declarator struct_declaration: specifier_qualifier_list struct_declarator_list ';' specifier_qualifier_list: type_specifier specifier_qualifier_list(?) | type_qualifier specifier_qualifier_list(?) struct_declarator_list: struct_declarator(s /(,)/) struct_declarator: declarator(?) ':' constant_expression | declarator enum_specifier: 'enum' IDENTIFIER(?) '{' enumerator_list '}' { if($::opt_STRUCTS){ $::structs_output .= ::flatten_list($item[1]) . " "; $::structs_output .= ::flatten_list($item[2]); $::structs_output .= ::flatten_list($item[3]) . "\n"; $::structs_output .= ::flatten_list_beautified($item[4]); $::structs_output .= ::flatten_list($item[5]) . ";\n\n"; } } | 'enum' IDENTIFIER enumerator_list: enumerator(s /(,)/) enumerator: IDENTIFIER ('=' constant_expression)(?) declarator: pointer(?) direct_declarator function_signature: '[' constant_expression(?) ']' | '(' parameter_type_list ')' | '(' identifier_list(?) ')' direct_declarator: IDENTIFIER function_signature(s?) | '(' declarator ')' function_signature(s?) pointer: '*' type_qualifier_list(?) pointer(?) type_qualifier_list: type_qualifier(s) parameter_type_list: parameter_list (',' '...')(?) parameter_list: parameter_declaration(s /(,)/) parameter_declaration: declaration_specifiers declarator | declaration_specifiers abstract_declarator(?) identifier_list: IDENTIFIER(s /(,)/) initializer: assignment_expression | '{' initializer_list (',')(?) '}' initializer_list: initializer(s /(,)/) type_name: specifier_qualifier_list abstract_declarator(?) abstract_declarator: pointer(?) direct_abstract_declarator | pointer abstract_type: '[' constant_expression(?) ']' | '(' parameter_type_list(?) ')' direct_abstract_declarator: '(' abstract_declarator ')' abstract_type(s?) | abstract_type(s) typedef_name: IDENTIFIER statement: selection_statement | expression_statement | iteration_statement | compound_statement | jump_statement | labeled_statement labeled_statement: 'case' constant_expression ':' statement | IDENTIFIER ':' statement | 'default' ':' statement expression_statement: expression(?) ';' compound_statement: '{' declaration_list(?) statement_list(?) '}' statement_list: statement(s) selection_statement: 'if' '(' expression ')' statement ('else' statement)(?) | 'switch' '(' expression ')' statement iteration_statement: 'for' '(' expression(?) ';' expression(?) ';' expression(?) ')' statement | 'while' '(' expression ')' statement | 'do' statement 'while' '(' expression ')' jump_statement: 'return' expression(?) ';' | 'break' ';' | 'continue' ';' | 'goto' IDENTIFIER ';' expression: assignment_expression(s /(,)/) assignment_expression: unary_expression ASSIGNMENT_OPERATOR assignment_expression | conditional_expression conditional_expression: logical_OR_expression ('?' expression ':' conditional_expression)(?) constant_expression: conditional_expression logical_OR_expression: logical_AND_expression(s /(\|\|)/) logical_AND_expression: inclusive_OR_expression(s /(&&)/) inclusive_OR_expression: exclusive_OR_expression(s /(\|)/) exclusive_OR_expression: AND_expression(s /(\^)/) AND_expression: equality_expression(s /(&)/) equality_expression: relational_expression(s /(==|!=)/) relational_expression: shift_expression(s /(<=|>=|<|>)/) shift_expression: additive_expression(s /(<<|>>)/) additive_expression: multiplicative_expression(s /(\+|-)/) multiplicative_expression: cast_expression(s /(\*|\/|%)/) cast_expression: unary_expression | '(' type_name ')' cast_expression unary_expression: postfix_expression | '++' unary_expression | '--' unary_expression | 'sizeof' '(' type_name ')' | UNARY_OPERATOR cast_expression | 'sizeof' unary_expression postfix_expression: primary_expression postfix_expression_token(s?) postfix_expression_token: '[' expression ']' | '(' argument_expression_list(?)')' | '.' IDENTIFIER | '->' IDENTIFIER | '++' | '--' primary_expression: IDENTIFIER | constant | STRING | '(' expression ')' argument_expression_list: assignment_expression(s /(,)/) constant: CHARACTER_CONSTANT | FLOATING_CONSTANT | INTEGER_CONSTANT | ENUMERATION_CONSTANT ### TERMINALS INTEGER_CONSTANT: /(?:0[xX][\da-fA-F]+) # Hexadecimal |(?:0[0-7]*) # Octal or Zero |(?:[1-9]\d*) # Decimal [uUlL]? # Suffix /x CHARACTER_CONSTANT: /'([^\\'"] # None of these |\\['\\ntvbrfa'"] # or a backslash followed by one of those |\\[0-7]{1,3}|\\x\d+)' # or an octal or hex constant /x FLOATING_CONSTANT: /(?:\d+|(?=\.\d+)) # No leading digits only if '.moreDigits' follows (?:\.|(?=[eE])) # There may be no floating point only if an exponent is present \d* # Zero or more floating digits ([eE][+-]?\d+)? # expontent [lLfF]? # Suffix /x ENUMERATION_CONSTANT: INTEGER_CONSTANT STRING: /"(([^\\'"]) # None of these |(\\[\\ntvbrfa'"]) # or a backslash followed by one of those |(\\[0-7]{1,3})|(\\x\d+))*"/x # or an octal or hex IDENTIFIER: /(?!(auto|break|case|char|const|continue|default|do|double|else|enum|extern|float|for|goto # LOOKAHEAD FOR KEYWORDS |if|int|long|register|return|signed|sizeof|short|static|struct|switch|typedef # NONE OF THE KEYWORDS |union|unsigned|void|volatile|while)[^a-zA-Z_]) # SHOULD FULLY MATCH! (([a-zA-Z]\w*)|(_\w+))/x # Check for valid identifier ASSIGNMENT_OPERATOR: '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|=' UNARY_OPERATOR: '&' | '*' | '+' | '-' | '~' | '!' EOGRAMMAR # Create the reference output my $class = "TestParser"; sub CompileParser { my $pm_filename = $class . '.pm'; eval { Parse::RecDescent->Precompile({-standalone => 1,}, $grammar, $class); }; ok(!$@, qq{created a precompiled parser: } . $@); ok(-e $pm_filename, "found the precompiled parser file"); my $fh; ok((open $fh, '<', $pm_filename), "opened the precompiled parser"); my $parser_text; local $/; $parser_text = <$fh>; close $fh; ok((defined($parser_text) and length($parser_text)), "parser contains data"); unlink $pm_filename; ok(!-e $pm_filename, "deleted precompiled parser"); return $parser_text; } my $reference_parser = CompileParser(); for (0..5) { my $new_parser = CompileParser($_); ok($new_parser eq $reference_parser, "parsers match"); } Parse-RecDescent-1.967015/t/reentry.t0000755000175000017500000000263712575137332017114 0ustar jtbraunjtbraunuse warnings; use strict; $^W++; # for some reason use warnings doesn't cut it use Test::More; eval "use Test::Warn"; plan skip_all => "Test::Warn required for testing reentry" if $@; use Parse::RecDescent; my $g1 = <<'EOG'; { use warnings; use strict; my @seq; } genome : base(s) { $return = \@seq } base : A | C | G | T A : /a/ { push @seq, $item[0] } C : /c/ { push @seq, $item[0] } G : /g/ { push @seq, $item[0] } T : /t/ { push @seq, $item[0] } EOG my $g2 = <<'EOG'; { use warnings; use strict; my @seq; } genome : ( A | C | G | T )(s) { $return = \@seq } A : /a/ { push @seq, $item[0] } C : /c/ { push @seq, $item[0] } G : /g/ { push @seq, $item[0] } T : /t/ { push @seq, $item[0] } EOG my @sequences = (qw/aatgcttgc cctggattcg ctggaagtc ctgXc/); plan tests => @sequences * 4; for my $to_sequence (@sequences) { my ($p1, $p2); warnings_are (sub { $p1 = Parse::RecDescent->new ($g1); }, [], 'no warnings emitted during grammar1 parsing'); warnings_are (sub { $p2 = Parse::RecDescent->new ($g2); }, [], 'no warnings emitted during grammar2 parsing'); warnings_are (sub { is_deeply ( $p1->genome ($to_sequence), $p2->genome ($to_sequence), 'grammars produce same result' ); }, [], 'no warnings emitted during grammar execution'); } Parse-RecDescent-1.967015/t/precompile.t0000755000175000017500000000317013067760522017554 0ustar jtbraunjtbraunuse strict; use warnings; use Parse::RecDescent; use Test::More tests => 14; use lib '.'; # Turn off the "build a -standalone parser" precompile warning our $RD_HINT = 0; # mask "subroutine element redefined" warnings local $^W; my $grammar = <<'EOGRAMMAR'; TOP: (s?) ';' { $item[1] } element: 'punctuation' { $thisparser->Extend('element: /!/'); $return = $item[1]; } | /\w+/ EOGRAMMAR for my $standalone (0..1) { my $standalone_str = $standalone ? 'standalone' : 'dependent'; my $class = "TestParser$standalone_str"; my $pm_filename = $class . '.pm'; if (-e $pm_filename) { unlink $pm_filename; } ok(!-e $pm_filename, "no preexisting precompiled parser"); eval { Parse::RecDescent->Precompile({-standalone => $standalone,}, $grammar, $class); }; ok(!$@, qq{created a $standalone_str precompiled parser: } . $@); ok(-f $pm_filename, "found $standalone_str precompiled parser"); eval "use $class;"; ok(!$@, qq{use'd a $standalone_str precompiled parser: }.$@); unlink $pm_filename; ok(!-e $pm_filename, "deleted $standalone_str precompiled parser"); my $result = eval qq{ my \$text = "one, two, three, four, punctuation, !, five, six, seven ;"; use $class; my \$parser = $class->new(); \$parser->TOP(\$text); }; ok(!$@, qq{ran a $standalone_str precompiled parser}); is_deeply($result, [qw(one two three four punctuation ! five six seven)], "correct result from precompiled parser"); } Parse-RecDescent-1.967015/t/skip.t0000755000175000017500000000234712602025457016363 0ustar jtbraunjtbraunuse strict; use warnings; use Parse::RecDescent; use Test::More tests => 8; my $grammar = <<'END_OF_GRAMMAR'; foo: item(s) eotext { $return = $item[1] } foo_with_skip: item(s) eotext { $return = $item[1] } item: name value { [ @item[1,2] ] } name: 'whatever' | 'another' value: /\S+/ eotext: /\s*\z/ END_OF_GRAMMAR my $text = <<'END_OF_TEXT'; whatever value # some spaces, newlines and a comment too! another value END_OF_TEXT # Test setting the initial skip via the global directive RunTests(q{ }); # Test setting the initial skip via $Parse::RecDescent::skip global local $Parse::RecDescent::skip = qr/(?mxs: \s+ |\# .*?$)*/; RunTests(); sub RunTests { my $prefix = shift || ''; my $parser = Parse::RecDescent->new($prefix . $grammar); ok($parser, 'got a parser'); my $inskip = $parser->foo_with_skip($text); ok($inskip, 'foo_with_skip()'); { my $outskip = $parser->foo($text); ok($outskip, 'foo() with regex $P::RD::skip'); } { my $outskip = $parser->foo($text); ok($outskip, 'foo() with string $P::RD::skip'); } } Parse-RecDescent-1.967015/t/separated_repetition.t0000755000175000017500000000074211710167512021622 0ustar jtbraunjtbraunuse strict; use warnings; use Test::More 'no_plan'; use Parse::RecDescent; my $parser = Parse::RecDescent->new( q{ sep: some(?) '(' repeated(s? /,/) ')' 'elements' { $return = $item[3]; } repeated: 'repeated' some: 'some' }); ok($parser, 'Created parser'); my $str = 'some (repeated, repeated, repeated, repeated) elements'; my $result = $parser->sep($str); my $expected = ['repeated', 'repeated', 'repeated', 'repeated']; is_deeply($result, $expected); Parse-RecDescent-1.967015/t/pod.t0000755000175000017500000000020111710167512016160 0ustar jtbraunjtbraunuse Test::More; eval "use Test::Pod 1.14"; plan skip_all => "Test::Pod 1.14 required for testing POD" if $@; all_pod_files_ok(); Parse-RecDescent-1.967015/t/leftop_cap.t0000755000175000017500000000360413067760522017533 0ustar jtbraunjtbraunuse Parse::RecDescent; my $grammar = q { nolcap : lcap : norcap : rcap : nolcappos: start end { # force @itempos to be included &::make_itempos_text(\@item, \@itempos); } lcappos: start end { &::make_itempos_text(\@item, \@itempos); } norcappos: start end { &::make_itempos_text(\@item, \@itempos); } rcappos: start end { &::make_itempos_text(\@item, \@itempos); } start: /start/i end: /end/i id : /[a-zA-Z][a-zA-Z_0-9\.]*/ }; my $parser = new Parse::RecDescent($grammar) or die "Bad Grammar"; use Test::More tests=>8; require './t/util.pl'; my $text = "a + b - c + d"; is_deeply $parser->nolcap($text), [qw] => 'Noncapturing leftop'; is_deeply $parser->lcap($text), [qw] => 'Capturing leftop'; is_deeply $parser->norcap($text), [qw] => 'Noncapturing rightop'; is_deeply $parser->rcap($text), [qw] => 'Capturing rightop'; my $postext = "START a + bb - ccccccccc + d END"; my $message = ''; my $expected = ' START offset.from= 0 offset.to= 4 line.from= 1 line.to= 1 column.from= 1 column.to= 5 _REF_ offset.from= 6 offset.to= 30 line.from= 1 line.to= 4 column.from= 7 column.to= 3 END offset.from= 32 offset.to= 34 line.from= 5 line.to= 5 column.from= 1 column.to= 3 '; is $parser->nolcappos($postext), $expected => "Position capturing leftop$message"; is $parser->lcappos($postext), $expected => "Position noncapturing leftop$message"; is $parser->norcappos($postext), $expected => "Position capturing rightop$message"; is $parser->rcappos($postext), $expected => "Position noncapturing rightop$message"; Parse-RecDescent-1.967015/t/text.t0000755000175000017500000000235411711403033016365 0ustar jtbraunjtbraunuse strict; use warnings; use Parse::RecDescent; use Test::More tests => 2; my $grammar = <<'END_OF_GRAMMAR'; lex: token(s) token: 'reject' { print "REJECT";} | identifier | include include: /#\s*include\s+/ identifier { $text = $::includes->{$item[2]} . $text; $return = "INCLUDED_$item[2]"; } identifier: /[a-z_]\w*/i END_OF_GRAMMAR our $includes = { inc_0 => "\nSome included\n tokens\n\n", inc_1 => " And some without newlines", inc_2 => "more includes here", inc_3 => 'post reject', }; my $text = <<'END_OF_TEXT'; some tokens #include inc_0 other tokens #include inc_1 #include inc_2 yet more tokens #include inc_3 reject another value END_OF_TEXT my $parser = Parse::RecDescent->new($grammar); ok($parser, 'got a parser'); my $parse = $parser->lex($text); is_deeply $parser->lex($text), [ qw(some tokens INCLUDED_inc_0 Some included tokens other tokens INCLUDED_inc_1 And some without newlines INCLUDED_inc_2 more includes here yet more tokens INCLUDED_inc_3 post reject reject another value )] => 'text modification ok'; Parse-RecDescent-1.967015/t/util.pl0000644000175000017500000000111612602135436016527 0ustar jtbraunjtbraunsub make_itempos_text { my ($item, $itempos) = @_; join("\n", '', ( map { my $i = $_; join(' ', sprintf("%-10s",ref $item->[$i] ? '_REF_' : $item->[$i]), map { my $type = $_; map { sprintf("%s.%s=%3d", $type, $_, $itempos->[$i]{$type}{$_}) } qw(from to) } qw(offset line column)); } (1..$#$item), ), ''); } 1; Parse-RecDescent-1.967015/t/01.basics.t0000755000175000017500000001700212602135436017071 0ustar jtbraunjtbraun# Before `make install' is performed this script should be runnable with # `make test'. After `make install' it should work as `perl test.pl' ######################### We start with some black magic to print on failure. # Change 1..1 below to 1..last_test_to_print . # (It may become useful if the test is moved to ./t subdirectory.) BEGIN { $| = 1; print "1..32\n"; } END {print "not ok 1\n" unless $loaded;} use Parse::RecDescent; $loaded = 1; print "ok 1\n"; sub debug { $D || $D || 0 } my $count = 2; sub ok($;$) { my $ok = ((@_==2) ? ($_[0] eq $_[1]) : $_[0]); print "\texp=[$_[1]]\n\tres=[$_[0]]\n" if debug && @_==2; print "not " unless $ok; print "ok $count\n"; $count++; return $ok; } ######################### End of black magic. do { $RD_TRACE = 1; $RD_HINT = 1; } if debug > 1; $data1 = '(the 1st teeeeeest are easy easy easyeasy'; $expect1 = '[1st|teeeeeest|are|easy:easy:easy:easy]'; $data2 = '(the 2nd test is'; $expect2 = '[2nd|test|is|]'; $data3 = 'the cat'; $expect3a = 'fluffy'; $expect3b = 'not fluffy'; $data4 = 'a dog'; $expect4 = 'rover'; $data5 = 'type a is int; type b is a; var x holds b; type c is d;'; $expect5 = 'typedef=>a, typedef=>b, defn=>x, baddef, baddef'; require './t/util.pl'; ################################################################## $parser_A = new Parse::RecDescent q { test1: "(" 'the' "$::first" /te+st/ is ('easy')(s?) { "[$item[3]|$item[4]|$item[5]|" . join(':', @{$item[6]}) . ']' } is: 'is' | 'are' #================================================================# test2: [$arg{sound}] the: 'the' a: 'a' cat: 'cat' { "fluffy" } | { "not fluffy" } dog: 'dog' { "rover" } #================================================================# test3: (defn | typedef | fail)(5..10) { join ', ', @{$item[1]}; } typedef: 'type' id 'is' typename ';' { $return = "$item[0]=>$item[2]"; $thisparser->Extend("typename: '$item[2]'"); } fail: { 'baddef' } defn: 'var' id 'holds' typename ';' { "$item[0]=>$item[2]" } id: /[a-z] # LEADING ALPHABETIC \w* # FOLLOWED BY ALPHAS, DIGITS, OR UNDERSCORES /ix # CASE INSENSITIVE typename: 'int' #================================================================# test4: 'a' b /c/ { "$itempos[1]{offset}{from}:$itempos[2]{offset}{from}:$itempos[3]{offset}{from}" } b: "b" #================================================================# test5: ...!name notname | name notname: /[a-z]\w*/i { 'notname' } name: 'fred' { 'name' } #================================================================# test6: test6: 'a' 'b' 'c' { 'prod 1' } | 'a' { 'prod 2' } | { 'prod 3' } #================================================================# test7: 'x' /y+/ { $return = $item[3] } #================================================================# test8: 'a' b /c+/ 'dddd' e 'f' { &::make_itempos_text(\@item, \@itempos); } e: /ee/ #================================================================# test9: 'a' d(s) /c/ { &::make_itempos_text(\@item, \@itempos); } d: 'd' 'd' 'd' }; ok ($parser_A) or exit; ################################################################## $first = "1st"; $res = $parser_A->test1($data1); ok($res,$expect1); ################################################################## $first = "2nd"; $res = $parser_A->test1($data2); ok($res,$expect2); ################################################################## $res = $parser_A->test2($data3,undef, article=>'the', animal=>'cat', sound=>'meows'); ok($res,$expect3a); ################################################################## $res = $parser_A->test2($data3,undef, article=>'the', animal=>'cat', sound=>'purrs'); ok ($res,$expect3b); ################################################################## $res = $parser_A->test2($data4,undef, article=>'a', animal=>'dog', sound=>'barks'); ok($res,$expect4); ################################################################## $res = $parser_A->test3($data5); ok($res,$expect5); ################################################################## $res = $parser_A->test4("a b c"); ok($res, "0:3:7"); ################################################################## $res = $parser_A->test5("fred"); ok($res, "name"); $res = $parser_A->test5("fled"); ok($res, "notname"); ################################################################## $res = $parser_A->test6("a b d"); ok($res, "prod 2"); $res = $parser_A->test6("a c d"); ok($res, "prod 3"); $res = $parser_A->test6("a b c"); ok($res, "prod 1"); $res = $parser_A->test6("a b c d"); ok($res, "prod 2"); ################################################################## $res = $parser_A->test7("x yyy \n y"); ok($res, "y"); ################################################################## $res = $parser_A->test8("a\n b\n cccccccccc\ndddd ee\n f"); ok($res,' a offset.from= 0 offset.to= 0 line.from= 1 line.to= 1 column.from= 1 column.to= 1 b offset.from= 3 offset.to= 3 line.from= 2 line.to= 2 column.from= 2 column.to= 2 cccccccccc offset.from= 7 offset.to= 16 line.from= 3 line.to= 3 column.from= 3 column.to= 12 dddd offset.from= 18 offset.to= 21 line.from= 4 line.to= 4 column.from= 1 column.to= 4 ee offset.from= 26 offset.to= 27 line.from= 4 line.to= 4 column.from= 9 column.to= 10 f offset.from= 32 offset.to= 32 line.from= 5 line.to= 5 column.from= 4 column.to= 4 '); ################################################################## $res = $parser_A->test9("a\n d d \n d d d d \n d d d\nc\n"); ok($res,' a offset.from= 0 offset.to= 0 line.from= 1 line.to= 1 column.from= 1 column.to= 1 _REF_ offset.from= 3 offset.to= 23 line.from= 2 line.to= 4 column.from= 2 column.to= 6 c offset.from= 25 offset.to= 25 line.from= 5 line.to= 5 column.from= 1 column.to= 1 '); ################################################################## package Derived; @ISA = qw { Parse::RecDescent }; sub method($$) { reverse $_[1] } package main; $parser_B = new Derived q { test1: /[a-z]+/i { reverse $item[1] } { $thisparser->method($item[2]) } }; ok ($parser_B) or exit; ################################################################## $res = $parser_B->test1("literal string"); ok($res, "literal"); ################################################################# $res = $parser_A->Extend("extended : 'some extension'"); ok(@{"$parser_A->{namespace}::ISA"} == 1); ################################################################# package main; # Ensure that regex modifiers (like /x below) get interpreted $parser = new Parse::RecDescent q { test : /\. # a literal period (Test)? /x }; ok($parser) or exit; ok($parser->test(".")); ok($parser->test(".Test")); ok($parser->test(".Test")); ################################################################# $parser = new Parse::RecDescent q { whatever : /\\\\/ | /whatever/ }; ok ($parser) or exit; ok($parser->whatever(" \\ ")); ok($parser->whatever(" whatever ")); ################################################################# # Check that changing some Data::Dumper variables don't break the # parsers foreach my $terse (0..1) { local $Data::Dumper::Terse = $terse; $parser = new Parse::RecDescent q{ startrule : string string : "hello" }; ok ($parser) or exit; ok($parser->startrule("hello")); } Parse-RecDescent-1.967015/t/autotree.t0000644000175000017500000000152112575137332017240 0ustar jtbraunjtbraunuse Parse::RecDescent; use Test::More tests=>9; foreach my $base ('', ':BaseModule', ':BaseModule::', ) { my $grammar = "" . q { file : command(s) command : get | set | vet get : 'get' ident ';' set : 'set' ident 'to' value ';' vet : 'check' ident 'is' value ';' ident : /\w+/ value : /\d+/ }; my $parser = new Parse::RecDescent($grammar) or die "Bad Grammar"; pass('created autotree grammar'); my $text = q{ set a to 3; get b; check a is 3; set c to 4; check b is 0; }; my $tree = $parser->file($text); ok $tree, 'parsed input text'; use Data::Dumper; my $package = $base; $package =~ s/^:*//; $package =~ s/:*$//; $package .= '::' if length $package; ok("${package}file" eq ref $tree, qq{got "$base" as "$package"}); } Parse-RecDescent-1.967015/t/re_capture_return.t0000755000175000017500000000107011710167512021133 0ustar jtbraunjtbraunuse strict; use warnings; use Test::More 'no_plan'; use Parse::RecDescent; my $parser = Parse::RecDescent->new(<<'EOG'); { my %ret; } CONFIG : KV_PAIR(s) { return \%ret } KV_PAIR : WORD /\s*=\s*/ MAYBE_QUOTED_WORD { $ret{$item[1]} = $item[3] } MAYBE_QUOTED_WORD: WORD | /'([^']+)'/ { $return = $1 } | /"([^"]+)"/ { $return = $1 } WORD : /\w+/ EOG ok($parser, 'Created parser'); my $str = q|a=1 b="2" c ="33" d= '12'|; my $result = $parser->CONFIG($str); is_deeply($result, { a => 1, b => 2, c => 33, d => 12 } ); Parse-RecDescent-1.967015/t/skip_dynamic.t0000755000175000017500000000165311713770033020065 0ustar jtbraunjtbraun#! /usr/bin/perl -w use strict; use warnings; use Parse::RecDescent; use Test::More tests => 3; my $grammar = <<'END_OF_GRAMMAR'; foo_with_dynamic_skip: item(s) eotext { $return = $item[1] } item: name value { [ @item[1,2] ] } name: 'whatever' | 'another' value: /\S+/ eotext: /\s*\z/ END_OF_GRAMMAR my $text = <<'END_OF_TEXT'; whatever value # some spaces, newlines and a comment too! another value END_OF_TEXT my $parser = Parse::RecDescent->new($grammar); ok($parser, 'got a parser'); { local $::skip_pattern = qr/XXXXX/; my $outskip = $parser->foo_with_dynamic_skip($text); ok(!defined $outskip, 'foo()'); } { no warnings 'once'; local $::skip_pattern = qr/(?mxs: \s+ |\# .*?$)*/; my $outskip = $parser->foo_with_dynamic_skip($text); ok($outskip, 'foo() with string $::skip'); } Parse-RecDescent-1.967015/META.yml0000664000175000017500000000135013070731552016226 0ustar jtbraunjtbraun--- abstract: 'Generate Recursive-Descent Parsers' author: - 'Damian Conway ' - 'Jeremy T. Braun ' build_requires: ExtUtils::MakeMaker: '6.5702' configure_requires: ExtUtils::MakeMaker: '6.5702' dynamic_config: 1 generated_by: 'ExtUtils::MakeMaker version 7.24, CPAN::Meta::Converter version 2.143240' license: perl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: '1.4' name: Parse-RecDescent no_index: directory: - t - inc requires: Test::More: '0' Text::Balanced: '1.95' resources: bugtracker: https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=Parse-RecDescent repository: git://github.com/jtbraun/Parse-RecDescent version: '1.967015' Parse-RecDescent-1.967015/ToDo0000644000175000017500000002607712602135436015557 0ustar jtbraunjtbraun-----cut----------cut----------cut----------cut----------cut----------cut----- Completion support Ted Zlatanov to recdescent: Maybe P::RD could have better built-in completion support, similar to the way works: something you can access when a rule fails that tells you how far it got and what was successfully parsed. Basically a $was_expecting variable, set automatically. -----cut----------cut----------cut----------cut----------cut----------cut----- Clean up parser object's unique namespace (i.e. delete all methods) when object is destroyed -----cut----------cut----------cut----------cut----------cut----------cut----- Generalize repetitions to allow parse-time expressions in counts -----------cut-----------cut-----------cut-----------cut-----------cut---------- provide some mechanism to easily specify required vs optional repeated alternatives and mutually exclusive alernatives -----------cut-----------cut-----------cut-----------cut-----------cut---------- Detect (and flag as an error) the use of return() is wrong in an action (e.g. "Did you mean '$return =' instead?" -----------cut-----------cut-----------cut-----------cut-----------cut---------- Fix line counter. -----------cut-----------cut-----------cut-----------cut-----------cut---------- Fix truncation of traces -----------cut-----------cut-----------cut-----------cut-----------cut---------- Allow rules to access the raw string that has been consumed by the rule. rule: sub1 sub2 sub3 { $return = $RD_CONSUMED } -----------cut-----------cut-----------cut-----------cut-----------cut---------- Allow > to specified what to skip in terms of a rule rather than just a raw pattern -----------cut-----------cut-----------cut-----------cut-----------cut---------- Allow Parse::RecDescent to take a grammar and construct a "reconstructor" or "dumper" for autotree'd data structures. For example: $parse = Parse::RecDescent->new($autotree_grammar); $unparse = Parse::RecDescent->antiparser($autotree_grammar); $tree = $parse->startrule($text); $tree->munge_somehow(); $text = $unparse->startrule($tree); (but think of a better method name than "antiparser"!!! ;-) ================================================================================ Require {...} around code blocks within directives in V2 ================================================================================ *Remove* (i.e. don't just disable) tracing code if not ================================================================================ Provide a ->Grammar() method to reconstruct the grammar string from a (possibly extended) parser object. E.g.: say $parser->Grammar() prints out entire grammar from which $parser was built. Also provide a '-depth' option: say $parser->Grammar(-depth => $N) that prints out rules to a given depth in the grammar (also allow: -depth => -1 means all but pure terminals) ================================================================================ directive to prevent "re-opening" of rules by subsequent definitions. I.e. if present all rules must have only one definition ================================================================================ Update meta-grammar to capture current features ================================================================================ * in "was expecting" error message, add alternatives for zero or more options. For example: command option(s?) end prints: ...was expecting "end". should print ...was expecting "option" or "end". ================================================================================ * Consider and $RD_AUTOERROR to automatically add an production to "pre-terminal" rules. May need to be smarter than just "every non-terminal" (analyse grammar tree and only add error messages to penultimate nodes? or is this just a bad idea?) ================================================================================ * Implement common prefix extraction (CPE) for cases like: ifstat: 'if' cond 'then' stat 'else' stat 'endif' | 'if' cond 'then' stat 'endif' This would avoid rematching items 1 to 4 of the 2nd prod, by just reusing @item[1..4] from the previous production Document use of simple to subvert CPE. E.g.: ifstat: 'if' cond 'then' stat 'else' stat 'endif' | | 'if' cond 'then' stat 'endif' doesn't use CPE. ================================================================================ * Add backtracking to rules containing repetition specifiers (in order to overcome repetitions "incorrigible" greed)? ================================================================================ * directive to insert s after common prefixes ================================================================================ * Handle left-recursion (preferably without rearranging grammar) No idea how to do this though ;=) ================================================================================ Add a pass-by-name iterface to the constructor (pass in a hash-ref). I expect the constructor will take at least the following flags: -error Same as $::RD_ERROR (and defaults to it) -warn Same as $::RD_WARN (and defaults to it) -hint Same as $::RD_HINT (and defaults to it) -trace Same as $::RD_TRACE (and defaults to it) -autostub Same as $::RD_AUTOSTUB (and defaults to it) -autoaction Same as $::RD_AUTOACTION (and defaults to it) -source Specifies source of grammar (may be a file name or subroutine reference) -module Specifies name of separate parser module to create (rather than generating an "on-the-fly" parser). -nodefaults Don't use defaults Likewise, let grammar rules called as methods on the pasrer object also take named args: -text Text to parse (defaults to first non-flag arg) -line Line at which text starts (defaults to second non-flag arg) -file File from which text taken -args Arguments to appear as @arg/%arg in rule (defaults to non-flag args 2..N) Document this as the standard (and preferred) interface, with the current positional interface relegated to "backwards compatibility". -----------cut-----------cut-----------cut-----------cut-----------cut---------- and directives to immediately cause the entire parse to fail (with error messages, if any are pending). (or maybe ) -----------cut-----------cut-----------cut-----------cut-----------cut---------- Generate #line directives to tie errors back into original file. -----------cut-----------cut-----------cut-----------cut-----------cut---------- Pluggable debugger? Maybe all you need is for rules to call: # at the start of a rule... $thisparser->{tracer}->enter($rulename, $posInText) if $thisparser->{tracer}; # at a successful match of a rule... $thisparser->{tracer}->match($rulename, $posInText, $production_num) if $thisparser->{tracer}; # at the end of a rule... $thisparser->{tracer}->exit($rulename, $posInText) if $thisparser->{tracer}; # ETC. So then, by giving the parser a different tracer object, you can instantly provide a new tracing mechanism. -----------cut-----------cut-----------cut-----------cut-----------cut---------- Grammar namespaces: > So ... I'm looking for a way which I can break the parser into several > logical pieces ... and only compile what changes each time - using make. > Any ideas on the best way to do this ? Currently there is no good way to do this. Others have asked for this too, so I will definitely provide some solution in the next release. That solution will probably consist of allowing parts of a grammar to be specified in different namespaces. For example: startrule: rule(s) rules: R1::rule | R2::rule | R3::rule rule: prefix middle suffix rule: middle suffix rule: prefix middle # unnamed namespace is common prefix: /blah/ blah 'blah' middle: /BLAH/ BLAH 'BLAH' suffix: /blah/ blah 'blah' Apart from the obvious benefit of allowing names to be reused, this would also solve your precompilation problem, since the various namespaces would be compiled as separate modules. The Precompile method would then take an extra optional argument, specifying which namespace/module to precompile. Then your makefile might look like: VHDL: general.pm R1.pm R2.pm R3.pm general.pm: general.rd perl -MParse::RecDescent - general.rd VHDL general R1.pm: R1.rd perl -MParse::RecDescent - R1.rd VHDL R1 R2.pm: R2.rd perl -MParse::RecDescent - R2.rd VHDL R2 etc. Does this sound like a good solution to your problem? Damian -----------cut-----------cut-----------cut-----------cut-----------cut---------- dircetive to add /i to all terminals (including literals) -----------cut-----------cut-----------cut-----------cut-----------cut---------- Allow a second kind of unconditional autoaction... Feedback: > (3) Wish to have: I realize that AUTOACTION is executed for a matching > rule only when no action is defined for that rule. But I ran into > situations where a universal kind of action would be useful (mostly for > debugging purposes but where setting the TRACE flag seems overkill). By > 'universal' I mean an action that is executed for ALL matching rules > regardless of whether individual action exists. This 'universal' action > can either precede or follow the execution of individual action. -----------cut-----------cut-----------cut-----------cut-----------cut---------- Allow multiple lhs's for a rule. That is, instead of: keyword: /\w+/ ident: /\w+/ name: /\w+/ allow: (keyword ident name): /\w+/ (Thanks Colin) -----cut----------cut----------cut----------cut----------cut----------cut----- Hi Damian, I stumbled over a possible bug in your module: It doesn't seem to like raw "<<" or "<<=" inside actions. They work fine inside quoted strings though. Here is the stripped down version of the grammar which triggers the problem: ------------------------------------------------------- use strict; use warnings; use Parse::RecDescent; $::RD_HINT = 1; $::RD_TRACE = 1; my $parser = Parse::RecDescent->new(<<'END_OF_GRAMMAR' rule1: rule2 { # BEGIN TEST CODE my $dummy = 1 << 2; # END TEST CODE } rule2: /^\w+/ END_OF_GRAMMAR ) or die "Can't compile grammar\n"; ------------------------------------------------------- This is caused by a bug in Text::Balanced: https://rt.cpan.org/Ticket/Display.html?id=74714 -----cut----------cut----------cut----------cut----------cut----------cut----- Support named options to Precompile and other public methods -----cut----------cut----------cut----------cut----------cut----------cut----- Separate out "runtime" stuff to avoid the "#ifndef RUNTIME" hack inside of Precompile.