devtodo-master/NEWS

I'm looking for new features to add! If you think of any, send them to me at
alecthomas@mail.com

02/07/01
One week until I head off for a journey across Europe. This will mean that no
bugfixes or other enhancements will be made to devtodo for approximately three
or four months.

13/04/01
Many, many new features and fixes in 0.1.4. I'm also simultaneously working on
the next major revision, which will allow databases to be linked together,
allow for much more useful filtering and reporting, etc. Basically, a major
overhaul.

Minor disaster where I tried --no-sync and it promptly removed my entire
database for todo! I rebuilt it, sort of, from TODO. Ironic really.

17/03/01
I've started compiling todo on the sourceforge compile farm, so it's now been
successfully compiled on Debian, Slackware, Solaris 2.8 and FreeBSD. I'd also
like to give a heartfelt thanks to the guys at Sourceforge and VA Linux for
contributing such a valuable resource to the community.

devtodo-master/QuickStart

1. Installation

    ./configure --sysconfdir=/etc
    make
    make install

2. Optionally copy the contents of doc/scripts.sh or doc/scripts.tcsh
   (depending on your shell) to your startup scripts (either global or
   local). More information is available in the respective scripts.

3. Optionally modify /etc/todorc to meet your own configuration needs, or
   create your own ~/.todorc.

4. Usage

    Displaying items:              todo
    Adding an item:                tda
    Editing an existing item:      tde
    Marking an item as completed:  tdd
    Removing an item:              tdr

5. Problems?

    man devtodo

devtodo-master/changelog.html

    Version 0.1.17

  •  Fixed seg-fault when specifying bold colours with the --colour option. Thanks
      to Tim Peoples for pointing this out.
  •  Renamed regex.{c,h} to c_regex.{c,h} so they don't conflict with Regex.{cc,h}
      under operating systems which do not honour case (OS/X and/or Cygwin). Also
      removed -s (strip) from LDFLAGS, as this is not supported on some O/S' (eg.
      OS/X). David Bacher wrote in with these issues.
  •  Modified todo2html.xslt so the todo title is displayed in the page, as well as
      in the page title.
  •  Fixed an issue where links were not displayed when --timeout was in effect.
  •  Added a new 'default' priority which has the following semantics: when adding
      a new item it will be priority 'medium', when grafting to an existing item
      the new item will be given the parent's priority, and when editing an existing
      item its priority will be preserved. An item's priority can be overridden on
      the command line with --priority <priority> when any of these actions are
      performed. Devtodo will never prompt for priority with this enabled. Handy for
      putting in your ~/.todorc.
  •  Fixed display issues when summarising multi-line items.
  •  Fixed a bug where colours were reset when displaying linked databases.
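A minimal C++ sketch of the 'default' priority rules above. The Action enum and resolvePriority() are hypothetical names, not devtodo's source; the sketch only shows the decision order: an explicit --priority always wins, otherwise adding gives 'medium', grafting copies the parent and editing keeps the existing value.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch: Action and resolvePriority() are illustrative names,
// not taken from devtodo's source.
enum class Action { Add, Graft, Edit };

// Picks the priority for an item when the configured priority is 'default'.
// An explicit --priority value always wins; otherwise the action decides,
// so the user is never prompted.
std::string resolvePriority(Action action,
                            const std::optional<std::string>& commandLine,
                            const std::string& parentPriority,
                            const std::string& existingPriority) {
    if (commandLine) {
        return *commandLine;            // --priority <priority> overrides all
    }
    switch (action) {
        case Action::Add:
            return "medium";            // brand-new item
        case Action::Graft:
            return parentPriority;      // inherit the parent's priority
        case Action::Edit:
            return existingPriority;    // keep the current priority
    }
    return "medium";                    // not reached; silences -Wreturn-type
}
```

For example, grafting under a 'high' parent with no --priority on the command line yields "high", while the same call with --priority low yields "low".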
      

    Version 0.1.16

  •  Made broken links silently fail rather than failing loudly and horribly.
  •  Modified the semantics for expanding indices. Now when you use an index with
      no modifier (eg. todo 10, as opposed to todo -10 or todo +10), only one
      level of children is expanded.
  •  Big change is the addition of multi-line items and title texts! This is a 
      much requested feature. To use this, you can either pipe text into devtodo:
       cat <<- EOF | tda -p medium
    This is some
    multi-line 
    text.
    EOF
      Or press <CTRL-N> to insert a new line when at the devtodo input prompt.
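The 0.1.16 index rules can be pictured as a small prefix parser plus a table of how many child levels to expand. This is a sketch under assumptions: parseIndexArg(), childLevels() and the types are hypothetical, and treating '+' as "expand every level" is inferred from the 0.1.8 note that '+' displays the children, not something this entry states.

```cpp
#include <limits>
#include <optional>
#include <string>

// Hypothetical sketch of the 0.1.16 index semantics; the names are
// illustrative and the '+' depth is an assumption.
enum class Modifier { None, Plus, Minus };

struct IndexArg {
    Modifier modifier;
    std::string index;  // e.g. "10" or "10.1", left unparsed here
};

// Splits an optional leading '+' or '-' off an index argument.
IndexArg parseIndexArg(const std::string& arg) {
    if (!arg.empty() && (arg[0] == '+' || arg[0] == '-')) {
        return {arg[0] == '+' ? Modifier::Plus : Modifier::Minus, arg.substr(1)};
    }
    return {Modifier::None, arg};
}

// Levels of children to expand below the item:
//   bare index -> 1 level (per 0.1.16)
//   '+'        -> every level (assumed)
//   '-'        -> std::nullopt, the item is filtered out entirely
std::optional<int> childLevels(Modifier modifier) {
    switch (modifier) {
        case Modifier::None:
            return 1;
        case Modifier::Plus:
            return std::numeric_limits<int>::max();
        case Modifier::Minus:
            return std::nullopt;
    }
    return std::nullopt;
}
```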

    Version 0.1.15

  •  Added --force-colour option to force use of colour even when not outputting to
      a terminal. Requested by James Troup.
  •  Added a patch sent in by Philipp Buehler which adds support for title strings
      to the todo2html.xslt XSLT script.
  •  Fixed bug when displaying priorities in TODO list ("mediumriority").
  •  Added a MASSIVE patch sent in by Christian Hammond which adds support for 
      linking other todo databases into one single view. This is extremely useful
      for situations where you have a core project directory with sub-directories
      containing unique individual databases. You can link them so they are all
      viewed from the core database. Great work!
  •  tda will now merge all non-quoted arguments into one string to be used for 
      the body text of the item to add. This lets this work:
        tda -p high Need to go to the shop and get some milk
      without needing to do:
        tda -p high "Need to go to the shop and get some milk"
  •  Another bug report by James Troup via the Debian Project. Thanks James and
      Arthur for being long time supporters of the project!
  •  Made doc/scripts.sh more coherent and less convoluted.
  •  A variation on the XSLT transform was sent in by Christian Hammond. An 
      example of its use is available at http://www.chipx86.com/todo.ml.
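The tda change described above amounts to joining the leftover command-line words back together with single spaces. A sketch, assuming option parsing has already consumed flags such as -p high; joinWords() is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the words left after option parsing into a
// single item body, separated by single spaces.
std::string joinWords(const std::vector<std::string>& words) {
    std::string body;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) {
            body += ' ';  // one space between consecutive words
        }
        body += words[i];
    }
    return body;
}
```

Because the shell has already removed the quotes, the quoted and unquoted forms of the milk example produce the same body. The one difference is that runs of spaces inside an unquoted sentence collapse to one, since the shell splits on them.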

    Version 0.1.14

  •  Fixed a bug where devtodo didn't work without the TERM variable being set.
      Caused havoc in Bill Carlson's cron job. Thanks for tracking it down, Bill.
  •  Fixed some more GCC 3.x compilation problems. I'm still using [io]strstream,
      as opposed to the more correct [io]stringstream, so that it should still
      compile with older versions of GCC. Fingers crossed.
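This entry does not say what the TERM bug actually was. One common way an unset variable crashes a C++ program is by constructing a std::string from a null getenv() result, which is undefined behaviour and throws std::logic_error under libstdc++; the ChangeLog's 0.1.18 entry mentions an uncaught exception when TERM is not set. The following is a defensive sketch of that pattern only. envOr() is hypothetical, and this is not presented as devtodo's actual fix.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns an environment variable's value, or
// `fallback` when it is unset. Building a std::string straight from a null
// getenv() result is undefined behaviour (libstdc++ throws std::logic_error),
// so the null case is checked first.
std::string envOr(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
}
```

A caller running from cron could then use envOr("TERM", "dumb") and fall back to plain, uncoloured output.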

    Version 0.1.13

  •  Patch from David Furodet to fix compilation problems on Solaris.
  •  Added 'finishing comments'. This allows you to optionally add a comment to an
      item when you are marking it as done, giving reasons/comments for why you are
      marking it complete. This is really useful. This also breaks backward
      compatibility with previous BINARY formats only. The XML format is not
      affected.

    Version 0.1.12

  •  Applied a patch by Michael Tilstra that fixes segfaults when overriding 
      colours.
  •  Added a script (contrib/tdrec) to display databases from all sub-directories.
      Thanks to Brian Herlihy for the contribution. Modified it slightly so it 
      passes arguments to devtodo (such as --summary).
  •  Modified cd, pushd and popd replacement scripts (doc/scripts.sh) so that their 
      exit status is preserved. Required so things like this work correctly:
        cd doc && echo foo
      Thanks to Erin Quinlan for the fix.
  •  Now displays the index of newly added items when --verbose is on. Thanks to 
      James Troup for suggesting this.
  •  Fixed man page inconsistency with '-S' and '-s' for summary mode. Thanks to 
      James Troup again for picking this up.
  •  Another problem found by James Troup - when grafting, the validity of the
      grafting index was only checked after the new item text was typed. This has
      been rectified.
  •  Worked around some weirdness when generating RPMs.
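The grafting fix boils down to resolving the target index before prompting for any text. The sketch below shows that idea with a dotted-index parser ("10.1", as used elsewhere in this changelog) and an existence check. parseDottedIndex(), Item and indexExists() are hypothetical, and 1-based numbering at every level is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: parseDottedIndex(), Item and indexExists() are
// illustrative, and 1-based numbering at every level is an assumption.

// Parses "10" or "10.1" into {10} or {10, 1}; malformed text gives nullopt.
std::optional<std::vector<int>> parseDottedIndex(const std::string& text) {
    if (text.empty() || text.back() == '.') return std::nullopt;
    std::vector<int> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '.')) {
        // Reject empty components ("1..2") and overly long numbers.
        if (piece.empty() || piece.size() > 9) return std::nullopt;
        for (char c : piece) {
            if (c < '0' || c > '9') return std::nullopt;
        }
        parts.push_back(std::stoi(piece));
    }
    return parts;
}

// Minimal stand-in for a todo item; only its children matter here.
struct Item {
    std::vector<Item> children;
};

// True if every component of `path` names an existing item.
bool indexExists(const std::vector<Item>& roots, const std::vector<int>& path) {
    const std::vector<Item>* level = &roots;
    for (int n : path) {
        if (n < 1 || static_cast<std::size_t>(n) > level->size()) return false;
        level = &(*level)[n - 1].children;
    }
    return !path.empty();
}
```

In a graft command, both checks would run first, and the item-text prompt would appear only if they pass. That ordering is the behavioural change this entry describes.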

    Version 0.1.11

  •  Modified .spec file so it doesn't do an aclocal/autoheader/automake/autoconf 
      before compiling. Fixes incompatibilities between versions of automake.
  •  Applied a patch sent in by Anreas Amann to fix more incompatibilities with 
      GCC 3.0.

    Version 0.1.10

  •  Now 'using namespace std;'. It's the standard and it seems as if GCC 3.0 
      finally requires it, so in it goes.
  •  Fixed an incompatibility with versions of GCC prior to 3.0 using different
      arguments to std::string::compare. Quite annoying. I replaced it with 
      strncmp.
  •  Readline is driving me insane. The example from the readline info page says
      to use "completion_matches" so I basically copied the code verbatim. But
      some versions don't seem to include the function in the readline header 
      file? Agggggghh. So once again, it's back to using a manually created
      header file. It seems to work the best.
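The strncmp workaround from the 0.1.10 entry can be sketched as a prefix test that never touches the std::string::compare overloads, whose arguments differed between pre-3.0 and later GCC libraries. startsWith() is a hypothetical name.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: prefix test written with strncmp instead of
// std::string::compare. Note that strncmp stops at an embedded '\0'.
bool startsWith(const std::string& text, const std::string& prefix) {
    return text.size() >= prefix.size() &&
           std::strncmp(text.c_str(), prefix.c_str(), prefix.size()) == 0;
}
```

On a standard-conforming library, the same test can be written as text.compare(0, prefix.size(), prefix) == 0.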

    Version 0.1.9

  •  I had a bad feeling that including the readline headers would cause problems
      and I was right :(. I'm now not including any readline headers of any form,
      so if compilation breaks due to your system readline headers having K&R style
      function declarations, upgrade to the latest version.
  •  Renamed the XSLT examples to reflect their purpose (eg. xslt-example.1 => 
      todo2pdf.xslt)

    Version 0.1.8

  •  Fixed the spelling of "heirarchical" :)
  •  Repatched --mono fix that gets rid of spurious escape sequences (picked up
      by Mark Eichin).
  •  Fixed envar expansion broken by event handling modifications. Whoops.
  •  Numeric priorities can now be specified on the command line. Christian
      Kurz picked this one up.
  •  Added a new filter for searching through the database. This is done with 
      the filter '/'. eg. "todo --filter /CVS" will show all items with the word
      CVS in them. This can also be expressed as a shorthand version: "todo /CVS". 
      The text is interpreted as a regular expression.
  •  Changed filter behaviour to be more logical. Numeric filters with no sign 
      prefix will now only show the item itself, not children. Prefixing the 
      filter with a + will display the children as well and prefixing with a -
      will filter out that item.
  •  Subsequently almost totally rewrote filtering code so it's more logical and
      doesn't break (hopefully).
  •  Short arguments can now be fully merged into one argument. eg.
        todo -v -v -f 2-10
      can be represented as:
        todo -vvf2-10
  •  Added an "echo" command for use in ~/.todorc. This can be used for status
      messages or whatever. eg.
         on save echo Saved $TODODB
  •  Added another PERL script (contrib/changelog2html) to convert directly from 
      the ChangeLog to HTML.
  •  Fixed default formatting string for --TODO.
  •  James Troup had the suggestion (which is now implemented) of clearing the 
      priority as soon as the user hits a key other than enter when editing an 
      existing item. 
  •  Copied the readline.h and history.h from my system's readline into devtodo's
      source. Hopefully this won't break compiles on any systems :\.
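The '/' search filter from the 0.1.8 entry can be sketched as "strip the slash, compile the rest as a regular expression, keep items that match anywhere". A flat list of strings stands in for devtodo's item tree, and std::regex stands in for the C regex code the project shipped. isSearchFilter() and searchItems() are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch: isSearchFilter() and searchItems() are illustrative,
// a flat list of strings stands in for devtodo's item tree, and std::regex
// stands in for the C regex code the project shipped.

// A search filter is any filter term that starts with '/'.
bool isSearchFilter(const std::string& filter) {
    return !filter.empty() && filter[0] == '/';
}

// Returns the items whose text contains a match for the regular expression
// after the '/'. Assumes isSearchFilter(filter) is true; throws
// std::regex_error if the pattern is invalid.
std::vector<std::string> searchItems(const std::string& filter,
                                     const std::vector<std::string>& items) {
    const std::regex pattern(filter.substr(1));
    std::vector<std::string> matches;
    for (const std::string& text : items) {
        if (std::regex_search(text, pattern)) {
            matches.push_back(text);
        }
    }
    return matches;
}
```

Because regex_search matches anywhere in the text, "/CVS" also matches words such as CVSROOT. Anchors like "/^Buy" narrow the match.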

devtodo-master/stamp-h.in

timestamp

devtodo-master/makepackages.sh.in

#!/bin/sh
#
# Generate Slackware and RPM packages from devtodo source
#

prefix=@prefix@
sysconfdir=`eval echo @sysconfdir@`
PACKAGE=@PACKAGE@
VERSION=@VERSION@
ORIGDIR=`pwd`

cat << EOF
================================================================================
Generating distribution tar file
================================================================================
EOF

make dist

cat << EOF
================================================================================
Generating RPMs
================================================================================
EOF

cp ${PACKAGE}-${VERSION}.tar.gz /usr/src/rpm/SOURCES
rpmbuild -ba devtodo.spec
mv /usr/src/rpm/RPMS/i386/${PACKAGE}-${VERSION}-1.i386.rpm .
mv /usr/src/rpm/SRPMS/${PACKAGE}-${VERSION}-1.src.rpm .

cat << EOF
================================================================================
Generating Slackpack
================================================================================
EOF

PKGDIR=/var/tmp/${PACKAGE}-${VERSION}.slackpack
cd /usr/src/rpm/BUILD/${PACKAGE}-${VERSION}
# create the staging directory only if it does not already exist
test -d ${PKGDIR} || mkdir ${PKGDIR}
make DESTDIR=${PKGDIR} install

# copy documentation
mkdir -p ${PKGDIR}/$prefix/doc/${PACKAGE}-${VERSION}
cp doc/todorc.example doc/scripts.* README QuickStart INSTALL NEWS ChangeLog AUTHORS COPYING contrib/todo2html*.xslt contrib/tdrec ${PKGDIR}/$prefix/doc/${PACKAGE}-${VERSION}

cd ${PKGDIR}
mkdir -p install
mv .$sysconfdir/todorc .$sysconfdir/todorc.latest
cat << EOF > install/doinst.sh
echo
echo
if test -f ${sysconfdir}/todorc; then
    echo "Existing devtodo RC file not overwritten (new one is installed as "
    echo " ${sysconfdir}/todorc.latest)."
else
    echo "Installed example config file as ${sysconfdir}/todorc"
    mv ${sysconfdir}/todorc.latest ${sysconfdir}/todorc
fi
echo
echo
EOF
chmod +x install/doinst.sh
/sbin/makepkg devtodo-${VERSION}-i386-1.tgz
mv devtodo-${VERSION}-i386-1.tgz ${ORIGDIR}
rm -rf ${PKGDIR}

cd ${ORIGDIR}
chown athomas:athomas ${PACKAGE}-${VERSION}-1.i386.rpm ${PACKAGE}-${VERSION}-1.src.rpm ${PACKAGE}-${VERSION}.tar.gz ${PACKAGE}-${VERSION}-i386-1.tgz

cat << EOF
================================================================================
Package generation completed!
================================================================================
EOF

devtodo-master/AUTHORS

The main author of devtodo is:
    Alec Thomas

With contributions by:
    Arthur Korn
        Debian package maintainer and, basically, bug tracking GOD. You're a
        legend, Arthur.
    Christian Hammond
        Database linking patch.
    Philipp Buehler
        Added support to the XSLT transform.
    David Furodet
        Solaris build patches.
    Michael Tilstra
        Patch which fixed a segfault when overriding colours.
    Brian Herlihy
        Wrote the tdrec script, which recursively displays all todo lists.
    Erin Quinlan
        Patches for the cd/pushd/popd scripts.
    Anreas Amann
        GCC 3.x patches.
    Mark Eichen
        Tracking down the really painful regex slowdown problem. Sent in the
        original/first XSLT transform. Honourable mention for the LARGE number
        of bug fixes Mark sends in :)
    Daniel Peterson
        Sent in an XSLT transform for generating HTML.
    Philippe Chiasson
        Patch to fix compile problems on RedHat.
    Matt Kraai
        Found a segfault issue with readline, and various man page fixes.
    Ron Bailey
        Found a bug which resulted in editing of sub-items causing a seg-fault.
    Matthew Russel
        Contributed the doc/scripts.tcsh script so that TCSH users get the same
        love as BASH users.
    Stephean Hegel
        Fixed numerous bugs early in the life of devtodo (0.1.1 period).

devtodo-master/acconfig.h

/* System configuration directory */
#undef SYSCONFDIR

/* Use termcap to get terminal width */
#undef USETERMCAP

@TOP@

devtodo-master/.cvsignore
devtodo-master/ChangeLog

0.1.20
  - A few bugfixes.
  - Added XML declaration parsing/saving. This is a stop-gap at best, but will
    hopefully be useful to some.
  - Removed reliance on the builtin regex library. Hopefully the completely
    broken version of glibc that instigated its inclusion is now out of
    circulation.
  - Fixed a whole bunch of compiler warnings on more recent GCC versions.
  - --not-done now marks sub-tasks as not done. Thanks to an anonymous user for
    picking this up.

0.1.19
  - Updated my E-Mail address after about a year of having the wrong one :)
  - Removed the informational message at the top of .todo files, as it was
    rather useless and actually annoying to some people placing their .todo
    files under version control systems.
  - Added an XSLT -> XHTML+CSS transform from Francesco Poli.
  - Added a bash completion script from the Gentoo project's maintainer Aaron
    Walker.
  - Fixed a seg fault visible on 64-bit systems but present on all. Thanks to
    the Debian project for notifying me and providing a fix.

0.1.18
  - Added --purge [<days-old>] option which lets you purge old completed items.
    Thanks to Jakub Turski for wanting this feature.
  - Can now use force-colour in the todorc.
  - Various patches from Debian and Gentoo maintainers applied.
    These fix compiler errors on recent GCC versions, an uncaught exception
    when TERM is not set, a few man page issues, issues with the BASH scripts
    and miscellaneous other things.

0.1.17
  - Fixed seg-fault when specifying bold colours with the --colour option.
    Thanks to Tim Peoples for pointing this out.
  - Renamed regex.{c,h} to c_regex.{c,h} so they don't conflict with
    Regex.{cc,h} under operating systems which do not honour case (OS/X and/or
    Cygwin). Also removed -s (strip) from LDFLAGS, as this is not supported on
    some O/S' (eg. OS/X). David Bacher wrote in with these issues.
  - Modified todo2html.xslt so the todo title is displayed in the page, as well
    as in the page title.
  - Fixed an issue where links were not displayed when --timeout was in effect.
  - Added a new 'default' priority which has the following semantics: when
    adding a new item it will be priority 'medium', when grafting to an
    existing item the new item will be given the parent's priority, and when
    editing an existing item its priority will be preserved. An item's
    priority can be overridden on the command line with --priority <priority>
    when any of these actions are performed. Devtodo will never prompt for
    priority with this enabled. Handy for putting in your ~/.todorc.
  - Fixed display issues when summarising multi-line items.
  - Fixed a bug where colours were reset when displaying linked databases.

0.1.16
  - Made broken links silently fail rather than failing loudly and horribly.
  - Modified the semantics for expanding indices. Now when you use an index
    with no modifier (eg. todo 10, as opposed to todo -10 or todo +10), only
    one level of children is expanded.
  - Big change is the addition of multi-line items and title texts! This is a
    much requested feature. To use this, you can either pipe text into
    devtodo:
        cat <<- EOF | tda -p medium
        This is some
        multi-line
        text.
        EOF
    Or press <CTRL-N> to insert a new line when at the devtodo input prompt.

0.1.15
  - Added --force-colour option to force use of colour even when not outputting
    to a terminal. Requested by James Troup.
  - Added a patch sent in by Philipp Buehler which adds support for title
    strings to the todo2html.xslt XSLT script.
  - Fixed bug when displaying priorities in TODO list ("mediumriority").
  - Added a MASSIVE patch sent in by Christian Hammond which adds support for
    linking other todo databases into one single view. This is extremely
    useful for situations where you have a core project directory with
    sub-directories containing unique individual databases. You can link them
    so they are all viewed from the core database. Great work!
  - tda will now merge all non-quoted arguments into one string to be used for
    the body text of the item to add. This lets this work:
        tda -p high Need to go to the shop and get some milk
    without needing to do:
        tda -p high "Need to go to the shop and get some milk"
  - Another bug report by James Troup via the Debian Project. Thanks James and
    Arthur for being long time supporters of the project!
  - Made doc/scripts.sh more coherent and less convoluted.
  - A variation on the XSLT transform was sent in by Christian Hammond. An
    example of its use is available at http://www.chipx86.com/todo.ml.

0.1.14
  - Fixed a bug where devtodo didn't work without the TERM variable being set.
    Caused havoc in Bill Carlson's cron job. Thanks for tracking it down, Bill.
  - Fixed some more GCC 3.x compilation problems. I'm still using
    [io]strstream, as opposed to the more correct [io]stringstream, so that it
    should still compile with older versions of GCC. Fingers crossed.

0.1.13
  - Patch from David Furodet to fix compilation problems on Solaris.
  - Added 'finishing comments'. This allows you to optionally add a comment to
    an item when you are marking it as done, giving reasons/comments for why
    you are marking it complete. This is really useful. This also breaks
    backward compatibility with previous BINARY formats only. The XML format
    is not affected.

0.1.12
  - Applied a patch by Michael Tilstra that fixes segfaults when overriding
    colours.
  - Added a script (contrib/tdrec) to display databases from all
    sub-directories. Thanks to Brian Herlihy for the contribution. Modified it
    slightly so it passes arguments to devtodo (such as --summary).
  - Modified cd, pushd and popd replacement scripts (doc/scripts.sh) so that
    their exit status is preserved. Required so things like this work
    correctly:
        cd doc && echo foo
    Thanks to Erin Quinlan for the fix.
  - Now displays the index of newly added items when --verbose is on. Thanks
    to James Troup for suggesting this.
  - Fixed man page inconsistency with '-S' and '-s' for summary mode. Thanks to
    James Troup again for picking this up.
  - Another problem found by James Troup - when grafting, the validity of the
    grafting index was only checked after the new item text was typed. This
    has been rectified.
  - Worked around some weirdness when generating RPMs.

0.1.11
  - Modified .spec file so it doesn't do an
    aclocal/autoheader/automake/autoconf before compiling. Fixes
    incompatibilities between versions of automake.
  - Applied a patch sent in by Anreas Amann to fix more incompatibilities with
    GCC 3.0.

0.1.10
  - Now 'using namespace std;'. It's the standard and it seems as if GCC 3.0
    finally requires it, so in it goes.
  - Fixed an incompatibility with versions of GCC prior to 3.0 using different
    arguments to std::string::compare. Quite annoying. I replaced it with
    strncmp.
  - Readline is driving me insane. The example from the readline info page
    says to use "completion_matches" so I basically copied the code verbatim.
    But some versions don't seem to include the function in the readline
    header file? Agggggghh. So once again, it's back to using a manually
    created header file. It seems to work the best.

0.1.9
  - I had a bad feeling that including the readline headers would cause
    problems and I was right :(.
    I'm now not including any readline headers of any form, so if compilation
    breaks due to your system readline headers having K&R style function
    declarations, upgrade to the latest version.
  - Renamed the XSLT examples to reflect their purpose (eg. xslt-example.1 =>
    todo2pdf.xslt)

0.1.8
  - Fixed the spelling of "heirarchical" :)
  - Repatched --mono fix that gets rid of spurious escape sequences (picked up
    by Mark Eichin).
  - Fixed envar expansion broken by event handling modifications. Whoops.
  - Numeric priorities can now be specified on the command line. Christian
    Kurz picked this one up.
  - Added a new filter for searching through the database. This is done with
    the filter '/'. eg. "todo --filter /CVS" will show all items with the word
    CVS in them. This can also be expressed as a shorthand version:
    "todo /CVS". The text is interpreted as a regular expression.
  - Changed filter behaviour to be more logical. Numeric filters with no sign
    prefix will now only show the item itself, not children. Prefixing the
    filter with a + will display the children as well and prefixing with a -
    will filter out that item.
  - Subsequently almost totally rewrote filtering code so it's more logical and
    doesn't break (hopefully).
  - Short arguments can now be fully merged into one argument. eg.
        todo -v -v -f 2-10
    can be represented as:
        todo -vvf2-10
  - Added an "echo" command for use in ~/.todorc. This can be used for status
    messages or whatever. eg.
        on save echo Saved $TODODB
  - Added another PERL script (contrib/changelog2html) to convert directly from
    the ChangeLog to HTML.
  - Fixed default formatting string for --TODO.
  - James Troup had the suggestion (which is now implemented) of clearing the
    priority as soon as the user hits a key other than enter when editing an
    existing item.
  - Copied the readline.h and history.h from my system's readline into
    devtodo's source. Hopefully this won't break compiles on any systems :\.

0.1.7
  - Finally tracked down the VERY nasty (at times up to a minute or more)
    slowdown some users have been experiencing. It turns out that one of the
    recent versions of glibc has a bug in its regex code when dealing with
    non-multibyte characters (ie. most of the time). This came to a head
    because I upgraded to slackware-current, which has this version of glibc.
    Great. Thanks to Mark Eichen for pointing me towards several Debian bug
    tracker items about other programs having this same problem.
  - Added a new directory "contrib" which will be used for anything that users
    contribute that is not patched into the main distribution.
  - XSLT transform courtesy of Mark Eichin, to convert devtodo XML databases
    into colour PDFs. This is contrib/xslt-example.1.
  - XSLT contribution for converting devtodo XML databases into HTML, courtesy
    of Daniel Peterson. This is contrib/xslt-example.2.
  - I have created an amalgam of the above two XSLT contributions that will
    output an HTML page with colourised items. Completed items are struck out.
    This is a dodgy hack, so if anybody has any enhancements it would be much
    appreciated.
  - Changed filename of src/todo.cc to src/main.cc so that devtodo will compile
    under environments where case is not relevant in filenames (ie. Cygwin
    under M$ Windows).
  - Added a small PERL script to generate a todo database from a ChangeLog file
    that's in the same format as that used by devtodo. It is in the contrib
    directory. eg.
        changelog2todo > changelog.todo && devtodo --database changelog.todo
  - Added two new events: "load" and "save". These can be used in conjunction
    with one of the above XSLT files by putting something like the following
    in your ~/.todorc (assuming you have libxslt installed - www.xmlsoft.org):
        on save exec xsltproc $HOME/etc/todo-html.xslt $TODODB > `dirname $TODODB`/.todo.html
    This will generate a .todo.html file every time a devtodo database is
    modified.
  - Fixed a few minor man page bugs.
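The ~/.todorc trigger lines shown in these entries (for example "on save exec xsltproc ..." and "on save echo Saved $TODODB") have an "on <event> <command>" shape. Below is a minimal parser sketch for the single-line form only. Trigger and parseTrigger() are hypothetical, this is not devtodo's actual RC parser, and the block form "on create { ... }" from 0.1.6 is deliberately rejected rather than handled.

```cpp
#include <optional>
#include <sstream>
#include <string>

// One single-line trigger from ~/.todorc (hypothetical representation).
struct Trigger {
    std::string event;    // e.g. "load", "save", "create"
    std::string command;  // rest of the line, left for the caller to run
};

// Splits "on <event> <command>". Block forms such as "on create { ... }"
// are not handled; parseTrigger() is illustrative, not devtodo's parser.
std::optional<Trigger> parseTrigger(const std::string& line) {
    std::istringstream in(line);
    std::string keyword, event;
    if (!(in >> keyword >> event) || keyword != "on") return std::nullopt;
    std::string command;
    std::getline(in >> std::ws, command);  // skip spaces, keep the rest verbatim
    if (command.empty() || command[0] == '{') return std::nullopt;
    return Trigger{event, command};
}
```

Keeping the command as an unparsed string leaves variable expansion (such as $TODODB) to whatever runs it later, which matches how the exec examples above rely on the shell.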

0.1.6-1
  - Changed --summary to a toggle so you can use "todo -s" to switch it on or
    off. This minimises the use of the shift key ;).
  - Uncommented two lines so that --priority works once more when editing
    items.
  - Added RPMage.

0.1.6
  - When creating backups, I now make the backed up databases read-only if
    --paranoid is specified.
  - A slight modification to the BASH shell script to make it more compatible
    (courtesy of Arthur).
  - Added -S/--summary (and -s/--no-summary to override it if 'summary' is the
    default in ~/.todorc) which only displays the first line of todo items.
    This is handy if, like me, you have numerous multi-line items. The shell
    scripts use this by default now, meaning when you cd into a directory only
    the first line of each item will be shown (handy for a quick...summary).
  - Added --timeout [<seconds>] option. When no seconds are specified, todo
    will only display the database if it hasn't been displayed within a number
    of seconds (also specified by using --timeout, but *with* a number of
    seconds). For example, by putting "timeout 10" in your ~/.todorc, then
    adding "--timeout" when you run devtodo, the database will only be
    displayed if it hasn't been displayed within 10 seconds. The shell scripts
    have been updated to use this facility. The access time (stat.st_atime) is
    used to determine when the database was last used.
  - Unified formatting strings into one location and added the generic option
    --format <tag>=<format-string> to modify them as well as the option
    --use-format <tag>=<tag> to use them. eg.
        todo --use-format verbose-display=generated
  - Now attempts to obtain the width of the current terminal from the termcap
    database. If compilation fails, please send me a bug report and re-run
    ./configure with the --without-termcap option.
  - Added a '%s' output formatting flag which formats item text the same as is
    done with --summary.
  - You can use the (undocumented) --dump-config option to dump the current
    config to stdout.
    This should be usable as a ~/.todorc file pretty much as is. Handy to use
    as a base for your own modifications.
  - Fixed a minor bug where invalid numeric priority exceptions weren't being
    caught.
  - Added "title" as a definable colour, separate from the "info" colour.
  - Integrated some Debian Makefile mojo (thanks go to Arthur Korn).
  - Fixed --paranoid behaviour. The logic to check permissions had become
    commented out in the move to multiple loaders.
  - Added an "exec" command to the ~/.todorc. This can be used to execute shell
    commands from within devtodo, although it's really only useful in
    conjunction with triggers (see below). In addition, the environment
    variable TODODB will contain the name of the current database.
  - Added event "triggers". These are useful for modifying the behaviour of
    devtodo. A perfect example of a use for this is to trap the "create"
    trigger so that when a new database is created todo will remove world and
    group permissions from it. eg.
        on create { verbose exec chmod 600 $TODODB }
  - Modified Makefile.am in src and doc to support relocatable installs (via
    automake's DESTDIR variable).

0.1.5-1
  - Fixed a nasty Makefile bug that can, under certain circumstances, cause the
    build/install to fail.

0.1.5
  - Added a binary database format. The default is still XML, but you can
    change this using the new --database-loaders option. You can transparently
    convert your existing XML databases to binary format (or vice-versa) by
    simply changing the load order. For example, to change from XML to binary,
    put this in your ~/.todorc:
        database-loaders binary,xml
    The next time you modify an XML format database, it will be saved in
    binary format. The man page has more information. I recommend only using
    the binary format if you are actually having performance problems, because
    if something goes awry, manually fixing the XML database is *much* easier.
    But if you do use it, it might be an idea to use it in conjunction with
    --backup.
  - Added user-definable formatting for both display output and TODO generated
    output. This is cool. Look for the section FORMATTING in the man page.
  - Added a new filter, which I've wanted for ages. It consists of a single
    '-', '+' or '='. A '-' stops display of all items except those explicitly
    specified in the rest of the filter whereas a '+' overrides all other
    filters to display all items. A '=' is the default behaviour. This is
    brilliant if you want to narrow the view down to just one item:
        todo --filter -,29
    (*Note*: Slightly superseded by the modification to the semantics of
    numeric filters, which now display *only* the numbers specified if the =
    (or no) prefix is used - see two points down for more information).
  - Modified the numeric filters. Ranges can now be specified by doing this:
    todo --filter 1-20. If a '-' sign precedes the range it explicitly excludes
    all these items. This can also be used in most other places indices are
    used. eg. todo --done 10.1-20 would mark items 10.1 through 10.20 as done.
  - Modified behaviour of numeric filters slightly. If the prefix is '=' or
    none, *only* those items are displayed. Before, this was a no-op.
  - Patch to todorl.h courtesy of Philippe M. Chiasson that fixes compilation
    problems on RH 7.0.
  - Priority defaults to medium if a blank line is entered at the "priority>"
    prompt (thanks to Alexei Gilchrist for this idea, along with quite a few
    others :))
  - Removed --fallback-database - the semantics were too clunky and generally
    confusing.
  - Added --global-database <filename> and -G,--global to replace
    --fallback-database. Basically, you specify a file with --global-database
    (defaults to ~/.todo_global) then whenever you pass -G or --global to todo
    it will use the global database. Much simpler than the way
    --fallback-database behaved. This idea was courtesy, once again, of Alexei
    Gilchrist. Good stuff!
  - todo can now automatically back up the database to a user specified number
    of levels.
Use the option --database [<n>] to do this, where <n> is the optional number of revisions to keep (defaults to 1). This option is best specified in your ~/.todorc. - Numbers can once again be used to specify priorities when entering them from the 'priority>' prompt (requested by Alexei Gilchrist). 0.1.4 - Added version checking so that the binary won't accept databases from future versions. The actual behaviour is that minor revision differences produce a warning while major revision differences cause an error. - Added a patch from Arthur Korn that allows the bash scripts to cd into directories with spaces. - Fixed a few man page problems, again courtesy of Arthur (I swear this guy doesn't sleep!) - Changed primary binary to 'devtodo', with a convenience symlink, 'todo'. Also changed the man page filename to reflect this. The user should see no actual difference though, as symlinks with the old names exist. - Fixed a bug where todo would segfault if ^D was pressed while editing a line. Thanks to Matt Kraai for picking this up. The problem was due to not handling a NULL return value from readline. - More man page fixes (this time, thanks again go to Matt). - You can now specify more than one item index on the command line as seperate arguments. Previously, a comma was required and if multiple arguments were used the last one was used. Arthur picked this one up. - Added parsing of /etc/todorc (actually, the location is specified by the --sysconfdir argument to configure, so it will probably be /usr/local/etc/todorc on most peoples systems). - Added awareness of the TODORC environment variable. This specifies the RC file to parse on startup. TODORC=$HOME/.todorc is the default behaviour. This idea was thanks to Claude. Claude also suggests, quite rightly, that it would be useful for specifying a system-wide todorc file by putting TODORC=/etc/todorc in /etc/profile or somewhere similar. - Added two new arguments for modifying the database used. 
The first is --database <file> which is used to change the default filename used. eg. --database .todo is the default behaviour. The other is --fallback-database <file> which specifies the database to use if no other can be found. By default there is no fallback database. Both of these options can be specified in the .todorc. - Environment variables can now be used in the ~/.todorc. This is especially useful for something like 'fallback-database $HOME/.todo'. - Finally fixed the bug where > and & were not being correctly interpreted. - Fixed a long-time bug where wraptext() was wrapping the first line prematurely. - Fixed a bug where if the sort order changed, visible indices would not match parameter indices. - --verbose now displays time between when an item was created and when it was completed. - Added --date-format for formatting the display of dates (currently only used with --verbose). The format is that used by strftime(3) but if strftime is not available on a system, ctime(3) is used. - Added fully-featured sorting via the --sort parameter. It is now possible to sort on pretty much anything you can think of; creation time, completed time, duration of item, text body, priority and whether an item is done or not. - Added --paranoid option that enables some warnings about permissions. This is in response to a user request to not make the .todo file group/world accessible. This option will make devtodo warn the user if such a database is created. - Removed --sync and --no-sync. You can generate the TODO file with --TODO. 0.1.3 - Fixed a MAJOR bug introduced while fixing the non-correlating indices where all editing of sub-items caused a seg-fault! This was a bad one. Thanks to Ron Bailey for picking this one up. - Added auto-cd scripts for tcsh, courtesy of Matthew Russell. 
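The 0.1.4 note on --sort describes multi-key sorting only in prose. The following is a minimal C++ sketch of the idea, not devtodo's implementation: the `Item` struct, its field names and the sort specification syntax (comma-separated key names, each optionally prefixed with '-' for descending or '+' for ascending) are all hypothetical, chosen only to show how later keys break ties left by earlier ones.

```cpp
#include <algorithm>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item; devtodo's real item class is not shown here.
struct Item {
    std::string text;
    int priority;   // larger means more important (assumption)
    long created;   // creation time in seconds since the epoch
    bool done;
};

// Three-way comparison: negative, zero or positive.
template <typename T>
int threeWay(const T &a, const T &b) { return (b < a) - (a < b); }

using KeyCompare = std::function<int(const Item &, const Item &)>;

// Maps a hypothetical key name to a three-way comparator.
KeyCompare keyComparator(const std::string &key) {
    if (key == "priority") return [](const Item &a, const Item &b) { return threeWay(a.priority, b.priority); };
    if (key == "created")  return [](const Item &a, const Item &b) { return threeWay(a.created, b.created); };
    if (key == "text")     return [](const Item &a, const Item &b) { return threeWay(a.text, b.text); };
    if (key == "done")     return [](const Item &a, const Item &b) { return threeWay(a.done, b.done); };
    throw std::invalid_argument("unknown sort key: " + key);
}

// Sorts by each key in turn; later keys only break ties left by earlier ones.
void sortItems(std::vector<Item> &items, const std::string &spec) {
    std::vector<std::pair<KeyCompare, bool>> keys;  // comparator, descending?
    std::istringstream in(spec);
    std::string key;
    while (std::getline(in, key, ',')) {
        bool descending = !key.empty() && key[0] == '-';
        if (!key.empty() && (key[0] == '-' || key[0] == '+'))
            key.erase(0, 1);
        keys.emplace_back(keyComparator(key), descending);
    }
    std::stable_sort(items.begin(), items.end(), [&keys](const Item &a, const Item &b) {
        for (const auto &k : keys) {
            int c = k.first(a, b);
            if (c != 0)
                return k.second ? c > 0 : c < 0;
        }
        return false;
    });
}
```

With this sketch, `sortItems(items, "-priority,created")` would list the most important items first and order equal priorities oldest first. `std::stable_sort` keeps items that compare equal on every key in their original order.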

0.1.2
  - Regex needs sys/types.h to be included before regex.h on BSD - solution thanks to Ashley Penney.
  - Fixed curses failing to link on Solaris due to the link phase not bringing in the termcap library. Thanks to Josh Wilmes for picking this up (subsequent autoconf script snippet shamelessly stolen from librep).
  - Fixed a bug introduced by the new colour code where colours were not being reset to the default terminal colour as they should have been. Once again, Stephan Hegel picked this one up... thanks again.
  - Related to the above bug, added a new 'colour' called 'default', which is the terminal's default colour, and removed the definable colour item 'normal'.
  - Fixed a fairly major bug that was triggered when changing the priority of an existing item - it caused visible indices not to correlate with their actual index.
  - When grafting a child, the priority of the parent is used by default.
  - Added a check for an empty rx - FreeBSD doesn't support this.
  - Changed the string parameter of Regex::operator= to char const *. This fixes compile errors using gcc 2.9.2 under FreeBSD.
  - Changed default 'low' colour to un-bolded cyan.
  - Removed '-r' as a short option - this functionality only exists as --remove now.

0.1.1
  - Bug picked up by Christoph Jaeger relating to the use of a temporary string in TodoDB::find() has been fixed - there may be more, need to investigate further.
  - Added call to rl_initialize so that ~/.inputrc gets read correctly - patch from Ulrich Pfeifer (with slight modifications by me).
  - Validated options in ~/.todorc so that options like 'add' and 'reparent' can't be used.
  - Added section on colours to the man page.
  - Made configure.in determine whether to use curses or ncurses (a problem picked up by Stephan Hegel) and abort if readline won't link properly.
  - A few minor man page fixes (again, thanks to Stephan Hegel) - also moved the man page into configure.in so the version will be automatically updated.
  - Extracted all readline exports into todorl.h - these are required because some versions of readline do NOT have C++ compatible headers, that is, most of the functions are declared with implicit (unprototyped) parameters, which C++ barfs on.
  - Added --enable-debug to the configure phase, which removes -s from LDFLAGS and sets CXXFLAGS to '-Wall -g'.
  - Added --mono to remove all ANSI escape sequences - useful for colour-impaired terminals (can also be put in ~/.todorc).
  - Cleaned up the TodoDB class a bit by moving the StreamColour stuff into the class body itself.

0.1.0
  - Added a ~/.todorc that basically lets you prepend command line arguments to todo before it parses command line arguments. This is perfect for specifying default filters. My personal favourite is 'filter -children' to not display child nodes by default. An example is in the doc sub-directory.
  - Changed the behaviour of filters slightly in that numeric values in filters now represent item indices. Prefixed by a '-', this causes the specified item to not be displayed. Prefixed by a '+', the specified item will be displayed even if other filters inhibit it.
  - Added --colour facility so that users can override the default colours.
  - Added --reparent so that items can be moved around the tree.
  - If -v is specified, more information is printed out when editing or adding items, as well as when adding a title.

0.0.9
  - Initial release.
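The numeric filter rules in the 0.1.0 and 0.1.5 notes ('-' excludes, '+' forces display, '=' or no prefix shows only the listed items, and ranges in which only the last dotted component varies, as in 10.1-20) can be illustrated with a small parser. This is a sketch under exactly those stated rules, not devtodo's own parser; `IndexFilter` and `parseIndexFilter` are hypothetical names, and splitting a comma-separated filter into terms is left to the caller.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result of parsing one numeric filter term.
struct IndexFilter {
    char mode;                          // '-' exclude, '+' force display, '=' display only
    std::vector<std::string> indices;   // dotted item indices such as "10.3"
};

// Parses terms like "29", "-29", "+3.1", "1-20" or "10.1-20".
// A bare "-", "+" or "=" yields an empty index list: it is the global
// display mode described in the 0.1.5 notes.
IndexFilter parseIndexFilter(std::string term) {
    IndexFilter f{'=', {}};
    if (!term.empty() && (term[0] == '-' || term[0] == '+' || term[0] == '=')) {
        f.mode = term[0];
        term.erase(0, 1);
    }
    if (term.empty())
        return f;

    std::string::size_type dash = term.find('-');
    if (dash == std::string::npos) {
        f.indices.push_back(term);
        return f;
    }

    // In a range only the last dotted component varies: "10.1-20" is 10.1 .. 10.20.
    std::string first = term.substr(0, dash);
    std::string::size_type dot = first.rfind('.');
    std::string::size_type tail = (dot == std::string::npos) ? 0 : dot + 1;
    std::string parent = first.substr(0, tail);
    int start = std::stoi(first.substr(tail));
    int last = std::stoi(term.substr(dash + 1));
    if (last < start)
        throw std::invalid_argument("descending range: " + term);
    for (int i = start; i <= last; ++i)
        f.indices.push_back(parent + std::to_string(i));
    return f;
}
```

Under this reading, `todo --filter -,29` splits into two terms: a bare '-' (hide everything not named) and the plain index 29. Non-numeric input makes `std::stoi` throw `std::invalid_argument`.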
devtodo-master/devtodo.list.in
%product devtodo
%copyright GPL
%vendor Alec Thomas <alec@korn.ch>
%packager Giuseppe "Cowo" Corbelli <cowo@lugbs.linux.it>
%license COPYING
%readme README
%version @VERSION@

# Directories...
$prefix=@prefix@
$exec_prefix=@exec_prefix@
$bindir=@bindir@
$mandir=@mandir@
$datadir=@datadir@

d 755 root sys ${bindir} -
f 755 root sys ${bindir}/devtodo src/devtodo
l 755 root sys ${bindir}/tda ${bindir}/devtodo
l 755 root sys ${bindir}/tdd ${bindir}/devtodo
l 755 root sys ${bindir}/tde ${bindir}/devtodo
l 755 root sys ${bindir}/tdr ${bindir}/devtodo
l 755 root sys ${bindir}/todo ${bindir}/devtodo
d 755 root sys /etc -
f 644 root sys /etc/todorc doc/todorc.example
d 755 root sys ${mandir} -
d 755 root sys ${mandir}/man1 -
f 644 root sys ${mandir}/man1/devtodo.1 doc/devtodo.1
l 644 root sys ${mandir}/man1/tdr.1 ${mandir}/man1/devtodo.1
l 644 root sys ${mandir}/man1/todo.1 ${mandir}/man1/devtodo.1
l 644 root sys ${mandir}/man1/tda.1 ${mandir}/man1/devtodo.1
l 644 root sys ${mandir}/man1/tdd.1 ${mandir}/man1/devtodo.1
l 644 root sys ${mandir}/man1/tde.1 ${mandir}/man1/devtodo.1
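The list file installs tda, tdd, tde, tdr and todo as symlinks to the single devtodo binary. One common way for a program to behave differently under such links is to inspect the name it was invoked as. The sketch below assumes that approach and maps each name to the action listed in the QuickStart (add, mark done, edit, remove). `impliedOption` is a hypothetical helper, and the option spellings it returns are assumptions for illustration, not taken from devtodo's source.

```cpp
#include <string>

// Hypothetical: returns the option implied by the program name, or "" for
// plain display (todo, devtodo or any other name).
std::string impliedOption(const std::string &argv0) {
    // Strip any leading directory, e.g. "/usr/local/bin/tda" -> "tda".
    std::string::size_type slash = argv0.rfind('/');
    std::string name = (slash == std::string::npos) ? argv0 : argv0.substr(slash + 1);

    if (name == "tda") return "--add";
    if (name == "tdd") return "--done";
    if (name == "tde") return "--edit";
    if (name == "tdr") return "--remove";
    return "";
}
```

A caller would pass `argv[0]` and, if the result is non-empty, treat it as though it had been given first on the command line. This is why a single binary plus symlinks is enough.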
devtodo-master/util/Regex.cc
#include "Regex.h"

//map<string, Regex::Cache> Regex::cache;

Regex::Regex() {
	memset(&regex, 0, sizeof(regex));
}

Regex::Regex(char const *rx) {
	*this = rx;
}

Regex::Regex(Regex const &copy) {
	*this = copy;
}

Regex::~Regex() {
	// Free RX if not in cache
	//if (cache.find(inrx) == cache.end())
	if (inrx != "")
		regfree(&regex);
	/*else
		cache[inrx].instances--;*/
}

Regex &Regex::operator = (Regex const &copy) {
	// Self-assignment would otherwise free and recompile the same pattern.
	if (&copy == this)
		return *this;
	return (*this = copy.inrx.c_str());
}

Regex &Regex::operator = (char const *rx) {
	int error;

	if (!rx || !rx[0]) return *this;
	// Release any previously compiled pattern so reassignment does not leak.
	if (inrx != "")
		regfree(&regex);
	inrx = rx;
	/*map<string, Cache>::iterator hit;

	if ((hit = cache.find(rx)) != cache.end()) {
		regex = (*hit).second.rx;
		(*hit).second.hits++;
		(*hit).second.instances++;
		return *this;
	}*/
	if ((error = regcomp(&regex, inrx.c_str(), REG_EXTENDED | REG_NEWLINE))) {
		char buffer[128];

		regerror(error, &regex, buffer, 128);
		// Forget the pattern so the destructor does not regfree() a failed compile.
		inrx = "";
		throw runtime_error("couldn't compile rx: " + string(buffer));
	}
	/*Cache &c = cache[rx];
	c.rx = regex;
	c.hits++;
	c.instances = 1;

	// Erase least used entry
	if (cache.size() >= CRASH_REGEX_CACHE_THRESHOLD) {
		for (map<string, Cache>::iterator i = cache.begin(); i != cache.end(); i++)
			if ((*i).second.hits < (*hit).second.hits)
				hit = i;
		// Free RX if not in use
		if ((*hit).second.instances == 0)
			regfree(&(*hit).second.rx);
		cache.erase(hit);
	}*/
	return *this;
}

int Regex::match(char const *str) {
	// Any non-zero regexec() result (REG_NOMATCH or an error) counts as no match.
	if (regexec(&regex, str, 50, matches, 0) != 0)
		return -1;
	return matches[0].rm_eo - matches[0].rm_so;
}

int Regex::matchStart(char const *str) {
	if (match(str) == -1) return -1;
	if (matches[0].rm_so != 0) return -1;
	return matches[0].rm_eo;
}

string Regex::transform(string const &str, string const &mask) {
	if (match(str.c_str()) == -1)
		throw no_match("couldn't transform '" + str + "' with '" + inrx + "' as it does not match");

	int count = 10;
	string token;

	// Count the contiguous run of matched groups, starting at \0.
	for (int i = 0; i < 10; i++)
		if (matches[i].rm_so == -1) {
			count = i;
			break;
		}
	// A '\' is only an escape when a digit follows it; checking the bound
	// stops a trailing '\' from reading past the end of the mask.
	for (unsigned i = 0; i < mask.size(); i++)
		if (mask[i] == '\\' && i + 1 < mask.size() && mask[i + 1] >= '0' && mask[i + 1] <= '9') {
			int index = mask[i + 1] - '0';

			if (index >= count)
				throw out_of_range("Regex transform index for '" + inrx + "' out of range (" + mask.substr(i + 1, 1) + ")");
			else
				token += string(str, matches[index].rm_so, matches[index].rm_eo - matches[index].rm_so);
			i++;
		} else
			token += mask[i];
	return token;
}
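Regex::transform() replaces \0 to \9 in the mask with the corresponding submatches of the pattern. For comparison, here is a self-contained sketch of the same substitution using C++11 `<regex>` instead of POSIX regcomp()/regexec(). `transformSketch` is a hypothetical name, and the sketch is not a drop-in replacement. One behavioural difference: an optional group that did not take part in the match yields an empty string here, whereas transform() stops counting at the first unmatched group and throws for that group or any later one.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Applies a "\N" substitution mask to the first match of rx in str
// (POSIX extended syntax, as Regex.cc requests with REG_EXTENDED).
std::string transformSketch(const std::string &str, const std::string &rx,
                            const std::string &mask) {
    std::regex re(rx, std::regex::extended);
    std::smatch m;
    if (!std::regex_search(str, m, re))
        throw std::runtime_error("'" + str + "' does not match '" + rx + "'");

    std::string out;
    for (std::string::size_type i = 0; i < mask.size(); ++i) {
        // A backslash is an escape only when a digit follows it.
        if (mask[i] == '\\' && i + 1 < mask.size() &&
            mask[i + 1] >= '0' && mask[i + 1] <= '9') {
            std::size_t index = mask[i + 1] - '0';
            if (index >= m.size())
                throw std::out_of_range("no group \\" + mask.substr(i + 1, 1) + " in '" + rx + "'");
            out += m.str(index);  // empty if the group did not participate
            ++i;                  // skip the digit
        } else {
            out += mask[i];
        }
    }
    return out;
}
```

For example, `transformSketch("2001-07-02", "([0-9]+)-([0-9]+)-([0-9]+)", "\\3/\\2/\\1")` reorders the date into day/month/year form.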
��������������������������������������������������������������������������������������������������������������������������������������devtodo-master/util/c_regex.c�����������������������������������������������������������������������0000644�0000000�0000000�00000746330�13722700563�015270� 0����������������������������������������������������������������������������������������������������ustar �root����������������������������root�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������/* Extended regular expression matching and search library, version 0.12. (Implements POSIX draft P1003.2/D11.2, except for some of the internationalization features.) Copyright (C) 1993-1999, 2000, 2001 Free Software Foundation, Inc. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details. You should have received a copy of the GNU Library General Public License along with the GNU C Library; see the file COPYING.LIB. If not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ /* AIX requires this to be the first thing in the file. */ #if defined _AIX && !defined REGEX_MALLOC #pragma alloca #endif #undef _GNU_SOURCE #define _GNU_SOURCE #ifdef HAVE_CONFIG_H # include <config.h> #endif #ifndef PARAMS # if defined __GNUC__ || (defined __STDC__ && __STDC__) # define PARAMS(args) args # else # define PARAMS(args) () # endif /* GCC. 
*/ #endif /* Not PARAMS. */ #if defined STDC_HEADERS && !defined emacs # include <stddef.h> #else /* We need this for `regex.h', and perhaps for the Emacs include files. */ # include <sys/types.h> #endif #include <stdlib.h> #define WIDE_CHAR_SUPPORT (HAVE_WCTYPE_H && HAVE_WCHAR_H && HAVE_BTOWC) /* For platform which support the ISO C amendement 1 functionality we support user defined character classes. */ #if defined _LIBC || WIDE_CHAR_SUPPORT /* Solaris 2.5 has a bug: <wchar.h> must be included before <wctype.h>. */ # include <wchar.h> # include <wctype.h> #endif /* This is for multi byte string support. */ #ifdef MBS_SUPPORT # define CHAR_TYPE wchar_t # define US_CHAR_TYPE wchar_t/* unsigned character type */ # define COMPILED_BUFFER_VAR wc_buffer # define OFFSET_ADDRESS_SIZE 1 /* the size which STORE_NUMBER macro use */ # define CHAR_CLASS_SIZE ((__alignof__(wctype_t)+sizeof(wctype_t))/sizeof(CHAR_TYPE)+1) # define PUT_CHAR(c) \ do { \ if (MB_CUR_MAX == 1) \ putchar (c); \ else \ printf ("%C", (wint_t) c); /* Should we use wide stream?? */ \ } while (0) # define TRUE 1 # define FALSE 0 #else # define CHAR_TYPE char # define US_CHAR_TYPE unsigned char /* unsigned character type */ # define COMPILED_BUFFER_VAR bufp->buffer # define OFFSET_ADDRESS_SIZE 2 # define PUT_CHAR(c) putchar (c) #endif /* MBS_SUPPORT */ #ifdef _LIBC /* We have to keep the namespace clean. 
*/ # define regfree(preg) __regfree (preg) # define regexec(pr, st, nm, pm, ef) __regexec (pr, st, nm, pm, ef) # define regcomp(preg, pattern, cflags) __regcomp (preg, pattern, cflags) # define regerror(errcode, preg, errbuf, errbuf_size) \ __regerror(errcode, preg, errbuf, errbuf_size) # define re_set_registers(bu, re, nu, st, en) \ __re_set_registers (bu, re, nu, st, en) # define re_match_2(bufp, string1, size1, string2, size2, pos, regs, stop) \ __re_match_2 (bufp, string1, size1, string2, size2, pos, regs, stop) # define re_match(bufp, string, size, pos, regs) \ __re_match (bufp, string, size, pos, regs) # define re_search(bufp, string, size, startpos, range, regs) \ __re_search (bufp, string, size, startpos, range, regs) # define re_compile_pattern(pattern, length, bufp) \ __re_compile_pattern (pattern, length, bufp) # define re_set_syntax(syntax) __re_set_syntax (syntax) # define re_search_2(bufp, st1, s1, st2, s2, startpos, range, regs, stop) \ __re_search_2 (bufp, st1, s1, st2, s2, startpos, range, regs, stop) # define re_compile_fastmap(bufp) __re_compile_fastmap (bufp) # define btowc __btowc /* We are also using some library internals. */ # include <locale/localeinfo.h> # include <locale/elem-hash.h> # include <langinfo.h> # include <locale/coll-lookup.h> #endif /* This is for other GNU distributions with internationalized messages. */ #if HAVE_LIBINTL_H || defined _LIBC # include <libintl.h> # ifdef _LIBC # undef gettext # define gettext(msgid) __dcgettext ("libc", msgid, LC_MESSAGES) # endif #else # define gettext(msgid) (msgid) #endif #ifndef gettext_noop /* This define is so xgettext can find the internationalizable strings. */ # define gettext_noop(String) String #endif /* The `emacs' switch turns on certain matching commands that make sense only in Emacs. 
*/ #ifdef emacs # include "lisp.h" # include "buffer.h" # include "syntax.h" #else /* not emacs */ /* If we are not linking with Emacs proper, we can't use the relocating allocator even if config.h says that we can. */ # undef REL_ALLOC # include <stdlib.h> /* When used in Emacs's lib-src, we need to get bzero and bcopy somehow. If nothing else has been done, use the method below. */ # ifdef INHIBIT_STRING_HEADER # if !(defined HAVE_BZERO && defined HAVE_BCOPY) # if !defined bzero && !defined bcopy # undef INHIBIT_STRING_HEADER # endif # endif # endif /* This is the normal way of making sure we have a bcopy and a bzero. This is used in most programs--a few other programs avoid this by defining INHIBIT_STRING_HEADER. */ # ifndef INHIBIT_STRING_HEADER # if defined HAVE_STRING_H || defined STDC_HEADERS || defined _LIBC # include <string.h> # ifndef bzero # ifndef _LIBC # define bzero(s, n) (memset (s, '\0', n), (s)) # else # define bzero(s, n) __bzero (s, n) # endif # endif # else # include <strings.h> # ifndef memcmp # define memcmp(s1, s2, n) bcmp (s1, s2, n) # endif # ifndef memcpy # define memcpy(d, s, n) (bcopy (s, d, n), (d)) # endif # endif # endif /* Define the syntax stuff for \<, \>, etc. */ /* This must be nonzero for the wordchar and notwordchar pattern commands in re_match_2. */ # ifndef Sword # define Sword 1 # endif # ifdef SWITCH_ENUM_BUG # define SWITCH_ENUM_CAST(x) ((int)(x)) # else # define SWITCH_ENUM_CAST(x) (x) # endif #endif /* not emacs */ #if defined _LIBC || HAVE_LIMITS_H # include <limits.h> #endif #ifndef MB_LEN_MAX # define MB_LEN_MAX 1 #endif /* Get the interface, including the syntax bits. */ #include "c_regex.h" /* isalpha etc. are used for the character classes. */ #include <ctype.h> /* Jim Meyering writes: "... Some ctype macros are valid only for character codes that isascii says are ASCII (SGI's IRIX-4.0.5 is one such system --when using /bin/cc or gcc but without giving an ansi option). 
So, all ctype uses should be through macros like ISPRINT... If STDC_HEADERS is defined, then autoconf has verified that the ctype macros don't need to be guarded with references to isascii. ... Defining isascii to 1 should let any compiler worth its salt eliminate the && through constant folding." Solaris defines some of these symbols so we must undefine them first. */ #undef ISASCII #if defined STDC_HEADERS || (!defined isascii && !defined HAVE_ISASCII) # define ISASCII(c) 1 #else # define ISASCII(c) isascii(c) #endif #ifdef isblank # define ISBLANK(c) (ISASCII (c) && isblank (c)) #else # define ISBLANK(c) ((c) == ' ' || (c) == '\t') #endif #ifdef isgraph # define ISGRAPH(c) (ISASCII (c) && isgraph (c)) #else # define ISGRAPH(c) (ISASCII (c) && isprint (c) && !isspace (c)) #endif #undef ISPRINT #define ISPRINT(c) (ISASCII (c) && isprint (c)) #define ISDIGIT(c) (ISASCII (c) && isdigit (c)) #define ISALNUM(c) (ISASCII (c) && isalnum (c)) #define ISALPHA(c) (ISASCII (c) && isalpha (c)) #define ISCNTRL(c) (ISASCII (c) && iscntrl (c)) #define ISLOWER(c) (ISASCII (c) && islower (c)) #define ISPUNCT(c) (ISASCII (c) && ispunct (c)) #define ISSPACE(c) (ISASCII (c) && isspace (c)) #define ISUPPER(c) (ISASCII (c) && isupper (c)) #define ISXDIGIT(c) (ISASCII (c) && isxdigit (c)) #ifdef _tolower # define TOLOWER(c) _tolower(c) #else # define TOLOWER(c) tolower(c) #endif #ifndef NULL # define NULL (void *)0 #endif /* We remove any previous definition of `SIGN_EXTEND_CHAR', since ours (we hope) works properly with all combinations of machines, compilers, `char' and `unsigned char' argument types. (Per Bothner suggested the basic approach.) */ #undef SIGN_EXTEND_CHAR #if __STDC__ # define SIGN_EXTEND_CHAR(c) ((signed char) (c)) #else /* not __STDC__ */ /* As in Harbison and Steele. */ # define SIGN_EXTEND_CHAR(c) ((((unsigned char) (c)) ^ 128) - 128) #endif #ifndef emacs /* How many characters in the character set. 
*/ # define CHAR_SET_SIZE 256 # ifdef SYNTAX_TABLE extern char *re_syntax_table; # else /* not SYNTAX_TABLE */ static char re_syntax_table[CHAR_SET_SIZE]; static void init_syntax_once PARAMS ((void)); static void init_syntax_once () { register int c; static int done = 0; if (done) return; bzero (re_syntax_table, sizeof re_syntax_table); for (c = 0; c < CHAR_SET_SIZE; ++c) if (ISALNUM (c)) re_syntax_table[c] = Sword; re_syntax_table['_'] = Sword; done = 1; } # endif /* not SYNTAX_TABLE */ # define SYNTAX(c) re_syntax_table[(unsigned char) (c)] #endif /* emacs */ /* Should we use malloc or alloca? If REGEX_MALLOC is not defined, we use `alloca' instead of `malloc'. This is because using malloc in re_search* or re_match* could cause memory leaks when C-g is used in Emacs; also, malloc is slower and causes storage fragmentation. On the other hand, malloc is more portable, and easier to debug. Because we sometimes use alloca, some routines have to be macros, not functions -- `alloca'-allocated space disappears at the end of the function it is called in. */ #ifdef REGEX_MALLOC # define REGEX_ALLOCATE malloc # define REGEX_REALLOCATE(source, osize, nsize) realloc (source, nsize) # define REGEX_FREE free #else /* not REGEX_MALLOC */ /* Emacs already defines alloca, sometimes. */ # ifndef alloca /* Make alloca work the best possible way. */ # ifdef __GNUC__ # define alloca __builtin_alloca # else /* not __GNUC__ */ # if HAVE_ALLOCA_H # include <alloca.h> # endif /* HAVE_ALLOCA_H */ # endif /* not __GNUC__ */ # endif /* not alloca */ # define REGEX_ALLOCATE alloca /* Assumes a `char *destination' variable. */ # define REGEX_REALLOCATE(source, osize, nsize) \ (destination = (char *) alloca (nsize), \ memcpy (destination, source, osize)) /* No need to do anything to free, after alloca. */ # define REGEX_FREE(arg) ((void)0) /* Do nothing! But inhibit gcc warning. */ #endif /* not REGEX_MALLOC */ /* Define how to allocate the failure stack. 
*/ #if defined REL_ALLOC && defined REGEX_MALLOC # define REGEX_ALLOCATE_STACK(size) \ r_alloc (&failure_stack_ptr, (size)) # define REGEX_REALLOCATE_STACK(source, osize, nsize) \ r_re_alloc (&failure_stack_ptr, (nsize)) # define REGEX_FREE_STACK(ptr) \ r_alloc_free (&failure_stack_ptr) #else /* not using relocating allocator */ # ifdef REGEX_MALLOC # define REGEX_ALLOCATE_STACK malloc # define REGEX_REALLOCATE_STACK(source, osize, nsize) realloc (source, nsize) # define REGEX_FREE_STACK free # else /* not REGEX_MALLOC */ # define REGEX_ALLOCATE_STACK alloca # define REGEX_REALLOCATE_STACK(source, osize, nsize) \ REGEX_REALLOCATE (source, osize, nsize) /* No need to explicitly free anything. */ # define REGEX_FREE_STACK(arg) # endif /* not REGEX_MALLOC */ #endif /* not using relocating allocator */ /* True if `size1' is non-NULL and PTR is pointing anywhere inside `string1' or just past its end. This works if PTR is NULL, which is a good thing. */ #define FIRST_STRING_P(ptr) \ (size1 && string1 <= (ptr) && (ptr) <= string1 + size1) /* (Re)Allocate N items of type T using malloc, or fail. */ #define TALLOC(n, t) ((t *) malloc ((n) * sizeof (t))) #define RETALLOC(addr, n, t) ((addr) = (t *) realloc (addr, (n) * sizeof (t))) #define RETALLOC_IF(addr, n, t) \ if (addr) RETALLOC((addr), (n), t); else (addr) = TALLOC ((n), t) #define REGEX_TALLOC(n, t) ((t *) REGEX_ALLOCATE ((n) * sizeof (t))) #define BYTEWIDTH 8 /* In bits. */ #define STREQ(s1, s2) ((strcmp (s1, s2) == 0)) #undef MAX #undef MIN #define MAX(a, b) ((a) > (b) ? (a) : (b)) #define MIN(a, b) ((a) < (b) ? (a) : (b)) typedef char boolean; #define false 0 #define true 1 static int re_match_2_internal PARAMS ((struct re_pattern_buffer *bufp, const char *string1, int size1, const char *string2, int size2, int pos, struct re_registers *regs, int stop)); /* These are the command codes that appear in compiled regular expressions. Some opcodes are followed by argument bytes. 
A command code can specify any interpretation whatsoever for its arguments. Zero bytes may appear in the compiled regular expression. */ typedef enum { no_op = 0, /* Succeed right away--no more backtracking. */ succeed, /* Followed by one byte giving n, then by n literal bytes. */ exactn, #ifdef MBS_SUPPORT /* Same as exactn, but contains binary data. */ exactn_bin, #endif /* Matches any (more or less) character. */ anychar, /* Matches any one char belonging to specified set. First following byte is number of bitmap bytes. Then come bytes for a bitmap saying which chars are in. Bits in each byte are ordered low-bit-first. A character is in the set if its bit is 1. A character too large to have a bit in the map is automatically not in the set. */ /* ifdef MBS_SUPPORT, following element is length of character classes, length of collating symbols, length of equivalence classes, length of character ranges, and length of characters. Next, character class element, collating symbols elements, equivalence class elements, range elements, and character elements follow. See regex_compile function. */ charset, /* Same parameters as charset, but match any character that is not one of those specified. */ charset_not, /* Start remembering the text that is matched, for storing in a register. Followed by one byte with the register number, in the range 0 to one less than the pattern buffer's re_nsub field. Then followed by one byte with the number of groups inner to this one. (This last has to be part of the start_memory only because we need it in the on_failure_jump of re_match_2.) */ start_memory, /* Stop remembering the text that is matched and store it in a memory register. Followed by one byte with the register number, in the range 0 to one less than `re_nsub' in the pattern buffer, and one byte with the number of inner groups, just like `start_memory'. 
(We need the number of inner groups here because we don't have any easy way of finding the corresponding start_memory when we're at a stop_memory.) */ stop_memory, /* Match a duplicate of something remembered. Followed by one byte containing the register number. */ duplicate, /* Fail unless at beginning of line. */ begline, /* Fail unless at end of line. */ endline, /* Succeeds if at beginning of buffer (if emacs) or at beginning of string to be matched (if not). */ begbuf, /* Analogously, for end of buffer/string. */ endbuf, /* Followed by two byte relative address to which to jump. */ jump, /* Same as jump, but marks the end of an alternative. */ jump_past_alt, /* Followed by two-byte relative address of place to resume at in case of failure. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ on_failure_jump, /* Like on_failure_jump, but pushes a placeholder instead of the current string position when executed. */ on_failure_keep_string_jump, /* Throw away latest failure point and then jump to following two-byte relative address. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ pop_failure_jump, /* Change to pop_failure_jump if know won't have to backtrack to match; otherwise change to jump. This is used to jump back to the beginning of a repeat. If what follows this jump clearly won't match what the repeat does, such that we can be sure that there is no use backtracking out of repetitions already matched, then we change it to a pop_failure_jump. Followed by two-byte address. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ maybe_pop_jump, /* Jump to following two-byte address, and push a dummy failure point. This failure point will be thrown away if an attempt is made to use it for a failure. A `+' construct makes this before the first repeat. Also used as an intermediary kind of jump when compiling an alternative. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ dummy_failure_jump, /* Push a dummy failure point and continue. 
Used at the end of alternatives. */ push_dummy_failure, /* Followed by two-byte relative address and two-byte number n. After matching N times, jump to the address upon failure. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ succeed_n, /* Followed by two-byte relative address, and two-byte number n. Jump to the address N times, then fail. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ jump_n, /* Set the following two-byte relative address to the subsequent two-byte number. The address *includes* the two bytes of number. */ /* ifdef MBS_SUPPORT, the size of address is 1. */ set_number_at, wordchar, /* Matches any word-constituent character. */ notwordchar, /* Matches any char that is not a word-constituent. */ wordbeg, /* Succeeds if at word beginning. */ wordend, /* Succeeds if at word end. */ wordbound, /* Succeeds if at a word boundary. */ notwordbound /* Succeeds if not at a word boundary. */ #ifdef emacs ,before_dot, /* Succeeds if before point. */ at_dot, /* Succeeds if at point. */ after_dot, /* Succeeds if after point. */ /* Matches any character whose syntax is specified. Followed by a byte which contains a syntax code, e.g., Sword. */ syntaxspec, /* Matches any character whose syntax is not that specified. */ notsyntaxspec #endif /* emacs */ } re_opcode_t; /* Common operations on the compiled pattern. */ /* Store NUMBER in two contiguous bytes starting at DESTINATION. */ /* ifdef MBS_SUPPORT, we store NUMBER in 1 element. */ #ifdef MBS_SUPPORT # define STORE_NUMBER(destination, number) \ do { \ *(destination) = (US_CHAR_TYPE)(number); \ } while (0) #else # define STORE_NUMBER(destination, number) \ do { \ (destination)[0] = (number) & 0377; \ (destination)[1] = (number) >> 8; \ } while (0) #endif /* MBS_SUPPORT */ /* Same as STORE_NUMBER, except increment DESTINATION to the byte after where the number is stored. Therefore, DESTINATION must be an lvalue. */ /* ifdef MBS_SUPPORT, we store NUMBER in 1 element. 
*/ #define STORE_NUMBER_AND_INCR(destination, number) \ do { \ STORE_NUMBER (destination, number); \ (destination) += OFFSET_ADDRESS_SIZE; \ } while (0) /* Put into DESTINATION a number stored in two contiguous bytes starting at SOURCE. */ /* ifdef MBS_SUPPORT, we store NUMBER in 1 element. */ #ifdef MBS_SUPPORT # define EXTRACT_NUMBER(destination, source) \ do { \ (destination) = *(source); \ } while (0) #else # define EXTRACT_NUMBER(destination, source) \ do { \ (destination) = *(source) & 0377; \ (destination) += SIGN_EXTEND_CHAR (*((source) + 1)) << 8; \ } while (0) #endif #ifdef DEBUG static void extract_number _RE_ARGS ((int *dest, US_CHAR_TYPE *source)); static void extract_number (dest, source) int *dest; US_CHAR_TYPE *source; { #ifdef MBS_SUPPORT *dest = *source; #else int temp = SIGN_EXTEND_CHAR (*(source + 1)); *dest = *source & 0377; *dest += temp << 8; #endif } # ifndef EXTRACT_MACROS /* To debug the macros. */ # undef EXTRACT_NUMBER # define EXTRACT_NUMBER(dest, src) extract_number (&dest, src) # endif /* not EXTRACT_MACROS */ #endif /* DEBUG */ /* Same as EXTRACT_NUMBER, except increment SOURCE to after the number. SOURCE must be an lvalue. */ #define EXTRACT_NUMBER_AND_INCR(destination, source) \ do { \ EXTRACT_NUMBER (destination, source); \ (source) += OFFSET_ADDRESS_SIZE; \ } while (0) #ifdef DEBUG static void extract_number_and_incr _RE_ARGS ((int *destination, US_CHAR_TYPE **source)); static void extract_number_and_incr (destination, source) int *destination; US_CHAR_TYPE **source; { extract_number (destination, *source); *source += OFFSET_ADDRESS_SIZE; } # ifndef EXTRACT_MACROS # undef EXTRACT_NUMBER_AND_INCR # define EXTRACT_NUMBER_AND_INCR(dest, src) \ extract_number_and_incr (&dest, &src) # endif /* not EXTRACT_MACROS */ #endif /* DEBUG */ /* If DEBUG is defined, Regex prints many voluminous messages about what it is doing (if the variable `debug' is nonzero). 
   If linked with the main program in `iregex.c', you can enter patterns
   and strings interactively. And if linked with the main program in
   `main.c' and the other test files, you can run the already-written
   tests. */

#ifdef DEBUG

/* We use standard I/O for debugging. */
# include <stdio.h>

/* It is useful to test things that ``must'' be true when debugging. */
# include <assert.h>

static int debug;

# define DEBUG_STATEMENT(e) e
# define DEBUG_PRINT1(x) if (debug) printf (x)
# define DEBUG_PRINT2(x1, x2) if (debug) printf (x1, x2)
# define DEBUG_PRINT3(x1, x2, x3) if (debug) printf (x1, x2, x3)
# define DEBUG_PRINT4(x1, x2, x3, x4) if (debug) printf (x1, x2, x3, x4)
# define DEBUG_PRINT_COMPILED_PATTERN(p, s, e) \
  if (debug) print_partial_compiled_pattern (s, e)
# define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2) \
  if (debug) print_double_string (w, s1, sz1, s2, sz2)


/* Print the fastmap in human-readable form. */

void
print_fastmap (fastmap)
    char *fastmap;
{
  unsigned was_a_range = 0;
  unsigned i = 0;

  while (i < (1 << BYTEWIDTH))
    {
      if (fastmap[i++])
        {
          was_a_range = 0;
          putchar (i - 1);
          while (i < (1 << BYTEWIDTH) && fastmap[i])
            {
              was_a_range = 1;
              i++;
            }
          if (was_a_range)
            {
              printf ("-");
              putchar (i - 1);
            }
        }
    }
  putchar ('\n');
}


/* Print a compiled pattern string in human-readable form, starting at
   the START pointer into it and ending just before the pointer END. */

void
print_partial_compiled_pattern (start, end)
    US_CHAR_TYPE *start;
    US_CHAR_TYPE *end;
{
  int mcnt, mcnt2;
  US_CHAR_TYPE *p1;
  US_CHAR_TYPE *p = start;
  US_CHAR_TYPE *pend = end;

  if (start == NULL)
    {
      printf ("(null)\n");
      return;
    }

  /* Loop over pattern commands.
*/
  while (p < pend)
    {
#ifdef _LIBC
      printf ("%td:\t", p - start);
#else
      printf ("%ld:\t", (long int) (p - start));
#endif

      switch ((re_opcode_t) *p++)
        {
        case no_op:
          printf ("/no_op");
          break;

        case exactn:
          mcnt = *p++;
          printf ("/exactn/%d", mcnt);
          do
            {
              putchar ('/');
              PUT_CHAR (*p++);
            }
          while (--mcnt);
          break;

#ifdef MBS_SUPPORT
        case exactn_bin:
          mcnt = *p++;
          printf ("/exactn_bin/%d", mcnt);
          do
            {
              printf("/%lx", (long int) *p++);
            }
          while (--mcnt);
          break;
#endif /* MBS_SUPPORT */

        case start_memory:
          mcnt = *p++;
          printf ("/start_memory/%d/%ld", mcnt, (long int) *p++);
          break;

        case stop_memory:
          mcnt = *p++;
          printf ("/stop_memory/%d/%ld", mcnt, (long int) *p++);
          break;

        case duplicate:
          printf ("/duplicate/%ld", (long int) *p++);
          break;

        case anychar:
          printf ("/anychar");
          break;

        case charset:
        case charset_not:
          {
#ifdef MBS_SUPPORT
            int i, length;
            wchar_t *workp = p;
            printf ("/charset [%s",
                    (re_opcode_t) *(workp - 1) == charset_not ? "^" : "");
            p += 5;
            length = *workp++; /* the length of char_classes */
            for (i=0 ; i<length ; i++)
              printf("[:%lx:]", (long int) *p++);
            length = *workp++; /* the length of collating_symbol */
            for (i=0 ; i<length ;)
              {
                printf("[.");
                while(*p != 0)
                  PUT_CHAR((i++,*p++));
                i++,p++;
                printf(".]");
              }
            length = *workp++; /* the length of equivalence_class */
            for (i=0 ; i<length ;)
              {
                printf("[=");
                while(*p != 0)
                  PUT_CHAR((i++,*p++));
                i++,p++;
                printf("=]");
              }
            length = *workp++; /* the length of char_range */
            for (i=0 ; i<length ; i++)
              {
                wchar_t range_start = *p++;
                wchar_t range_end = *p++;
                if (MB_CUR_MAX == 1)
                  printf("%c-%c", (char) range_start, (char) range_end);
                else
                  printf("%C-%C", (wint_t) range_start, (wint_t) range_end);
              }
            length = *workp++; /* the length of char */
            for (i=0 ; i<length ; i++)
              if (MB_CUR_MAX == 1)
                putchar (*p++);
              else
                printf("%C", (wint_t) *p++);
            putchar (']');
#else
            register int c, last = -100;
            register int in_range = 0;

            printf ("/charset [%s",
                    (re_opcode_t) *(p - 1) == charset_not ? "^" : "");

            assert (p + *p < pend);

            for (c = 0; c < 256; c++)
              if (c / 8 < *p
                  && (p[1 + (c/8)] & (1 << (c % 8))))
                {
                  /* Are we starting a range? */
                  if (last + 1 == c && ! in_range)
                    {
                      putchar ('-');
                      in_range = 1;
                    }
                  /* Have we broken a range? */
                  else if (last + 1 != c && in_range)
                    {
                      putchar (last);
                      in_range = 0;
                    }

                  if (! in_range)
                    putchar (c);

                  last = c;
                }

            if (in_range)
              putchar (last);

            putchar (']');

            p += 1 + *p;
#endif /* MBS_SUPPORT */
          }
          break;

        case begline:
          printf ("/begline");
          break;

        case endline:
          printf ("/endline");
          break;

        case on_failure_jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/on_failure_jump to %td", p + mcnt - start);
#else
          printf ("/on_failure_jump to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case on_failure_keep_string_jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/on_failure_keep_string_jump to %td", p + mcnt - start);
#else
          printf ("/on_failure_keep_string_jump to %ld",
                  (long int) (p + mcnt - start));
#endif
          break;

        case dummy_failure_jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/dummy_failure_jump to %td", p + mcnt - start);
#else
          printf ("/dummy_failure_jump to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case push_dummy_failure:
          printf ("/push_dummy_failure");
          break;

        case maybe_pop_jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/maybe_pop_jump to %td", p + mcnt - start);
#else
          printf ("/maybe_pop_jump to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case pop_failure_jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/pop_failure_jump to %td", p + mcnt - start);
#else
          printf ("/pop_failure_jump to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case jump_past_alt:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/jump_past_alt to %td", p + mcnt - start);
#else
          printf ("/jump_past_alt to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case jump:
          extract_number_and_incr (&mcnt, &p);
#ifdef _LIBC
          printf ("/jump to %td", p +
                  mcnt - start);
#else
          printf ("/jump to %ld", (long int) (p + mcnt - start));
#endif
          break;

        case succeed_n:
          extract_number_and_incr (&mcnt, &p);
          p1 = p + mcnt;
          extract_number_and_incr (&mcnt2, &p);
#ifdef _LIBC
          printf ("/succeed_n to %td, %d times", p1 - start, mcnt2);
#else
          printf ("/succeed_n to %ld, %d times",
                  (long int) (p1 - start), mcnt2);
#endif
          break;

        case jump_n:
          extract_number_and_incr (&mcnt, &p);
          p1 = p + mcnt;
          extract_number_and_incr (&mcnt2, &p);
#ifdef _LIBC
          printf ("/jump_n to %td, %d times", p1 - start, mcnt2);
#else
          printf ("/jump_n to %ld, %d times",
                  (long int) (p1 - start), mcnt2);
#endif
          break;

        case set_number_at:
          extract_number_and_incr (&mcnt, &p);
          p1 = p + mcnt;
          extract_number_and_incr (&mcnt2, &p);
#ifdef _LIBC
          printf ("/set_number_at location %td to %d", p1 - start, mcnt2);
#else
          printf ("/set_number_at location %ld to %d",
                  (long int) (p1 - start), mcnt2);
#endif
          break;

        case wordbound:
          printf ("/wordbound");
          break;

        case notwordbound:
          printf ("/notwordbound");
          break;

        case wordbeg:
          printf ("/wordbeg");
          break;

        case wordend:
          printf ("/wordend");
          break;

# ifdef emacs
        case before_dot:
          printf ("/before_dot");
          break;

        case at_dot:
          printf ("/at_dot");
          break;

        case after_dot:
          printf ("/after_dot");
          break;

        case syntaxspec:
          printf ("/syntaxspec");
          mcnt = *p++;
          printf ("/%d", mcnt);
          break;

        case notsyntaxspec:
          printf ("/notsyntaxspec");
          mcnt = *p++;
          printf ("/%d", mcnt);
          break;
# endif /* emacs */

        case wordchar:
          printf ("/wordchar");
          break;

        case notwordchar:
          printf ("/notwordchar");
          break;

        case begbuf:
          printf ("/begbuf");
          break;

        case endbuf:
          printf ("/endbuf");
          break;

        default:
          printf ("?%ld", (long int) *(p-1));
        }

      putchar ('\n');
    }

#ifdef _LIBC
  printf ("%td:\tend of pattern.\n", p - start);
#else
  printf ("%ld:\tend of pattern.\n", (long int) (p - start));
#endif
}


void
print_compiled_pattern (bufp)
    struct re_pattern_buffer *bufp;
{
  US_CHAR_TYPE *buffer = (US_CHAR_TYPE*) bufp->buffer;

  print_partial_compiled_pattern (buffer, buffer
                                  + bufp->used / sizeof(US_CHAR_TYPE));
  printf ("%ld bytes used/%ld bytes allocated.\n",
          bufp->used, bufp->allocated);

  if (bufp->fastmap_accurate && bufp->fastmap)
    {
      printf ("fastmap: ");
      print_fastmap (bufp->fastmap);
    }

#ifdef _LIBC
  printf ("re_nsub: %Zd\t", bufp->re_nsub);
#else
  printf ("re_nsub: %ld\t", (long int) bufp->re_nsub);
#endif
  printf ("regs_alloc: %d\t", bufp->regs_allocated);
  printf ("can_be_null: %d\t", bufp->can_be_null);
  printf ("newline_anchor: %d\n", bufp->newline_anchor);
  printf ("no_sub: %d\t", bufp->no_sub);
  printf ("not_bol: %d\t", bufp->not_bol);
  printf ("not_eol: %d\t", bufp->not_eol);
  printf ("syntax: %lx\n", bufp->syntax);
  /* Perhaps we should print the translate table? */
}


void
print_double_string (where, string1, size1, string2, size2)
    const CHAR_TYPE *where;
    const CHAR_TYPE *string1;
    const CHAR_TYPE *string2;
    int size1;
    int size2;
{
  int this_char;

  if (where == NULL)
    printf ("(null)");
  else
    {
      if (FIRST_STRING_P (where))
        {
          for (this_char = where - string1; this_char < size1; this_char++)
            PUT_CHAR (string1[this_char]);

          where = string2;
        }

      for (this_char = where - string2; this_char < size2; this_char++)
        PUT_CHAR (string2[this_char]);
    }
}

void
printchar (c)
     int c;
{
  putc (c, stderr);
}

#else /* not DEBUG */

# undef assert
# define assert(e)

# define DEBUG_STATEMENT(e)
# define DEBUG_PRINT1(x)
# define DEBUG_PRINT2(x1, x2)
# define DEBUG_PRINT3(x1, x2, x3)
# define DEBUG_PRINT4(x1, x2, x3, x4)
# define DEBUG_PRINT_COMPILED_PATTERN(p, s, e)
# define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2)

#endif /* not DEBUG */

#ifdef MBS_SUPPORT
/* This converts a multibyte string to a wide character string, records
   the correspondence between the two in offset_buffer (see below), and
   records in is_binary whether each wchar_t is binary data. Invalid
   multibyte sequences are treated as binary data. We assume
   offset_buffer and is_binary are already allocated with enough space.
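
   Example (a standalone sketch, not the function below): the
   conversion loop rewritten against standard mbrtowc with explicit
   size_t return checks. The name demo_mbs_to_wcs is hypothetical, and
   resetting the conversion state after a failed conversion is a choice
   of this sketch that convert_mbs_to_wcs does not make. In a UTF-8
   locale, the two bytes of U+00E9 followed by an ASCII 'x' would give
   offsets {0, 2, 3}; in the default "C" locale every ASCII byte is one
   character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical sketch of the MB_CUR_MAX > 1 branch. dest needs room for
// len wide chars, offsets for len + 1 ints, is_binary for len flags.
// Returns the number of wide characters produced.
static size_t demo_mbs_to_wcs (wchar_t *dest, const unsigned char *src,
                               size_t len, int *offsets, char *is_binary)
{
  mbstate_t state;
  size_t wc_count = 0;
  size_t pos = 0;               // bytes of src consumed so far

  memset (&state, 0, sizeof state);
  offsets[0] = 0;
  while (pos < len)
    {
      size_t n = mbrtowc (&dest[wc_count], (const char *) src + pos,
                          len - pos, &state);
      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        {
          // Invalid or truncated sequence, or a NUL byte: copy one raw
          // byte and flag it as binary, as convert_mbs_to_wcs does.
          dest[wc_count] = src[pos];
          is_binary[wc_count] = 1;
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else
        is_binary[wc_count] = 0;
      pos += n;
      offsets[++wc_count] = (int) pos;  // byte offset after this char
    }
  return wc_count;
}
```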
*/

static size_t convert_mbs_to_wcs (CHAR_TYPE *dest, const unsigned char* src,
                                  size_t len, int *offset_buffer,
                                  char *is_binary);
static size_t
convert_mbs_to_wcs (dest, src, len, offset_buffer, is_binary)
     CHAR_TYPE *dest;
     const unsigned char* src;
     size_t len; /* the length of the multibyte string. */

     /* It holds the correspondences between src (char string) and dest
        (wchar_t string) for optimization.
        e.g. src  = "xxxyzz"
             dest = {'X', 'Y', 'Z'}
              (each "xxx", "y" and "zz" represent one multibyte character
               corresponding to 'X', 'Y' and 'Z'.)
          offset_buffer = {0, 0+3("xxx"), 0+3+1("y"), 0+3+1+2("zz")}
                        = {0, 3, 4, 6}
     */
     int *offset_buffer;
     char *is_binary;
{
  wchar_t *pdest = dest;
  const unsigned char *psrc = src;
  size_t wc_count = 0;

  if (MB_CUR_MAX == 1)
    { /* We don't need conversion. */
      for ( ; wc_count < len ; ++wc_count)
        {
          *pdest++ = *psrc++;
          is_binary[wc_count] = FALSE;
          offset_buffer[wc_count] = wc_count;
        }
      offset_buffer[wc_count] = wc_count;
    }
  else
    {
      /* We need conversion. */
      mbstate_t mbs;
      int consumed;
      size_t mb_remain = len;
      size_t mb_count = 0;

      /* Initialize the conversion state. */
      memset (&mbs, 0, sizeof (mbstate_t));

      offset_buffer[0] = 0;
      for( ; mb_remain > 0 ; ++wc_count, ++pdest, mb_remain -= consumed,
             psrc += consumed)
        {
          consumed = mbrtowc (pdest, psrc, mb_remain, &mbs);

          if (consumed <= 0)
            /* Failed to convert; src may contain binary data, so we
               consume 1 byte manually. */
            {
              *pdest = *psrc;
              consumed = 1;
              is_binary[wc_count] = TRUE;
            }
          else
            is_binary[wc_count] = FALSE;
          /* In SJIS encoding, the yen sign is used as the escape
             character in place of reverse solidus. So we convert 0x5c
             (yen sign in SJIS) not to 0xa5 (yen sign in UCS2) but to
             0x5c (reverse solidus in UCS2). */
          if (consumed == 1 && (int) *psrc == 0x5c && (int) *pdest == 0xa5)
            *pdest = (wchar_t) *psrc;

          offset_buffer[wc_count + 1] = mb_count += consumed;
        }
    }

  return wc_count;
}

#endif /* MBS_SUPPORT */

/* Set by `re_set_syntax' to the current regexp syntax to recognize.
   Can also be assigned to arbitrarily: each pattern buffer stores its
   own syntax, so it can be changed between regex compilations. */
/* This has no initializer because initialized variables in Emacs
   become read-only after dumping. */
reg_syntax_t re_syntax_options;


/* Specify the precise syntax of regexps for compilation. This provides
   for compatibility for various utilities which historically have
   different, incompatible syntaxes.

   The argument SYNTAX is a bit mask comprised of the various bits
   defined in regex.h. We return the old syntax. */

reg_syntax_t
re_set_syntax (syntax)
    reg_syntax_t syntax;
{
  reg_syntax_t ret = re_syntax_options;

  re_syntax_options = syntax;
#ifdef DEBUG
  if (syntax & RE_DEBUG)
    debug = 1;
  else if (debug) /* was on but now is not */
    debug = 0;
#endif /* DEBUG */
  return ret;
}
#ifdef _LIBC
weak_alias (__re_set_syntax, re_set_syntax)
#endif

/* This table gives an error message for each of the error codes listed
   in regex.h. Obviously the order here has to be the same as there.
   POSIX doesn't require that we do anything for REG_NOERROR,
   but why not be nice?
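
   Example (a standalone sketch): re_error_msgid packs every message
   into one char array, each followed by an explicit "\0" literal, and
   each index macro is the previous index plus sizeof the previous
   message. sizeof a string literal counts its terminating NUL, which
   matches the "\0" separator that follows each message in the array.
   The DEMO_ names are hypothetical and cover only the first three
   messages, without gettext_noop.

```c
#include <stddef.h>

#define DEMO_SUCCESS_IDX 0
#define DEMO_NOMATCH_IDX (DEMO_SUCCESS_IDX + sizeof "Success")
#define DEMO_BADPAT_IDX (DEMO_NOMATCH_IDX + sizeof "No match")

// One array holding all messages back to back, NUL-separated.
static const char demo_msgid[] =
  "Success" "\0"
  "No match" "\0"
  "Invalid regular expression";

static const size_t demo_msgid_idx[] =
  { DEMO_SUCCESS_IDX, DEMO_NOMATCH_IDX, DEMO_BADPAT_IDX };

// Returns the message for CODE, or NULL if CODE is out of range.
static const char *demo_error_message (int code)
{
  if (code < 0
      || (size_t) code >= sizeof demo_msgid_idx / sizeof demo_msgid_idx[0])
    return NULL;
  return demo_msgid + demo_msgid_idx[code];
}
```

   A common reason for this layout is that it avoids an array of
   pointers and the relocations such an array needs in a shared
   library; the source itself does not state the motivation.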
*/

static const char re_error_msgid[] =
  {
#define REG_NOERROR_IDX 0
    gettext_noop ("Success") /* REG_NOERROR */
    "\0"
#define REG_NOMATCH_IDX (REG_NOERROR_IDX + sizeof "Success")
    gettext_noop ("No match") /* REG_NOMATCH */
    "\0"
#define REG_BADPAT_IDX (REG_NOMATCH_IDX + sizeof "No match")
    gettext_noop ("Invalid regular expression") /* REG_BADPAT */
    "\0"
#define REG_ECOLLATE_IDX (REG_BADPAT_IDX + sizeof "Invalid regular expression")
    gettext_noop ("Invalid collation character") /* REG_ECOLLATE */
    "\0"
#define REG_ECTYPE_IDX (REG_ECOLLATE_IDX + sizeof "Invalid collation character")
    gettext_noop ("Invalid character class name") /* REG_ECTYPE */
    "\0"
#define REG_EESCAPE_IDX (REG_ECTYPE_IDX + sizeof "Invalid character class name")
    gettext_noop ("Trailing backslash") /* REG_EESCAPE */
    "\0"
#define REG_ESUBREG_IDX (REG_EESCAPE_IDX + sizeof "Trailing backslash")
    gettext_noop ("Invalid back reference") /* REG_ESUBREG */
    "\0"
#define REG_EBRACK_IDX (REG_ESUBREG_IDX + sizeof "Invalid back reference")
    gettext_noop ("Unmatched [ or [^") /* REG_EBRACK */
    "\0"
#define REG_EPAREN_IDX (REG_EBRACK_IDX + sizeof "Unmatched [ or [^")
    gettext_noop ("Unmatched ( or \\(") /* REG_EPAREN */
    "\0"
#define REG_EBRACE_IDX (REG_EPAREN_IDX + sizeof "Unmatched ( or \\(")
    gettext_noop ("Unmatched \\{") /* REG_EBRACE */
    "\0"
#define REG_BADBR_IDX (REG_EBRACE_IDX + sizeof "Unmatched \\{")
    gettext_noop ("Invalid content of \\{\\}") /* REG_BADBR */
    "\0"
#define REG_ERANGE_IDX (REG_BADBR_IDX + sizeof "Invalid content of \\{\\}")
    gettext_noop ("Invalid range end") /* REG_ERANGE */
    "\0"
#define REG_ESPACE_IDX (REG_ERANGE_IDX + sizeof "Invalid range end")
    gettext_noop ("Memory exhausted") /* REG_ESPACE */
    "\0"
#define REG_BADRPT_IDX (REG_ESPACE_IDX + sizeof "Memory exhausted")
    gettext_noop ("Invalid preceding regular expression") /* REG_BADRPT */
    "\0"
#define REG_EEND_IDX (REG_BADRPT_IDX + sizeof "Invalid preceding regular expression")
    gettext_noop ("Premature end of regular expression") /* REG_EEND */
    "\0"
#define REG_ESIZE_IDX (REG_EEND_IDX + sizeof "Premature end of regular expression")
    gettext_noop ("Regular expression too big") /* REG_ESIZE */
    "\0"
#define REG_ERPAREN_IDX (REG_ESIZE_IDX + sizeof "Regular expression too big")
    gettext_noop ("Unmatched ) or \\)") /* REG_ERPAREN */
  };

static const size_t re_error_msgid_idx[] =
  {
    REG_NOERROR_IDX,
    REG_NOMATCH_IDX,
    REG_BADPAT_IDX,
    REG_ECOLLATE_IDX,
    REG_ECTYPE_IDX,
    REG_EESCAPE_IDX,
    REG_ESUBREG_IDX,
    REG_EBRACK_IDX,
    REG_EPAREN_IDX,
    REG_EBRACE_IDX,
    REG_BADBR_IDX,
    REG_ERANGE_IDX,
    REG_ESPACE_IDX,
    REG_BADRPT_IDX,
    REG_EEND_IDX,
    REG_ESIZE_IDX,
    REG_ERPAREN_IDX
  };

/* Avoiding alloca during matching, to placate r_alloc. */

/* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
   searching and matching functions should not call alloca. On some
   systems, alloca is implemented in terms of malloc, and if we're
   using the relocating allocator routines, then malloc could cause a
   relocation, which might (if the strings being searched are in the
   ralloc heap) shift the data out from underneath the regexp
   routines.

   Here's another reason to avoid allocation: Emacs
   processes input from X in a signal handler; processing X input may
   call malloc; if input arrives while a matching routine is calling
   malloc, then we're scrod. But Emacs can't just block input while
   calling matching routines; then we don't notice interrupts when
   they come in. So, Emacs blocks input around all regexp calls
   except the matching calls, which it leaves unprotected, in the
   faith that they will not malloc. */

/* Normally, this is fine. */
#define MATCH_MAY_ALLOCATE

/* When using GNU C, we are not REALLY using the C alloca, no matter
   what config.h may say. So don't take precautions for it. */
#ifdef __GNUC__
# undef C_ALLOCA
#endif

/* The match routines may not allocate if (1) they would do it with malloc
   and (2) it's not safe for them to use malloc.
   Note that if REL_ALLOC is defined, matching would not use malloc for the
   failure stack, but we would still use it for the register vectors;
   so REL_ALLOC should not affect this. */
#if (defined C_ALLOCA || defined REGEX_MALLOC) && defined emacs
# undef MATCH_MAY_ALLOCATE
#endif


/* Failure stack declarations and macros; both re_compile_fastmap and
   re_match_2 use a failure stack. These have to be macros because of
   REGEX_ALLOCATE_STACK. */


/* Number of failure points for which to initially allocate space
   when matching. If this number is exceeded, we allocate more
   space, so it is not a hard limit. */
#ifndef INIT_FAILURE_ALLOC
# define INIT_FAILURE_ALLOC 5
#endif

/* Roughly the maximum number of failure points on the stack. Would be
   exactly that if we always used MAX_FAILURE_ITEMS items each time we
   failed. This is a variable only so users of regex can assign to it;
   we never change it ourselves. */

#ifdef INT_IS_16BIT

# if defined MATCH_MAY_ALLOCATE
/* 4400 was enough to cause a crash on Alpha OSF/1,
   whose default stack limit is 2mb. */
long int re_max_failures = 4000;
# else
long int re_max_failures = 2000;
# endif

union fail_stack_elt
{
  US_CHAR_TYPE *pointer;
  long int integer;
};

typedef union fail_stack_elt fail_stack_elt_t;

typedef struct
{
  fail_stack_elt_t *stack;
  unsigned long int size;
  unsigned long int avail; /* Offset of next open position. */
} fail_stack_type;

#else /* not INT_IS_16BIT */

# if defined MATCH_MAY_ALLOCATE
/* 4400 was enough to cause a crash on Alpha OSF/1,
   whose default stack limit is 2mb. */
int re_max_failures = 4000;
# else
int re_max_failures = 2000;
# endif

union fail_stack_elt
{
  US_CHAR_TYPE *pointer;
  int integer;
};

typedef union fail_stack_elt fail_stack_elt_t;

typedef struct
{
  fail_stack_elt_t *stack;
  unsigned size;
  unsigned avail; /* Offset of next open position.
*/
} fail_stack_type;

#endif /* INT_IS_16BIT */

#define FAIL_STACK_EMPTY() (fail_stack.avail == 0)
#define FAIL_STACK_PTR_EMPTY() (fail_stack_ptr->avail == 0)
#define FAIL_STACK_FULL() (fail_stack.avail == fail_stack.size)


/* Define macros to initialize and free the failure stack.
   Do `return -2' if the alloc fails. */

#ifdef MATCH_MAY_ALLOCATE
# define INIT_FAIL_STACK() \
  do { \
    fail_stack.stack = (fail_stack_elt_t *) \
      REGEX_ALLOCATE_STACK (INIT_FAILURE_ALLOC * sizeof (fail_stack_elt_t)); \
 \
    if (fail_stack.stack == NULL) \
      return -2; \
 \
    fail_stack.size = INIT_FAILURE_ALLOC; \
    fail_stack.avail = 0; \
  } while (0)

# define RESET_FAIL_STACK() REGEX_FREE_STACK (fail_stack.stack)
#else
# define INIT_FAIL_STACK() \
  do { \
    fail_stack.avail = 0; \
  } while (0)

# define RESET_FAIL_STACK()
#endif


/* Double the size of FAIL_STACK, up to approximately `re_max_failures' items.

   Return 1 if succeeds, and 0 if either ran out of memory
   allocating space for it or it was already too large.

   REGEX_REALLOCATE_STACK requires `destination' be declared. */

#define DOUBLE_FAIL_STACK(fail_stack) \
  ((fail_stack).size > (unsigned) (re_max_failures * MAX_FAILURE_ITEMS) \
   ? 0 \
   : ((fail_stack).stack = (fail_stack_elt_t *) \
        REGEX_REALLOCATE_STACK ((fail_stack).stack, \
          (fail_stack).size * sizeof (fail_stack_elt_t), \
          ((fail_stack).size << 1) * sizeof (fail_stack_elt_t)), \
 \
      (fail_stack).stack == NULL \
      ? 0 \
      : ((fail_stack).size <<= 1, \
         1)))


/* Push pointer POINTER on FAIL_STACK.
   Return 1 if was able to do so and 0 if ran out of memory allocating
   space to do so. */
#define PUSH_PATTERN_OP(POINTER, FAIL_STACK) \
  ((FAIL_STACK_FULL () \
    && !DOUBLE_FAIL_STACK (FAIL_STACK)) \
   ? 0 \
   : ((FAIL_STACK).stack[(FAIL_STACK).avail++].pointer = POINTER, \
      1))

/* Push a pointer value onto the failure stack.
   Assumes the variable `fail_stack'. Probably should only
   be called from within `PUSH_FAILURE_POINT'.
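
   Example (a standalone sketch): DOUBLE_FAIL_STACK and PUSH_PATTERN_OP
   grow the failure stack geometrically and refuse to grow once its
   size already exceeds a limit (re_max_failures * MAX_FAILURE_ITEMS).
   The demo_ names are hypothetical; plain realloc stands in for
   REGEX_REALLOCATE_STACK (which may be alloca-based), and unlike the
   macro, this sketch keeps the old block if realloc fails.

```c
#include <stdlib.h>

// Hypothetical miniature of fail_stack_type: each slot holds either a
// pointer or an integer, like fail_stack_elt_t.
union demo_elt { void *pointer; int integer; };

struct demo_stack
{
  union demo_elt *stack;
  unsigned size;    // slots allocated
  unsigned avail;   // index of the next free slot
};

// Mirrors DOUBLE_FAIL_STACK: refuse once size exceeds max_items,
// otherwise double the allocation. Returns 1 on success, 0 on failure.
static int demo_double (struct demo_stack *s, unsigned max_items)
{
  union demo_elt *bigger;
  if (s->size > max_items)
    return 0;
  bigger = realloc (s->stack, (s->size << 1) * sizeof *bigger);
  if (bigger == NULL)
    return 0;
  s->stack = bigger;
  s->size <<= 1;
  return 1;
}

// Mirrors PUSH_PATTERN_OP: grow when full, then push the pointer.
static int demo_push_pointer (struct demo_stack *s, void *p,
                              unsigned max_items)
{
  if (s->avail == s->size && !demo_double (s, max_items))
    return 0;
  s->stack[s->avail++].pointer = p;
  return 1;
}
```

   Because the test is "size greater than the limit", the stack may
   still double once when it sits exactly at the limit, which is why
   the source calls the bound approximate.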
*/
#define PUSH_FAILURE_POINTER(item) \
  fail_stack.stack[fail_stack.avail++].pointer = (US_CHAR_TYPE *) (item)

/* This pushes an integer-valued item onto the failure stack.
   Assumes the variable `fail_stack'. Probably should only
   be called from within `PUSH_FAILURE_POINT'. */
#define PUSH_FAILURE_INT(item) \
  fail_stack.stack[fail_stack.avail++].integer = (item)

/* Push a fail_stack_elt_t value onto the failure stack.
   Assumes the variable `fail_stack'. Probably should only
   be called from within `PUSH_FAILURE_POINT'. */
#define PUSH_FAILURE_ELT(item) \
  fail_stack.stack[fail_stack.avail++] = (item)

/* These three POP... operations complement the three PUSH... operations.
   All assume that `fail_stack' is nonempty. */
#define POP_FAILURE_POINTER() fail_stack.stack[--fail_stack.avail].pointer
#define POP_FAILURE_INT() fail_stack.stack[--fail_stack.avail].integer
#define POP_FAILURE_ELT() fail_stack.stack[--fail_stack.avail]

/* Used to omit pushing failure point id's when we're not debugging. */
#ifdef DEBUG
# define DEBUG_PUSH PUSH_FAILURE_INT
# define DEBUG_POP(item_addr) *(item_addr) = POP_FAILURE_INT ()
#else
# define DEBUG_PUSH(item)
# define DEBUG_POP(item_addr)
#endif


/* Push the information about the state we will need
   if we ever fail back to it.

   Requires variables fail_stack, regstart, regend, reg_info, and
   num_regs_pushed be declared. DOUBLE_FAIL_STACK requires `destination'
   be declared.

   Does `return FAILURE_CODE' if runs out of memory. */

#define PUSH_FAILURE_POINT(pattern_place, string_place, failure_code) \
  do { \
    char *destination; \
    /* Must be int, so when we don't save any registers, the arithmetic \
       of 0 + -1 isn't done as unsigned.
*/ \
    /* Can't be int, since there is not a shred of a guarantee that int \
       is wide enough to hold a value of something to which pointer can \
       be assigned */ \
    active_reg_t this_reg; \
 \
    DEBUG_STATEMENT (failure_id++); \
    DEBUG_STATEMENT (nfailure_points_pushed++); \
    DEBUG_PRINT2 ("\nPUSH_FAILURE_POINT #%u:\n", failure_id); \
    DEBUG_PRINT2 (" Before push, next avail: %d\n", (fail_stack).avail);\
    DEBUG_PRINT2 (" size: %d\n", (fail_stack).size);\
 \
    DEBUG_PRINT2 (" slots needed: %ld\n", NUM_FAILURE_ITEMS); \
    DEBUG_PRINT2 (" available: %d\n", REMAINING_AVAIL_SLOTS); \
 \
    /* Ensure we have enough space allocated for what we will push. */ \
    while (REMAINING_AVAIL_SLOTS < NUM_FAILURE_ITEMS) \
      { \
        if (!DOUBLE_FAIL_STACK (fail_stack)) \
          return failure_code; \
 \
        DEBUG_PRINT2 ("\n Doubled stack; size now: %d\n", \
                      (fail_stack).size); \
        DEBUG_PRINT2 (" slots available: %d\n", REMAINING_AVAIL_SLOTS);\
      } \
 \
    /* Push the info, starting with the registers. */ \
    DEBUG_PRINT1 ("\n"); \
 \
    if (1) \
      for (this_reg = lowest_active_reg; this_reg <= highest_active_reg; \
           this_reg++) \
        { \
          DEBUG_PRINT2 (" Pushing reg: %lu\n", this_reg); \
          DEBUG_STATEMENT (num_regs_pushed++); \
 \
          DEBUG_PRINT2 (" start: %p\n", regstart[this_reg]); \
          PUSH_FAILURE_POINTER (regstart[this_reg]); \
 \
          DEBUG_PRINT2 (" end: %p\n", regend[this_reg]); \
          PUSH_FAILURE_POINTER (regend[this_reg]); \
 \
          DEBUG_PRINT2 (" info: %p\n ", \
                        reg_info[this_reg].word.pointer); \
          DEBUG_PRINT2 (" match_null=%d", \
                        REG_MATCH_NULL_STRING_P (reg_info[this_reg])); \
          DEBUG_PRINT2 (" active=%d", IS_ACTIVE (reg_info[this_reg])); \
          DEBUG_PRINT2 (" matched_something=%d", \
                        MATCHED_SOMETHING (reg_info[this_reg])); \
          DEBUG_PRINT2 (" ever_matched=%d", \
                        EVER_MATCHED_SOMETHING (reg_info[this_reg])); \
          DEBUG_PRINT1 ("\n"); \
          PUSH_FAILURE_ELT (reg_info[this_reg].word); \
        } \
 \
    DEBUG_PRINT2 (" Pushing low active reg: %ld\n", lowest_active_reg);\
    PUSH_FAILURE_INT (lowest_active_reg); \
 \
    DEBUG_PRINT2 (" Pushing high active reg: %ld\n", \
                  highest_active_reg);\
    PUSH_FAILURE_INT (highest_active_reg); \
 \
    DEBUG_PRINT2 (" Pushing pattern %p:\n", pattern_place); \
    DEBUG_PRINT_COMPILED_PATTERN (bufp, pattern_place, pend); \
    PUSH_FAILURE_POINTER (pattern_place); \
 \
    DEBUG_PRINT2 (" Pushing string %p: `", string_place); \
    DEBUG_PRINT_DOUBLE_STRING (string_place, string1, size1, string2, \
                               size2); \
    DEBUG_PRINT1 ("'\n"); \
    PUSH_FAILURE_POINTER (string_place); \
 \
    DEBUG_PRINT2 (" Pushing failure id: %u\n", failure_id); \
    DEBUG_PUSH (failure_id); \
  } while (0)

/* This is the number of items that are pushed and popped on the stack
   for each register. */
#define NUM_REG_ITEMS 3

/* Individual items aside from the registers. */
#ifdef DEBUG
# define NUM_NONREG_ITEMS 5 /* Includes failure point id. */
#else
# define NUM_NONREG_ITEMS 4
#endif

/* We push at most this many items on the stack. */
/* We used to use (num_regs - 1), which is the number of registers
   this regexp will save; but that was changed to 5
   to avoid stack overflow for a regexp with lots of parens. */
#define MAX_FAILURE_ITEMS (5 * NUM_REG_ITEMS + NUM_NONREG_ITEMS)

/* We actually push this many items. */
#define NUM_FAILURE_ITEMS \
  (((0 \
     ? 0 : highest_active_reg - lowest_active_reg + 1) \
    * NUM_REG_ITEMS) \
   + NUM_NONREG_ITEMS)

/* How many items can still be added to the stack without overflowing it. */
#define REMAINING_AVAIL_SLOTS ((fail_stack).size - (fail_stack).avail)


/* Pops what PUSH_FAILURE_POINT pushes.

   We restore into the parameters, all of which should be lvalues:
     STR -- the saved data position.
     PAT -- the saved pattern position.
     LOW_REG, HIGH_REG -- the lowest and highest active registers.
     REGSTART, REGEND -- arrays of string positions.
     REG_INFO -- array of information about each subexpression.

   Also assumes the variables `fail_stack' and (if debugging), `bufp',
   `pend', `string1', `size1', `string2', and `size2'.
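
   Example (a standalone sketch): PUSH_FAILURE_POINT pushes, for each
   active register from lowest to highest, its start, end and info
   words, then the lowest and highest register numbers, the pattern
   position and the string position; popping must consume them in
   exactly the reverse order. With DEBUG off that is
   (high - low + 1) * NUM_REG_ITEMS + NUM_NONREG_ITEMS slots, i.e.
   NUM_FAILURE_ITEMS. The demo_ names are hypothetical, every item is
   modeled as a long, and the NULL-string special case of
   POP_FAILURE_POINT is omitted.

```c
#define DEMO_NREGS 4

// Hypothetical snapshot of the matcher state saved by a failure point.
struct demo_state
{
  long regstart[DEMO_NREGS], regend[DEMO_NREGS], info[DEMO_NREGS];
  long pattern, string;
};

static long demo_stack[64];
static int demo_avail;

// Push order mirrors PUSH_FAILURE_POINT (without the debug id).
static void demo_push_point (const struct demo_state *st, int low, int high)
{
  int r;
  for (r = low; r <= high; r++)
    {
      demo_stack[demo_avail++] = st->regstart[r];
      demo_stack[demo_avail++] = st->regend[r];
      demo_stack[demo_avail++] = st->info[r];
    }
  demo_stack[demo_avail++] = low;
  demo_stack[demo_avail++] = high;
  demo_stack[demo_avail++] = st->pattern;
  demo_stack[demo_avail++] = st->string;
}

// Pop order is the exact reverse, as in POP_FAILURE_POINT.
static void demo_pop_point (struct demo_state *st)
{
  int r, low, high;
  st->string = demo_stack[--demo_avail];
  st->pattern = demo_stack[--demo_avail];
  high = (int) demo_stack[--demo_avail];
  low = (int) demo_stack[--demo_avail];
  for (r = high; r >= low; r--)
    {
      st->info[r] = demo_stack[--demo_avail];
      st->regend[r] = demo_stack[--demo_avail];
      st->regstart[r] = demo_stack[--demo_avail];
    }
}
```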
*/

#define POP_FAILURE_POINT(str, pat, low_reg, high_reg, regstart, regend, reg_info)\
{ \
  DEBUG_STATEMENT (unsigned failure_id;) \
  active_reg_t this_reg; \
  const US_CHAR_TYPE *string_temp; \
 \
  assert (!FAIL_STACK_EMPTY ()); \
 \
  /* Remove failure points and point to how many regs pushed. */ \
  DEBUG_PRINT1 ("POP_FAILURE_POINT:\n"); \
  DEBUG_PRINT2 (" Before pop, next avail: %d\n", fail_stack.avail); \
  DEBUG_PRINT2 (" size: %d\n", fail_stack.size); \
 \
  assert (fail_stack.avail >= NUM_NONREG_ITEMS); \
 \
  DEBUG_POP (&failure_id); \
  DEBUG_PRINT2 (" Popping failure id: %u\n", failure_id); \
 \
  /* If the saved string location is NULL, it came from an \
     on_failure_keep_string_jump opcode, and we want to throw away the \
     saved NULL, thus retaining our current position in the string. */ \
  string_temp = POP_FAILURE_POINTER (); \
  if (string_temp != NULL) \
    str = (const CHAR_TYPE *) string_temp; \
 \
  DEBUG_PRINT2 (" Popping string %p: `", str); \
  DEBUG_PRINT_DOUBLE_STRING (str, string1, size1, string2, size2); \
  DEBUG_PRINT1 ("'\n"); \
 \
  pat = (US_CHAR_TYPE *) POP_FAILURE_POINTER (); \
  DEBUG_PRINT2 (" Popping pattern %p:\n", pat); \
  DEBUG_PRINT_COMPILED_PATTERN (bufp, pat, pend); \
 \
  /* Restore register info.
*/ \
  high_reg = (active_reg_t) POP_FAILURE_INT (); \
  DEBUG_PRINT2 (" Popping high active reg: %ld\n", high_reg); \
 \
  low_reg = (active_reg_t) POP_FAILURE_INT (); \
  DEBUG_PRINT2 (" Popping low active reg: %ld\n", low_reg); \
 \
  if (1) \
    for (this_reg = high_reg; this_reg >= low_reg; this_reg--) \
      { \
        DEBUG_PRINT2 (" Popping reg: %ld\n", this_reg); \
 \
        reg_info[this_reg].word = POP_FAILURE_ELT (); \
        DEBUG_PRINT2 (" info: %p\n", \
                      reg_info[this_reg].word.pointer); \
 \
        regend[this_reg] = (const CHAR_TYPE *) POP_FAILURE_POINTER (); \
        DEBUG_PRINT2 (" end: %p\n", regend[this_reg]); \
 \
        regstart[this_reg] = (const CHAR_TYPE *) POP_FAILURE_POINTER ();\
        DEBUG_PRINT2 (" start: %p\n", regstart[this_reg]); \
      } \
  else \
    { \
      for (this_reg = highest_active_reg; this_reg > high_reg; this_reg--) \
        { \
          reg_info[this_reg].word.integer = 0; \
          regend[this_reg] = 0; \
          regstart[this_reg] = 0; \
        } \
      highest_active_reg = high_reg; \
    } \
 \
  set_regs_matched_done = 0; \
  DEBUG_STATEMENT (nfailure_points_popped++); \
} /* POP_FAILURE_POINT */



/* Structure for per-register (a.k.a. per-group) information.
   Other register information, such as the
   starting and ending positions (which are addresses), and the list of
   inner groups (which is a bits list) are maintained in separate
   variables.

   We are making a (strictly speaking) nonportable assumption here: that
   the compiler will pack our bit fields into something that fits into
   the type of `word', i.e., is something that fits into one item on the
   failure stack. */


/* Declarations and macros for re_match_2. */

typedef union
{
  fail_stack_elt_t word;
  struct
  {
      /* This field is one if this group can match the empty string,
         zero if not. If not yet determined, `MATCH_NULL_UNSET_VALUE'.
*/
#define MATCH_NULL_UNSET_VALUE 3
    unsigned match_null_string_p : 2;
    unsigned is_active : 1;
    unsigned matched_something : 1;
    unsigned ever_matched_something : 1;
  } bits;
} register_info_type;

#define REG_MATCH_NULL_STRING_P(R) ((R).bits.match_null_string_p)
#define IS_ACTIVE(R) ((R).bits.is_active)
#define MATCHED_SOMETHING(R) ((R).bits.matched_something)
#define EVER_MATCHED_SOMETHING(R) ((R).bits.ever_matched_something)


/* Call this when we have matched a real character; it sets `matched'
   flags for the subexpressions which we are currently inside. Also
   records that those subexprs have matched. */
#define SET_REGS_MATCHED() \
  do \
    { \
      if (!set_regs_matched_done) \
        { \
          active_reg_t r; \
          set_regs_matched_done = 1; \
          for (r = lowest_active_reg; r <= highest_active_reg; r++) \
            { \
              MATCHED_SOMETHING (reg_info[r]) \
                = EVER_MATCHED_SOMETHING (reg_info[r]) \
                = 1; \
            } \
        } \
    } \
  while (0)

/* Registers are set to a sentinel when they haven't yet matched. */
static CHAR_TYPE reg_unset_dummy;
#define REG_UNSET_VALUE (&reg_unset_dummy)
#define REG_UNSET(e) ((e) == REG_UNSET_VALUE)

/* Subroutine declarations and macros for regex_compile.
*/

static reg_errcode_t regex_compile _RE_ARGS ((const char *pattern, size_t size,
                                              reg_syntax_t syntax,
                                              struct re_pattern_buffer *bufp));
static void store_op1 _RE_ARGS ((re_opcode_t op, US_CHAR_TYPE *loc, int arg));
static void store_op2 _RE_ARGS ((re_opcode_t op, US_CHAR_TYPE *loc,
                                 int arg1, int arg2));
static void insert_op1 _RE_ARGS ((re_opcode_t op, US_CHAR_TYPE *loc,
                                  int arg, US_CHAR_TYPE *end));
static void insert_op2 _RE_ARGS ((re_opcode_t op, US_CHAR_TYPE *loc,
                                  int arg1, int arg2, US_CHAR_TYPE *end));
static boolean at_begline_loc_p _RE_ARGS ((const CHAR_TYPE *pattern,
                                           const CHAR_TYPE *p,
                                           reg_syntax_t syntax));
static boolean at_endline_loc_p _RE_ARGS ((const CHAR_TYPE *p,
                                           const CHAR_TYPE *pend,
                                           reg_syntax_t syntax));
#ifdef MBS_SUPPORT
static reg_errcode_t compile_range _RE_ARGS ((CHAR_TYPE range_start,
                                              const CHAR_TYPE **p_ptr,
                                              const CHAR_TYPE *pend,
                                              char *translate,
                                              reg_syntax_t syntax,
                                              US_CHAR_TYPE *b,
                                              CHAR_TYPE *char_set));
static void insert_space _RE_ARGS ((int num, CHAR_TYPE *loc, CHAR_TYPE *end));
#else
static reg_errcode_t compile_range _RE_ARGS ((unsigned int range_start,
                                              const CHAR_TYPE **p_ptr,
                                              const CHAR_TYPE *pend,
                                              char *translate,
                                              reg_syntax_t syntax,
                                              US_CHAR_TYPE *b));
#endif /* MBS_SUPPORT */

/* Fetch the next character in the uncompiled pattern---translating it
   if necessary. Also cast from a signed character in the constant
   string passed to us by the user to an unsigned char that we can use
   as an array index (in, e.g., `translate'). */
/* ifdef MBS_SUPPORT, we translate only if the character is <= 0xff,
   because it is impossible to allocate a 4GB array for encodings with
   a 4-byte character set, such as UCS4.
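
   Example (a standalone sketch): under MBS_SUPPORT, PATFETCH and
   TRANSLATE consult the 256-entry translate table only for characters
   <= 0xff and pass wider characters through unchanged. The names
   demo_translate and demo_build_fold_table are hypothetical, and the
   ASCII case-folding table is only an illustrative choice of table.

```c
#include <stddef.h>
#include <wchar.h>

// Consults a 256-entry table only for characters <= 0xff; anything
// wider (or any character when there is no table) passes through.
static wchar_t demo_translate (const unsigned char *table, wchar_t c)
{
  if (table != NULL && (unsigned long) c <= 0xff)
    return (wchar_t) table[c];
  return c;
}

// Hypothetical table: folds ASCII upper case to lower case and maps
// every other byte to itself.
static void demo_build_fold_table (unsigned char table[256])
{
  int i;
  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) ((i >= 'A' && i <= 'Z') ? i - 'A' + 'a' : i);
}
```

   A Greek capital omega (U+03A9), for example, is above 0xff and so is
   returned unchanged even though the table would fold ASCII letters.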
*/
#ifndef PATFETCH
# ifdef MBS_SUPPORT
# define PATFETCH(c) \
  do {if (p == pend) return REG_EEND; \
    c = (US_CHAR_TYPE) *p++; \
    if (translate && (c <= 0xff)) c = (US_CHAR_TYPE) translate[c]; \
  } while (0)
# else
# define PATFETCH(c) \
  do {if (p == pend) return REG_EEND; \
    c = (unsigned char) *p++; \
    if (translate) c = (unsigned char) translate[c]; \
  } while (0)
# endif /* MBS_SUPPORT */
#endif

/* Fetch the next character in the uncompiled pattern, with no
   translation. */
#define PATFETCH_RAW(c) \
  do {if (p == pend) return REG_EEND; \
    c = (US_CHAR_TYPE) *p++; \
  } while (0)

/* Go backwards one character in the pattern. */
#define PATUNFETCH p--


/* If `translate' is non-null, return translate[D], else just D. We
   cast the subscript to translate because some data is declared as
   `char *', to avoid warnings when a string constant is passed. But
   when we use a character as a subscript we must make it unsigned. */
/* ifdef MBS_SUPPORT, we translate only if the character is <= 0xff,
   because it is impossible to allocate a 4GB array for encodings with
   a 4-byte character set, such as UCS4. */

#ifndef TRANSLATE
# ifdef MBS_SUPPORT
# define TRANSLATE(d) \
  ((translate && ((US_CHAR_TYPE) (d)) <= 0xff) \
   ? (char) translate[(unsigned char) (d)] : (d))
# else
# define TRANSLATE(d) \
  (translate ? (char) translate[(unsigned char) (d)] : (d))
# endif /* MBS_SUPPORT */
#endif


/* Macros for outputting the compiled pattern into `buffer'. */

/* If the buffer isn't allocated when it comes in, use this. */
#define INIT_BUF_SIZE (32 * sizeof(US_CHAR_TYPE))

/* Make sure we have at least N more bytes of space in buffer.
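
   Example (a standalone sketch): GET_BUFFER_SPACE keeps calling
   EXTEND_BUFFER until N more units fit, and EXTEND_BUFFER doubles the
   allocation, clamps it to MAX_BUF_SIZE, and gives up when it cannot
   grow any further. The demo_ names and the tiny DEMO_MAX_BUF_SIZE are
   hypothetical; regex.c must also re-point b, begalt and the other
   pointers into the moved block, which this sketch avoids by storing
   an index instead of a pointer.

```c
#include <stdlib.h>

#define DEMO_MAX_BUF_SIZE 16   // stand-in for MAX_BUF_SIZE, kept tiny

struct demo_buf
{
  unsigned char *buffer;
  unsigned long allocated;     // bytes allocated
  unsigned long used;          // bytes written so far
};

// Doubles the allocation, clamped to DEMO_MAX_BUF_SIZE. Returns 0 when
// the buffer is already at the cap or realloc fails.
static int demo_extend (struct demo_buf *b)
{
  unsigned long new_size;
  unsigned char *p;
  if (b->allocated >= DEMO_MAX_BUF_SIZE)
    return 0;
  new_size = b->allocated << 1;
  if (new_size > DEMO_MAX_BUF_SIZE)
    new_size = DEMO_MAX_BUF_SIZE;
  p = realloc (b->buffer, new_size);
  if (p == NULL)
    return 0;
  b->buffer = p;
  b->allocated = new_size;
  return 1;
}

// Like GET_BUFFER_SPACE (n) followed by n BUF_PUSH-style stores.
static int demo_push_bytes (struct demo_buf *b, const unsigned char *src,
                            unsigned long n)
{
  unsigned long i;
  while (b->used + n > b->allocated)
    if (!demo_extend (b))
      return 0;
  for (i = 0; i < n; i++)
    b->buffer[b->used++] = src[i];
  return 1;
}
```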
*/
#ifdef MBS_SUPPORT
# define GET_BUFFER_SPACE(n) \
    while (((unsigned long)b - (unsigned long)COMPILED_BUFFER_VAR \
            + (n)*sizeof(CHAR_TYPE)) > bufp->allocated) \
      EXTEND_BUFFER ()
#else
# define GET_BUFFER_SPACE(n) \
    while ((unsigned long) (b - bufp->buffer + (n)) > bufp->allocated) \
      EXTEND_BUFFER ()
#endif /* MBS_SUPPORT */

/* Make sure we have one more byte of buffer space and then add C to it.  */
#define BUF_PUSH(c) \
  do { \
    GET_BUFFER_SPACE (1); \
    *b++ = (US_CHAR_TYPE) (c); \
  } while (0)


/* Ensure we have two more bytes of buffer space and then append C1 and C2.  */
#define BUF_PUSH_2(c1, c2) \
  do { \
    GET_BUFFER_SPACE (2); \
    *b++ = (US_CHAR_TYPE) (c1); \
    *b++ = (US_CHAR_TYPE) (c2); \
  } while (0)


/* As with BUF_PUSH_2, except for three bytes.  */
#define BUF_PUSH_3(c1, c2, c3) \
  do { \
    GET_BUFFER_SPACE (3); \
    *b++ = (US_CHAR_TYPE) (c1); \
    *b++ = (US_CHAR_TYPE) (c2); \
    *b++ = (US_CHAR_TYPE) (c3); \
  } while (0)


/* Store a jump with opcode OP at LOC to location TO.  We store a
   relative address, offset by the size of the jump instruction itself
   (1 + OFFSET_ADDRESS_SIZE).  */
#define STORE_JUMP(op, loc, to) \
  store_op1 (op, loc, (int) ((to) - (loc) - (1 + OFFSET_ADDRESS_SIZE)))

/* Likewise, for a two-argument jump.  */
#define STORE_JUMP2(op, loc, to, arg) \
  store_op2 (op, loc, (int) ((to) - (loc) - (1 + OFFSET_ADDRESS_SIZE)), arg)

/* Like `STORE_JUMP', but for inserting.  Assume `b' is the buffer end.  */
#define INSERT_JUMP(op, loc, to) \
  insert_op1 (op, loc, (int) ((to) - (loc) - (1 + OFFSET_ADDRESS_SIZE)), b)

/* Like `STORE_JUMP2', but for inserting.  Assume `b' is the buffer end.  */
#define INSERT_JUMP2(op, loc, to, arg) \
  insert_op2 (op, loc, (int) ((to) - (loc) - (1 + OFFSET_ADDRESS_SIZE)), \
              arg, b)


/* This is not an arbitrary limit: the arguments which represent offsets
   into the pattern are two bytes long.  So if 2^16 bytes turns out to
   be too small, many things would have to change.
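To make the two-byte limit concrete, here is a hedged sketch of a jump whose operand is a signed 16-bit offset stored low byte first and measured from the end of a 3-byte instruction, matching what STORE_JUMP computes when OFFSET_ADDRESS_SIZE is 2. The helpers `store_jump` and `jump_target`, the opcode value used in testing, and the byte order are assumptions for illustration, not the encoding routines of this file.

```c
#include <assert.h>

// Hypothetical layout: 1 opcode byte followed by a 2-byte offset.
enum { JUMP_INSN_SIZE = 3 };

// Store at LOC a jump to TO.  As in STORE_JUMP, the offset is relative
// to the end of the jump instruction, so a jump to the very next
// instruction has offset 0.
static void
store_jump (unsigned char *loc, unsigned char opcode, const unsigned char *to)
{
  long offset = (long) (to - loc) - JUMP_INSN_SIZE;
  unsigned long u;

  // The offset must fit in a signed 16-bit field.
  assert (offset >= -32768L && offset <= 32767L);
  u = (unsigned long) offset;                   // two's-complement bit pattern
  loc[0] = opcode;
  loc[1] = (unsigned char) (u & 0xff);          // low byte first
  loc[2] = (unsigned char) ((u >> 8) & 0xff);   // then high byte
}

// Decode the absolute target of the jump stored at LOC.
static const unsigned char *
jump_target (const unsigned char *loc)
{
  long offset = (long) loc[1] | ((long) loc[2] << 8);

  if (offset & 0x8000)
    offset -= 0x10000;                          // sign-extend the 16-bit value
  return loc + JUMP_INSN_SIZE + offset;
}
```

In this sketch a target more than 32767 bytes away cannot be encoded at all, so any larger buffer would need a wider operand field.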
*/
/* Any other compiler that, like MSC, has an allocation limit below 2^16
   bytes will have to use an approach similar to the one used below for
   MSC and drop MAX_BUF_SIZE a bit.  Otherwise you may end up
   reallocating to 0 bytes.  Such a thing is not going to work too well.
   You have been warned!!  */
#if defined _MSC_VER && !defined WIN32
/* Microsoft C 16-bit versions limit malloc to approx 65512 bytes.
   The REALLOC define eliminates a flurry of conversion warnings,
   but is not required.  */
# define MAX_BUF_SIZE 65500L
# define REALLOC(p,s) realloc ((p), (size_t) (s))
#else
# define MAX_BUF_SIZE (1L << 16)
# define REALLOC(p,s) realloc ((p), (s))
#endif

/* Extend the buffer to twice its current size via realloc and reset
   the pointers that pointed into the old block to point to the correct
   places in the new one.  If extending the buffer results in it being
   larger than MAX_BUF_SIZE, then flag memory exhausted.  */
#if __BOUNDED_POINTERS__
# define SET_HIGH_BOUND(P) (__ptrhigh (P) = __ptrlow (P) + bufp->allocated)
# define MOVE_BUFFER_POINTER(P) \
  (__ptrlow (P) += incr, SET_HIGH_BOUND (P), __ptrvalue (P) += incr)
# define ELSE_EXTEND_BUFFER_HIGH_BOUND \
  else \
    { \
      SET_HIGH_BOUND (b); \
      SET_HIGH_BOUND (begalt); \
      if (fixup_alt_jump) \
        SET_HIGH_BOUND (fixup_alt_jump); \
      if (laststart) \
        SET_HIGH_BOUND (laststart); \
      if (pending_exact) \
        SET_HIGH_BOUND (pending_exact); \
    }
#else
# define MOVE_BUFFER_POINTER(P) (P) += incr
# define ELSE_EXTEND_BUFFER_HIGH_BOUND
#endif

#ifdef MBS_SUPPORT
# define EXTEND_BUFFER() \
  do { \
    US_CHAR_TYPE *old_buffer = COMPILED_BUFFER_VAR; \
    int wchar_count; \
    if (bufp->allocated + sizeof(US_CHAR_TYPE) > MAX_BUF_SIZE) \
      return REG_ESIZE; \
    bufp->allocated <<= 1; \
    if (bufp->allocated > MAX_BUF_SIZE) \
      bufp->allocated = MAX_BUF_SIZE; \
    /* How many characters can the new buffer hold?  */ \
    wchar_count = bufp->allocated / sizeof(US_CHAR_TYPE); \
    if (wchar_count == 0) wchar_count = 1; \
    /* Truncate the buffer to CHAR_TYPE align.
*/ \
    bufp->allocated = wchar_count * sizeof(US_CHAR_TYPE); \
    RETALLOC (COMPILED_BUFFER_VAR, wchar_count, US_CHAR_TYPE); \
    bufp->buffer = (char*)COMPILED_BUFFER_VAR; \
    if (COMPILED_BUFFER_VAR == NULL) \
      return REG_ESPACE; \
    /* If the buffer moved, move all the pointers into it.  */ \
    if (old_buffer != COMPILED_BUFFER_VAR) \
      { \
        int incr = COMPILED_BUFFER_VAR - old_buffer; \
        MOVE_BUFFER_POINTER (b); \
        MOVE_BUFFER_POINTER (begalt); \
        if (fixup_alt_jump) \
          MOVE_BUFFER_POINTER (fixup_alt_jump); \
        if (laststart) \
          MOVE_BUFFER_POINTER (laststart); \
        if (pending_exact) \
          MOVE_BUFFER_POINTER (pending_exact); \
      } \
    ELSE_EXTEND_BUFFER_HIGH_BOUND \
  } while (0)
#else
# define EXTEND_BUFFER() \
  do { \
    US_CHAR_TYPE *old_buffer = COMPILED_BUFFER_VAR; \
    if (bufp->allocated == MAX_BUF_SIZE) \
      return REG_ESIZE; \
    bufp->allocated <<= 1; \
    if (bufp->allocated > MAX_BUF_SIZE) \
      bufp->allocated = MAX_BUF_SIZE; \
    bufp->buffer = (US_CHAR_TYPE *) REALLOC (COMPILED_BUFFER_VAR, \
                                             bufp->allocated); \
    if (COMPILED_BUFFER_VAR == NULL) \
      return REG_ESPACE; \
    /* If the buffer moved, move all the pointers into it.  */ \
    if (old_buffer != COMPILED_BUFFER_VAR) \
      { \
        int incr = COMPILED_BUFFER_VAR - old_buffer; \
        MOVE_BUFFER_POINTER (b); \
        MOVE_BUFFER_POINTER (begalt); \
        if (fixup_alt_jump) \
          MOVE_BUFFER_POINTER (fixup_alt_jump); \
        if (laststart) \
          MOVE_BUFFER_POINTER (laststart); \
        if (pending_exact) \
          MOVE_BUFFER_POINTER (pending_exact); \
      } \
    ELSE_EXTEND_BUFFER_HIGH_BOUND \
  } while (0)
#endif /* MBS_SUPPORT */

/* Since we have one byte reserved for the register number argument to
   {start,stop}_memory, the maximum number of groups we can report
   things about is what fits in that byte.  */
#define MAX_REGNUM 255

/* But patterns can have more than `MAX_REGNUM' registers.  We just
   ignore the excess.  */
typedef unsigned regnum_t;


/* Macros for the compile stack.  */

/* Since offsets can go either forwards or backwards, this type needs to
   be able to hold values from -(MAX_BUF_SIZE - 1) to MAX_BUF_SIZE - 1.
*/
/* int may not be enough when sizeof(int) == 2.  */
typedef long pattern_offset_t;

typedef struct
{
  pattern_offset_t begalt_offset;
  pattern_offset_t fixup_alt_jump;
  pattern_offset_t inner_group_offset;
  pattern_offset_t laststart_offset;
  regnum_t regnum;
} compile_stack_elt_t;


typedef struct
{
  compile_stack_elt_t *stack;
  unsigned size;
  unsigned avail;                 /* Offset of next open position.  */
} compile_stack_type;


#define INIT_COMPILE_STACK_SIZE 32

#define COMPILE_STACK_EMPTY (compile_stack.avail == 0)
#define COMPILE_STACK_FULL (compile_stack.avail == compile_stack.size)

/* The next available element.  */
#define COMPILE_STACK_TOP (compile_stack.stack[compile_stack.avail])


/* Set the bit for character C in a list.  */
#define SET_LIST_BIT(c) \
  (b[((unsigned char) (c)) / BYTEWIDTH] \
   |= 1 << (((unsigned char) c) % BYTEWIDTH))


/* Get the next unsigned number in the uncompiled pattern.  */
#define GET_UNSIGNED_NUMBER(num) \
  { \
    while (p != pend) \
      { \
        PATFETCH (c); \
        if (c < '0' || c > '9') \
          break; \
        if (num <= RE_DUP_MAX) \
          { \
            if (num < 0) \
              num = 0; \
            num = num * 10 + c - '0'; \
          } \
      } \
  }

#if defined _LIBC || WIDE_CHAR_SUPPORT
/* The GNU C library provides support for user-defined character classes
   and the functions from ISO C amendment 1.  */
# ifdef CHARCLASS_NAME_MAX
#  define CHAR_CLASS_MAX_LENGTH CHARCLASS_NAME_MAX
# else
/* This shouldn't happen but some implementations might still have this
   problem.  Use a reasonable default value.  */
#  define CHAR_CLASS_MAX_LENGTH 256
# endif

# ifdef _LIBC
#  define IS_CHAR_CLASS(string) __wctype (string)
# else
#  define IS_CHAR_CLASS(string) wctype (string)
# endif
#else
# define CHAR_CLASS_MAX_LENGTH 6 /* Namely, `xdigit'.
*/

# define IS_CHAR_CLASS(string) \
   (STREQ (string, "alpha") || STREQ (string, "upper") \
    || STREQ (string, "lower") || STREQ (string, "digit") \
    || STREQ (string, "alnum") || STREQ (string, "xdigit") \
    || STREQ (string, "space") || STREQ (string, "print") \
    || STREQ (string, "punct") || STREQ (string, "graph") \
    || STREQ (string, "cntrl") || STREQ (string, "blank"))
#endif

#ifndef MATCH_MAY_ALLOCATE

/* If we cannot allocate large objects within re_match_2_internal,
   we make the fail stack and register vectors global.
   The fail stack, we grow to the maximum size when a regexp
   is compiled.
   The register vectors, we adjust in size each time we
   compile a regexp, according to the number of registers it needs.  */

static fail_stack_type fail_stack;

/* Size with which the following vectors are currently allocated.
   That is so we can make them bigger as needed,
   but never make them smaller.  */
static int regs_allocated_size;

static const char ** regstart, ** regend;
static const char ** old_regstart, ** old_regend;
static const char **best_regstart, **best_regend;
static register_info_type *reg_info;
static const char **reg_dummy;
static register_info_type *reg_info_dummy;

/* Make the register vectors big enough for NUM_REGS registers,
   but don't make them smaller.
*/

static
regex_grow_registers (num_regs)
     int num_regs;
{
  if (num_regs > regs_allocated_size)
    {
      RETALLOC_IF (regstart, num_regs, const char *);
      RETALLOC_IF (regend, num_regs, const char *);
      RETALLOC_IF (old_regstart, num_regs, const char *);
      RETALLOC_IF (old_regend, num_regs, const char *);
      RETALLOC_IF (best_regstart, num_regs, const char *);
      RETALLOC_IF (best_regend, num_regs, const char *);
      RETALLOC_IF (reg_info, num_regs, register_info_type);
      RETALLOC_IF (reg_dummy, num_regs, const char *);
      RETALLOC_IF (reg_info_dummy, num_regs, register_info_type);

      regs_allocated_size = num_regs;
    }
}

#endif /* not MATCH_MAY_ALLOCATE */

static boolean group_in_compile_stack _RE_ARGS ((compile_stack_type
                                                 compile_stack,
                                                 regnum_t regnum));

/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
   Returns one of the error codes defined in `regex.h', or zero for
   success.

   Assumes the `allocated' (and perhaps `buffer') and `translate'
   fields are set in BUFP on entry.

   If it succeeds, results are put in BUFP (if it returns an error, the
   contents of BUFP are undefined):
     `buffer' is the compiled pattern;
     `syntax' is set to SYNTAX;
     `used' is set to the length of the compiled pattern;
     `fastmap_accurate' is zero;
     `re_nsub' is the number of subexpressions in PATTERN;
     `not_bol' and `not_eol' are zero;

   The `fastmap' and `newline_anchor' fields are neither
   examined nor set.  */

/* Return, freeing storage we allocated.
*/
#ifdef MBS_SUPPORT
# define FREE_STACK_RETURN(value) \
  return (free(pattern), free(mbs_offset), free(is_binary), free (compile_stack.stack), value)
#else
# define FREE_STACK_RETURN(value) \
  return (free (compile_stack.stack), value)
#endif /* MBS_SUPPORT */

static reg_errcode_t
#ifdef MBS_SUPPORT
regex_compile (cpattern, csize, syntax, bufp)
     const char *cpattern;
     size_t csize;
#else
regex_compile (pattern, size, syntax, bufp)
     const char *pattern;
     size_t size;
#endif /* MBS_SUPPORT */
     reg_syntax_t syntax;
     struct re_pattern_buffer *bufp;
{
  /* We fetch characters from PATTERN here.  Even though PATTERN is
     `char *' (i.e., signed), we declare these variables as unsigned, so
     they can be reliably used as array indices.  */
  register US_CHAR_TYPE c, c1;

#ifdef MBS_SUPPORT
  /* A temporary space to keep wchar_t pattern and compiled pattern.  */
  CHAR_TYPE *pattern, *COMPILED_BUFFER_VAR;
  size_t size;
  /* Offset buffer for optimization.  See convert_mbs_to_wcs.  */
  int *mbs_offset = NULL;
  /* Records whether each wchar_t is binary data or not.  */
  char *is_binary = NULL;
  /* A flag whether exactn is handling binary data or not.  */
  char is_exactn_bin = FALSE;
#endif /* MBS_SUPPORT */

  /* A random temporary spot in PATTERN.  */
  const CHAR_TYPE *p1;

  /* Points to the end of the buffer, where we should append.  */
  register US_CHAR_TYPE *b;

  /* Keeps track of unclosed groups.  */
  compile_stack_type compile_stack;

  /* Points to the current (ending) position in the pattern.  */
#ifdef MBS_SUPPORT
  const CHAR_TYPE *p;
  const CHAR_TYPE *pend;
#else
  const CHAR_TYPE *p = pattern;
  const CHAR_TYPE *pend = pattern + size;
#endif /* MBS_SUPPORT */

  /* How to translate the characters in the pattern.  */
  RE_TRANSLATE_TYPE translate = bufp->translate;

  /* Address of the count-byte of the most recently inserted `exactn'
     command.  This makes it possible to tell if a new exact-match
     character can be added to that command or if the character requires
     a new `exactn' command.
*/
  US_CHAR_TYPE *pending_exact = 0;

  /* Address of start of the most recently finished expression.
     This tells, e.g., postfix * where to find the start of its
     operand.  Reset at the beginning of groups and alternatives.  */
  US_CHAR_TYPE *laststart = 0;

  /* Address of beginning of regexp, or inside of last group.  */
  US_CHAR_TYPE *begalt;

  /* Address of the place where a forward jump should go to the end of
     the containing expression.  Each alternative of an `or' -- except the
     last -- ends with a forward jump of this sort.  */
  US_CHAR_TYPE *fixup_alt_jump = 0;

  /* Counts open-groups as they are encountered.  Remembered for the
     matching close-group on the compile stack, so the same register
     number is put in the stop_memory as the start_memory.  */
  regnum_t regnum = 0;

#ifdef MBS_SUPPORT
  /* Initialize the wchar_t PATTERN and offset_buffer.  */
  p = pend = pattern = TALLOC(csize + 1, CHAR_TYPE);
  mbs_offset = TALLOC(csize + 1, int);
  is_binary = TALLOC(csize + 1, char);
  if (pattern == NULL || mbs_offset == NULL || is_binary == NULL)
    {
      free(pattern);
      free(mbs_offset);
      free(is_binary);
      return REG_ESPACE;
    }
  pattern[csize] = L'\0';       /* sentinel */
  size = convert_mbs_to_wcs(pattern, cpattern, csize, mbs_offset, is_binary);
  pend = p + size;
  if (size < 0)
    {
      free(pattern);
      free(mbs_offset);
      free(is_binary);
      return REG_BADPAT;
    }
#endif

#ifdef DEBUG
  DEBUG_PRINT1 ("\nCompiling pattern: ");
  if (debug)
    {
      unsigned debug_count;

      for (debug_count = 0; debug_count < size; debug_count++)
        PUT_CHAR (pattern[debug_count]);
      putchar ('\n');
    }
#endif /* DEBUG */

  /* Initialize the compile stack.  */
  compile_stack.stack = TALLOC (INIT_COMPILE_STACK_SIZE, compile_stack_elt_t);
  if (compile_stack.stack == NULL)
    {
#ifdef MBS_SUPPORT
      free(pattern);
      free(mbs_offset);
      free(is_binary);
#endif
      return REG_ESPACE;
    }

  compile_stack.size = INIT_COMPILE_STACK_SIZE;
  compile_stack.avail = 0;

  /* Initialize the pattern buffer.
*/
  bufp->syntax = syntax;
  bufp->fastmap_accurate = 0;
  bufp->not_bol = bufp->not_eol = 0;

  /* Set `used' to zero, so that if we return an error, the pattern
     printer (for debugging) will think there's no pattern.  We reset it
     at the end.  */
  bufp->used = 0;

  /* Always count groups, whether or not bufp->no_sub is set.  */
  bufp->re_nsub = 0;

#if !defined emacs && !defined SYNTAX_TABLE
  /* Initialize the syntax table.  */
  init_syntax_once ();
#endif

  if (bufp->allocated == 0)
    {
      if (bufp->buffer)
        { /* If zero allocated, but buffer is non-null, try to realloc
             enough space.  This loses if buffer's address is bogus, but
             that is the user's responsibility.  */
#ifdef MBS_SUPPORT
          /* Free bufp->buffer and allocate an array for wchar_t pattern
             buffer.  */
          free(bufp->buffer);
          COMPILED_BUFFER_VAR = TALLOC (INIT_BUF_SIZE/sizeof(US_CHAR_TYPE),
                                        US_CHAR_TYPE);
#else
          RETALLOC (COMPILED_BUFFER_VAR, INIT_BUF_SIZE, US_CHAR_TYPE);
#endif /* MBS_SUPPORT */
        }
      else
        { /* Caller did not allocate a buffer.  Do it for them.  */
          COMPILED_BUFFER_VAR = TALLOC (INIT_BUF_SIZE / sizeof(US_CHAR_TYPE),
                                        US_CHAR_TYPE);
        }

      if (!COMPILED_BUFFER_VAR) FREE_STACK_RETURN (REG_ESPACE);
#ifdef MBS_SUPPORT
      bufp->buffer = (char*)COMPILED_BUFFER_VAR;
#endif /* MBS_SUPPORT */
      bufp->allocated = INIT_BUF_SIZE;
    }
#ifdef MBS_SUPPORT
  else
    COMPILED_BUFFER_VAR = (US_CHAR_TYPE*) bufp->buffer;
#endif

  begalt = b = COMPILED_BUFFER_VAR;

  /* Loop through the uncompiled pattern until we're at the end.  */
  while (p != pend)
    {
      PATFETCH (c);

      switch (c)
        {
        case '^':
          {
            if (   /* If at start of pattern, it's an operator.  */
                   p == pattern + 1
                   /* If context independent, it's an operator.  */
                || syntax & RE_CONTEXT_INDEP_ANCHORS
                   /* Otherwise, depends on what's come before.  */
                || at_begline_loc_p (pattern, p, syntax))
              BUF_PUSH (begline);
            else
              goto normal_char;
          }
          break;


        case '$':
          {
            if (   /* If at end of pattern, it's an operator.  */
                   p == pend
                   /* If context independent, it's an operator.  */
                || syntax & RE_CONTEXT_INDEP_ANCHORS
                   /* Otherwise, depends on what's next.
*/
                || at_endline_loc_p (p, pend, syntax))
              BUF_PUSH (endline);
            else
              goto normal_char;
          }
          break;


        case '+':
        case '?':
          if ((syntax & RE_BK_PLUS_QM)
              || (syntax & RE_LIMITED_OPS))
            goto normal_char;
        handle_plus:
        case '*':
          /* If there is no previous pattern...  */
          if (!laststart)
            {
              if (syntax & RE_CONTEXT_INVALID_OPS)
                FREE_STACK_RETURN (REG_BADRPT);
              else if (!(syntax & RE_CONTEXT_INDEP_OPS))
                goto normal_char;
            }

          {
            /* Are we optimizing this jump?  */
            boolean keep_string_p = false;

            /* 1 means zero (many) matches is allowed.  */
            char zero_times_ok = 0, many_times_ok = 0;

            /* If there is a sequence of repetition chars, collapse it
               down to just one (the right one).  We can't combine
               interval operators with these because of, e.g., `a{2}*',
               which should only match an even number of `a's.  */

            for (;;)
              {
                zero_times_ok |= c != '+';
                many_times_ok |= c != '?';

                if (p == pend)
                  break;

                PATFETCH (c);

                if (c == '*'
                    || (!(syntax & RE_BK_PLUS_QM) && (c == '+' || c == '?')))
                  ;

                else if (syntax & RE_BK_PLUS_QM && c == '\\')
                  {
                    if (p == pend) FREE_STACK_RETURN (REG_EESCAPE);

                    PATFETCH (c1);
                    if (!(c1 == '+' || c1 == '?'))
                      {
                        PATUNFETCH;
                        PATUNFETCH;
                        break;
                      }

                    c = c1;
                  }
                else
                  {
                    PATUNFETCH;
                    break;
                  }

                /* If we get here, we found another repeat character.  */
              }

            /* Star, etc. applied to an empty pattern is equivalent
               to an empty pattern.  */
            if (!laststart)
              break;

            /* Now we know whether or not zero matches is allowed
               and also whether or not two or more matches is allowed.  */
            if (many_times_ok)
              { /* More than one repetition is allowed, so put in at the
                   end a backward relative jump from `b' to before the next
                   jump we're going to put in below (which jumps from
                   laststart to after this jump).

                   But if we are at the `*' in the exact sequence `.*\n',
                   insert an unconditional jump backwards to the .,
                   instead of the beginning of the loop.  This way we only
                   push a failure point once, instead of every time
                   through the loop.  */
                assert (p - 1 > pattern);

                /* Allocate the space for the jump.
*/
                GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);

                /* We know we are not at the first character of the pattern,
                   because laststart was nonzero.  And we've already
                   incremented `p', by the way, to be the character after
                   the `*'.  Do we have to do something analogous here
                   for null bytes, because of RE_DOT_NOT_NULL?  */
                if (TRANSLATE (*(p - 2)) == TRANSLATE ('.')
                    && zero_times_ok
                    && p < pend && TRANSLATE (*p) == TRANSLATE ('\n')
                    && !(syntax & RE_DOT_NEWLINE))
                  { /* We have .*\n.  */
                    STORE_JUMP (jump, b, laststart);
                    keep_string_p = true;
                  }
                else
                  /* Anything else.  */
                  STORE_JUMP (maybe_pop_jump, b, laststart -
                              (1 + OFFSET_ADDRESS_SIZE));

                /* We've added more stuff to the buffer.  */
                b += 1 + OFFSET_ADDRESS_SIZE;
              }

            /* On failure, jump from laststart to b + 3, which will be the
               end of the buffer after this jump is inserted.  */
            /* ifdef MBS_SUPPORT, 'b + 1 + OFFSET_ADDRESS_SIZE' instead of
               'b + 3'.  */
            GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);
            INSERT_JUMP (keep_string_p ? on_failure_keep_string_jump
                                       : on_failure_jump,
                         laststart, b + 1 + OFFSET_ADDRESS_SIZE);
            pending_exact = 0;
            b += 1 + OFFSET_ADDRESS_SIZE;

            if (!zero_times_ok)
              {
                /* At least one repetition is required, so insert a
                   `dummy_failure_jump' before the initial
                   `on_failure_jump' instruction of the loop.  This
                   effects a skip over that instruction the first time
                   we hit that loop.  */
                GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);
                INSERT_JUMP (dummy_failure_jump, laststart, laststart +
                             2 + 2 * OFFSET_ADDRESS_SIZE);
                b += 1 + OFFSET_ADDRESS_SIZE;
              }
          }
          break;


        case '.':
          laststart = b;
          BUF_PUSH (anychar);
          break;


        case '[':
          {
            boolean had_char_class = false;
#ifdef MBS_SUPPORT
            CHAR_TYPE range_start = 0xffffffff;
#else
            unsigned int range_start = 0xffffffff;
#endif
            if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

#ifdef MBS_SUPPORT
            /* We assume a charset(_not) structure as a wchar_t array.
               charset[0] = (re_opcode_t) charset(_not)
               charset[1] = l (= length of char_classes)
               charset[2] = m (= length of collating_symbols)
               charset[3] = n (= length of equivalence_classes)
               charset[4] = o (= length of char_ranges)
               charset[5] = p (= length of chars)

               charset[6] = char_class (wctype_t)
               charset[6+CHAR_CLASS_SIZE] = char_class (wctype_t)
                         ...
               charset[l+5]  = char_class (wctype_t)

               charset[l+6]  = collating_symbol (wchar_t)
                         ...
               charset[l+m+5]  = collating_symbol (wchar_t)
                                 ifdef _LIBC we use the index in
                                 _NL_COLLATE_SYMB_EXTRAMB instead of
                                 wchar_t string.

               charset[l+m+6]  = equivalence_classes (wchar_t)
                         ...
               charset[l+m+n+5]  = equivalence_classes (wchar_t)
                                 ifdef _LIBC we use the index in
                                 _NL_COLLATE_WEIGHT instead of
                                 wchar_t string.

               charset[l+m+n+6] = range_start
               charset[l+m+n+7] = range_end
                         ...
               charset[l+m+n+2o+4] = range_start
               charset[l+m+n+2o+5] = range_end
                                 ifdef _LIBC we use the value looked up
                                 in _NL_COLLATE_COLLSEQ instead of
                                 wchar_t character.

               charset[l+m+n+2o+6] = char
                         ...
               charset[l+m+n+2o+p+5] = char

             */

            /* We need at least 6 spaces: the opcode, the length of
               char_classes, the length of collating_symbols, the length of
               equivalence_classes, the length of char_ranges, the length
               of chars.  */
            GET_BUFFER_SPACE (6);

            /* Save b as laststart.  We also use laststart as a pointer
               to the first element of the charset here.
               In other words, laststart[i] indicates charset[i].  */
            laststart = b;

            /* We test `*p == '^' twice, instead of using an if
               statement, so we only need one BUF_PUSH.  */
            BUF_PUSH (*p == '^' ? charset_not : charset);
            if (*p == '^')
              p++;

            /* Push the length of char_classes, the length of
               collating_symbols, the length of equivalence_classes, the
               length of char_ranges and the length of chars.  */
            BUF_PUSH_3 (0, 0, 0);
            BUF_PUSH_2 (0, 0);

            /* Remember the first position in the bracket expression.  */
            p1 = p;

            /* charset_not matches newline according to a syntax bit.
*/
            if ((re_opcode_t) b[-6] == charset_not
                && (syntax & RE_HAT_LISTS_NOT_NEWLINE))
              {
                BUF_PUSH('\n');
                laststart[5]++; /* Update the length of characters  */
              }

            /* Read in characters and ranges, setting map bits.  */
            for (;;)
              {
                if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                PATFETCH (c);

                /* \ might escape characters inside [...] and [^...].  */
                if ((syntax & RE_BACKSLASH_ESCAPE_IN_LISTS) && c == '\\')
                  {
                    if (p == pend) FREE_STACK_RETURN (REG_EESCAPE);

                    PATFETCH (c1);
                    BUF_PUSH(c1);
                    laststart[5]++; /* Update the length of chars  */
                    range_start = c1;
                    continue;
                  }

                /* Could be the end of the bracket expression.  If it's
                   not (i.e., when the bracket expression is `[]' so
                   far), the ']' character bit gets set way below.  */
                if (c == ']' && p != p1 + 1)
                  break;

                /* Look ahead to see if it's a range when the last thing
                   was a character class.  */
                if (had_char_class && c == '-' && *p != ']')
                  FREE_STACK_RETURN (REG_ERANGE);

                /* Look ahead to see if it's a range when the last thing
                   was a character: if this is a hyphen not at the
                   beginning or the end of a list, then it's the range
                   operator.  */
                if (c == '-'
                    && !(p - 2 >= pattern && p[-2] == '[')
                    && !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
                    && *p != ']')
                  {
                    reg_errcode_t ret;
                    /* Allocate the space for range_start and range_end.  */
                    GET_BUFFER_SPACE (2);
                    /* Update the pointer to indicate end of buffer.  */
                    b += 2;
                    ret = compile_range (range_start, &p, pend, translate,
                                         syntax, b, laststart);
                    if (ret != REG_NOERROR) FREE_STACK_RETURN (ret);
                    range_start = 0xffffffff;
                  }
                else if (p[0] == '-' && p[1] != ']')
                  { /* This handles ranges made up of characters only.  */
                    reg_errcode_t ret;

                    /* Move past the `-'.  */
                    PATFETCH (c1);
                    /* Allocate the space for range_start and range_end.  */
                    GET_BUFFER_SPACE (2);
                    /* Update the pointer to indicate end of buffer.
*/
                    b += 2;
                    ret = compile_range (c, &p, pend, translate, syntax, b,
                                         laststart);
                    if (ret != REG_NOERROR) FREE_STACK_RETURN (ret);
                    range_start = 0xffffffff;
                  }

                /* See if we're at the beginning of a possible character
                   class.  */
                else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == ':')
                  { /* Leave room for the null.  */
                    char str[CHAR_CLASS_MAX_LENGTH + 1];

                    PATFETCH (c);
                    c1 = 0;

                    /* If pattern is `[[:'.  */
                    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                    for (;;)
                      {
                        PATFETCH (c);
                        if ((c == ':' && *p == ']') || p == pend)
                          break;
                        if (c1 < CHAR_CLASS_MAX_LENGTH)
                          str[c1++] = c;
                        else
                          /* This is in any case an invalid class name.  */
                          str[0] = '\0';
                      }
                    str[c1] = '\0';

                    /* If it isn't a word bracketed by `[:' and `:]':
                       undo the ending character, the letters, and leave
                       the leading `:' and `[' (but store them as
                       characters).  */
                    if (c == ':' && *p == ']')
                      {
                        wctype_t wt;
                        uintptr_t alignedp;

                        /* Query the character class as wctype_t.  */
                        wt = IS_CHAR_CLASS (str);
                        if (wt == 0)
                          FREE_STACK_RETURN (REG_ECTYPE);

                        /* Throw away the ] at the end of the character
                           class.  */
                        PATFETCH (c);

                        if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                        /* Allocate the space for character class.  */
                        GET_BUFFER_SPACE(CHAR_CLASS_SIZE);
                        /* Update the pointer to indicate end of buffer.  */
                        b += CHAR_CLASS_SIZE;
                        /* Move the data that follows the character classes
                           so that it is not overwritten.  */
                        insert_space(CHAR_CLASS_SIZE,
                                     laststart + 6 + laststart[1],
                                     b - 1);
                        alignedp = ((uintptr_t)(laststart + 6 + laststart[1])
                                    + __alignof__(wctype_t) - 1)
                                   & ~(uintptr_t)(__alignof__(wctype_t) - 1);
                        /* Store the character class.  */
                        *((wctype_t*)alignedp) = wt;
                        /* Update length of char_classes.  */
                        laststart[1] += CHAR_CLASS_SIZE;

                        had_char_class = true;
                      }
                    else
                      {
                        c1++;
                        while (c1--)
                          PATUNFETCH;
                        BUF_PUSH ('[');
                        BUF_PUSH (':');
                        laststart[5] += 2; /* Update the length of characters  */
                        range_start = ':';
                        had_char_class = false;
                      }
                  }
                else if (syntax & RE_CHAR_CLASSES && c == '['
                         && (*p == '=' || *p == '.'))
                  {
                    CHAR_TYPE str[128];   /* Should be large enough.
*/
                    CHAR_TYPE delim = *p; /* '=' or '.'  */
# ifdef _LIBC
                    uint32_t nrules =
                      _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
# endif
                    PATFETCH (c);
                    c1 = 0;

                    /* If pattern is `[[=' or '[[.'.  */
                    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                    for (;;)
                      {
                        PATFETCH (c);
                        if ((c == delim && *p == ']') || p == pend)
                          break;
                        if (c1 < sizeof (str) - 1)
                          str[c1++] = c;
                        else
                          /* This is in any case an invalid class name.  */
                          str[0] = '\0';
                      }
                    str[c1] = '\0';

                    if (c == delim && *p == ']' && str[0] != '\0')
                      {
                        unsigned int i, offset;
                        /* If we have no collation data we use the default
                           collation in which each character is in a class
                           by itself.  It also means that ASCII is the
                           character set and therefore we cannot have
                           characters with more than one byte in the
                           multibyte representation.  */

                        /* If not defined _LIBC, we push the name and
                           `\0' for the sake of matching performance.  */
                        int datasize = c1 + 1;

# ifdef _LIBC
                        int32_t idx = 0;
                        if (nrules == 0)
# endif
                          {
                            if (c1 != 1)
                              FREE_STACK_RETURN (REG_ECOLLATE);
                          }
# ifdef _LIBC
                        else
                          {
                            const int32_t *table;
                            const int32_t *weights;
                            const int32_t *extra;
                            const int32_t *indirect;
                            wint_t *cp;

                            /* This #include defines a local function!  */
#  include <locale/weightwc.h>

                            if(delim == '=')
                              {
                                /* We push the index for equivalence class.  */
                                cp = (wint_t*)str;

                                table = (const int32_t *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_TABLEWC);
                                weights = (const int32_t *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_WEIGHTWC);
                                extra = (const int32_t *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_EXTRAWC);
                                indirect = (const int32_t *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_INDIRECTWC);

                                idx = findidx ((const wint_t**)&cp);
                                if (idx == 0 || cp < (wint_t*) str + c1)
                                  /* This is not a valid character.  */
                                  FREE_STACK_RETURN (REG_ECOLLATE);

                                str[0] = (wchar_t)idx;
                              }
                            else /* delim == '.'  */
                              {
                                /* We push the collation sequence value
                                   for the collating symbol.
*/
                                int32_t table_size;
                                const int32_t *symb_table;
                                const unsigned char *extra;
                                int32_t idx;
                                int32_t elem;
                                int32_t second;
                                int32_t hash;
                                char char_str[c1];

                                /* We have to convert the name to a
                                   single-byte string.  This is possible
                                   since the names consist of ASCII
                                   characters and the internal
                                   representation is UCS4.  */
                                for (i = 0; i < c1; ++i)
                                  char_str[i] = str[i];

                                table_size =
                                  _NL_CURRENT_WORD (LC_COLLATE,
                                                    _NL_COLLATE_SYMB_HASH_SIZEMB);
                                symb_table = (const int32_t *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_SYMB_TABLEMB);
                                extra = (const unsigned char *)
                                  _NL_CURRENT (LC_COLLATE,
                                               _NL_COLLATE_SYMB_EXTRAMB);

                                /* Locate the character in the hashing
                                   table.  */
                                hash = elem_hash (char_str, c1);

                                idx = 0;
                                elem = hash % table_size;
                                second = hash % (table_size - 2);
                                while (symb_table[2 * elem] != 0)
                                  {
                                    /* First compare the hashing value.
                                       The name is compared against the
                                       single-byte copy in char_str.  */
                                    if (symb_table[2 * elem] == hash
                                        && c1 == extra[symb_table[2 * elem + 1]]
                                        && memcmp (char_str,
                                                   &extra[symb_table[2 * elem + 1]
                                                          + 1], c1) == 0)
                                      {
                                        /* Yep, this is the entry.  */
                                        idx = symb_table[2 * elem + 1];
                                        idx += 1 + extra[idx];
                                        break;
                                      }

                                    /* Next entry.  */
                                    elem += second;
                                  }

                                if (symb_table[2 * elem] != 0)
                                  {
                                    /* Compute the index of the byte sequence
                                       in the table.  */
                                    idx += 1 + extra[idx];
                                    /* Round up to a multiple of 4 for
                                       alignment.  */
                                    idx = (idx + 3) & ~3;

                                    str[0] = (wchar_t) idx + 4;
                                  }
                                else if (symb_table[2 * elem] == 0 && c1 == 1)
                                  {
                                    /* No valid character.  Match it as a
                                       single byte character.  */
                                    had_char_class = false;
                                    BUF_PUSH(str[0]);
                                    /* Update the length of characters  */
                                    laststart[5]++;
                                    range_start = str[0];

                                    /* Throw away the ] at the end of the
                                       collating symbol.  */
                                    PATFETCH (c);
                                    /* Go on with the next character of
                                       the bracket expression.  */
                                    continue;
                                  }
                                else
                                  FREE_STACK_RETURN (REG_ECOLLATE);
                              }
                            datasize = 1;
                          }
# endif
                        /* Throw away the ] at the end of the equivalence
                           class (or collating symbol).  */
                        PATFETCH (c);

                        /* Allocate the space for the equivalence class
                           (or collating symbol) (and '\0' if needed).  */
                        GET_BUFFER_SPACE(datasize);
                        /* Update the pointer to indicate end of buffer.
*/
                        b += datasize;

                        if (delim == '=')
                          {
                            /* equivalence class  */
                            /* Calculate the offset of char_ranges,
                               which is next to equivalence_classes.  */
                            offset = laststart[1] + laststart[2]
                              + laststart[3] + 6;
                            /* Insert space.  */
                            insert_space(datasize, laststart + offset, b - 1);

                            /* Write the equivalence_class and \0.  */
                            for (i = 0 ; i < datasize ; i++)
                              laststart[offset + i] = str[i];

                            /* Update the length of equivalence_classes.  */
                            laststart[3] += datasize;
                            had_char_class = true;
                          }
                        else /* delim == '.'  */
                          {
                            /* collating symbol  */
                            /* Calculate the offset of the equivalence_classes,
                               which is next to collating_symbols.  */
                            offset = laststart[1] + laststart[2] + 6;
                            /* Insert space and write the collating_symbol
                               and \0.  */
                            insert_space(datasize, laststart + offset, b-1);
                            for (i = 0 ; i < datasize ; i++)
                              laststart[offset + i] = str[i];

                            /* In re_match_2_internal if range_start < -1, we
                               assume -range_start is the offset of the
                               collating symbol which is specified as
                               the character of the range start.  So we assign
                               -(laststart[1] + laststart[2] + 6) to
                               range_start.  */
                            range_start = -(laststart[1] + laststart[2] + 6);
                            /* Update the length of collating_symbol.  */
                            laststart[2] += datasize;
                            had_char_class = false;
                          }
                      }
                    else
                      {
                        c1++;
                        while (c1--)
                          PATUNFETCH;
                        BUF_PUSH ('[');
                        BUF_PUSH (delim);
                        laststart[5] += 2; /* Update the length of characters  */
                        range_start = delim;
                        had_char_class = false;
                      }
                  }
                else
                  {
                    had_char_class = false;
                    BUF_PUSH(c);
                    laststart[5]++;  /* Update the length of characters  */
                    range_start = c;
                  }
              }

#else /* not MBS_SUPPORT */
            /* Ensure that we have enough space to push a charset: the
               opcode, the length count, and the bitset; 34 bytes in all.  */
            GET_BUFFER_SPACE (34);

            laststart = b;

            /* We test `*p == '^' twice, instead of using an if
               statement, so we only need one BUF_PUSH.  */
            BUF_PUSH (*p == '^' ? charset_not : charset);
            if (*p == '^')
              p++;

            /* Remember the first position in the bracket expression.  */
            p1 = p;

            /* Push the number of bytes in the bitmap.
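The bitmap that the single-byte branch pushes here holds one bit per byte value, so with 8-bit bytes it is (1 << 8) / 8 = 32 bytes long, and SET_LIST_BIT sets bit `c % BYTEWIDTH` of byte `c / BYTEWIDTH`. A hedged, self-contained sketch of that layout follows; `map_set`, `map_test` and `map_compile` are hypothetical, CHAR_BIT stands in for BYTEWIDTH, and the mini-parser handles only plain characters and `a-z` style ranges.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

// One bit per possible byte value: (1 << 8) / 8 = 32 bytes when
// CHAR_BIT is 8.  CHAR_BIT stands in for BYTEWIDTH (assumption).
enum { MAP_BYTES = (1 << CHAR_BIT) / CHAR_BIT };

// Same arithmetic as SET_LIST_BIT: byte c / CHAR_BIT, bit c % CHAR_BIT.
static void
map_set (unsigned char *map, unsigned char c)
{
  map[c / CHAR_BIT] |= (unsigned char) (1u << (c % CHAR_BIT));
}

static int
map_test (const unsigned char *map, unsigned char c)
{
  return (map[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1;
}

// Hypothetical mini-compiler for a bracket body such as "a-cx":
// plain characters and ranges only (no classes, escapes or negation).
static void
map_compile (unsigned char *map, const char *body)
{
  size_t i, n = strlen (body);

  memset (map, 0, MAP_BYTES);           // like the bzero of the map
  for (i = 0; i < n; i++)
    {
      unsigned char lo = (unsigned char) body[i];

      if (i + 2 < n && body[i + 1] == '-')
        {
          unsigned char hi = (unsigned char) body[i + 2];
          unsigned int ch;

          assert (lo <= hi);            // this sketch just asserts on reversed ranges
          for (ch = lo; ch <= hi; ch++)
            map_set (map, (unsigned char) ch);
          i += 2;                       // skip the '-' and the range end
        }
      else
        map_set (map, lo);
    }
}
```

For the body "a-cx", byte 12 of the map ends up as 0x0e (bits 1 to 3 for 'a', 'b' and 'c'), and testing membership at match time costs one index and one mask per input byte.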
*/
            BUF_PUSH ((1 << BYTEWIDTH) / BYTEWIDTH);

            /* Clear the whole map.  */
            bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);

            /* charset_not matches newline according to a syntax bit.  */
            if ((re_opcode_t) b[-2] == charset_not
                && (syntax & RE_HAT_LISTS_NOT_NEWLINE))
              SET_LIST_BIT ('\n');

            /* Read in characters and ranges, setting map bits.  */
            for (;;)
              {
                if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                PATFETCH (c);

                /* \ might escape characters inside [...] and [^...].  */
                if ((syntax & RE_BACKSLASH_ESCAPE_IN_LISTS) && c == '\\')
                  {
                    if (p == pend) FREE_STACK_RETURN (REG_EESCAPE);

                    PATFETCH (c1);
                    SET_LIST_BIT (c1);
                    range_start = c1;
                    continue;
                  }

                /* Could be the end of the bracket expression.  If it's
                   not (i.e., when the bracket expression is `[]' so
                   far), the ']' character bit gets set way below.  */
                if (c == ']' && p != p1 + 1)
                  break;

                /* Look ahead to see if it's a range when the last thing
                   was a character class.  */
                if (had_char_class && c == '-' && *p != ']')
                  FREE_STACK_RETURN (REG_ERANGE);

                /* Look ahead to see if it's a range when the last thing
                   was a character: if this is a hyphen not at the
                   beginning or the end of a list, then it's the range
                   operator.  */
                if (c == '-'
                    && !(p - 2 >= pattern && p[-2] == '[')
                    && !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
                    && *p != ']')
                  {
                    reg_errcode_t ret
                      = compile_range (range_start, &p, pend, translate,
                                       syntax, b);
                    if (ret != REG_NOERROR) FREE_STACK_RETURN (ret);
                    range_start = 0xffffffff;
                  }
                else if (p[0] == '-' && p[1] != ']')
                  { /* This handles ranges made up of characters only.  */
                    reg_errcode_t ret;

                    /* Move past the `-'.  */
                    PATFETCH (c1);

                    ret = compile_range (c, &p, pend, translate, syntax, b);
                    if (ret != REG_NOERROR) FREE_STACK_RETURN (ret);
                    range_start = 0xffffffff;
                  }

                /* See if we're at the beginning of a possible character
                   class.  */
                else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == ':')
                  { /* Leave room for the null.  */
                    char str[CHAR_CLASS_MAX_LENGTH + 1];

                    PATFETCH (c);
                    c1 = 0;

                    /* If pattern is `[[:'.
                       */
                    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                    for (;;)
                      {
                        PATFETCH (c);
                        if ((c == ':' && *p == ']') || p == pend)
                          break;
                        if (c1 < CHAR_CLASS_MAX_LENGTH)
                          str[c1++] = c;
                        else
                          /* This is in any case an invalid class name.  */
                          str[0] = '\0';
                      }
                    str[c1] = '\0';

                    /* If it isn't a word bracketed by `[:' and `:]':
                       undo the ending character, the letters, and leave
                       the leading `:' and `[' (but set bits for them).  */
                    if (c == ':' && *p == ']')
                      {
# if defined _LIBC || WIDE_CHAR_SUPPORT
                        boolean is_lower = STREQ (str, "lower");
                        boolean is_upper = STREQ (str, "upper");
                        wctype_t wt;
                        int ch;

                        wt = IS_CHAR_CLASS (str);
                        if (wt == 0)
                          FREE_STACK_RETURN (REG_ECTYPE);

                        /* Throw away the ] at the end of the character
                           class.  */
                        PATFETCH (c);

                        if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                        for (ch = 0; ch < 1 << BYTEWIDTH; ++ch)
                          {
# ifdef _LIBC
                            if (__iswctype (__btowc (ch), wt))
                              SET_LIST_BIT (ch);
# else
                            if (iswctype (btowc (ch), wt))
                              SET_LIST_BIT (ch);
# endif

                            if (translate && (is_upper || is_lower)
                                && (ISUPPER (ch) || ISLOWER (ch)))
                              SET_LIST_BIT (ch);
                          }

                        had_char_class = true;
# else
                        int ch;
                        boolean is_alnum = STREQ (str, "alnum");
                        boolean is_alpha = STREQ (str, "alpha");
                        boolean is_blank = STREQ (str, "blank");
                        boolean is_cntrl = STREQ (str, "cntrl");
                        boolean is_digit = STREQ (str, "digit");
                        boolean is_graph = STREQ (str, "graph");
                        boolean is_lower = STREQ (str, "lower");
                        boolean is_print = STREQ (str, "print");
                        boolean is_punct = STREQ (str, "punct");
                        boolean is_space = STREQ (str, "space");
                        boolean is_upper = STREQ (str, "upper");
                        boolean is_xdigit = STREQ (str, "xdigit");

                        if (!IS_CHAR_CLASS (str))
                          FREE_STACK_RETURN (REG_ECTYPE);

                        /* Throw away the ] at the end of the character
                           class.  */
                        PATFETCH (c);

                        if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                        for (ch = 0; ch < 1 << BYTEWIDTH; ch++)
                          {
                            /* This was split into 3 if's to
                               avoid an arbitrary limit in some compiler.
                               */
                            if (   (is_alnum  && ISALNUM (ch))
                                || (is_alpha  && ISALPHA (ch))
                                || (is_blank  && ISBLANK (ch))
                                || (is_cntrl  && ISCNTRL (ch)))
                              SET_LIST_BIT (ch);
                            if (   (is_digit  && ISDIGIT (ch))
                                || (is_graph  && ISGRAPH (ch))
                                || (is_lower  && ISLOWER (ch))
                                || (is_print  && ISPRINT (ch)))
                              SET_LIST_BIT (ch);
                            if (   (is_punct  && ISPUNCT (ch))
                                || (is_space  && ISSPACE (ch))
                                || (is_upper  && ISUPPER (ch))
                                || (is_xdigit && ISXDIGIT (ch)))
                              SET_LIST_BIT (ch);
                            if (   translate && (is_upper || is_lower)
                                && (ISUPPER (ch) || ISLOWER (ch)))
                              SET_LIST_BIT (ch);
                          }
                        had_char_class = true;
# endif /* libc || wctype.h */
                      }
                    else
                      {
                        c1++;
                        while (c1--)
                          PATUNFETCH;
                        SET_LIST_BIT ('[');
                        SET_LIST_BIT (':');
                        range_start = ':';
                        had_char_class = false;
                      }
                  }
                else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == '=')
                  {
                    unsigned char str[MB_LEN_MAX + 1];
# ifdef _LIBC
                    uint32_t nrules =
                      _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
# endif

                    PATFETCH (c);
                    c1 = 0;

                    /* If pattern is `[[='.  */
                    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                    for (;;)
                      {
                        PATFETCH (c);
                        if ((c == '=' && *p == ']') || p == pend)
                          break;
                        if (c1 < MB_LEN_MAX)
                          str[c1++] = c;
                        else
                          /* This is in any case an invalid class name.  */
                          str[0] = '\0';
                      }
                    str[c1] = '\0';

                    if (c == '=' && *p == ']' && str[0] != '\0')
                      {
                        /* If we have no collation data we use the default
                           collation in which each character is in a class
                           by itself.  It also means that ASCII is the
                           character set and therefore we cannot have
                           characters with more than one byte in the
                           multibyte representation.  */
# ifdef _LIBC
                        if (nrules == 0)
# endif
                          {
                            if (c1 != 1)
                              FREE_STACK_RETURN (REG_ECOLLATE);

                            /* Throw away the ] at the end of the equivalence
                               class.  */
                            PATFETCH (c);

                            /* Set the bit for the character.  */
                            SET_LIST_BIT (str[0]);
                          }
# ifdef _LIBC
                        else
                          {
                            /* Try to match the byte sequence in `str' against
                               those known to the collate implementation.
                               First find out whether the bytes in `str' are
                               actually from exactly one character.
                               */
                            const int32_t *table;
                            const unsigned char *weights;
                            const unsigned char *extra;
                            const int32_t *indirect;
                            int32_t idx;
                            const unsigned char *cp = str;
                            int ch;

                            /* This #include defines a local function!  */
# include <locale/weight.h>

                            table = (const int32_t *)
                              _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB);
                            weights = (const unsigned char *)
                              _NL_CURRENT (LC_COLLATE, _NL_COLLATE_WEIGHTMB);
                            extra = (const unsigned char *)
                              _NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB);
                            indirect = (const int32_t *)
                              _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);

                            idx = findidx (&cp);
                            if (idx == 0 || cp < str + c1)
                              /* This is not a valid character.  */
                              FREE_STACK_RETURN (REG_ECOLLATE);

                            /* Throw away the ] at the end of the equivalence
                               class.  */
                            PATFETCH (c);

                            /* Now we have to go through the whole table
                               and find all characters which have the same
                               first level weight.

                               XXX Note that this is not entirely correct.
                               We would have to match multibyte sequences
                               but this is not possible with the current
                               implementation.  */
                            for (ch = 1; ch < 256; ++ch)
                              /* XXX This test would have to be changed if we
                                 would allow matching multibyte sequences.  */
                              if (table[ch] > 0)
                                {
                                  int32_t idx2 = table[ch];
                                  size_t len = weights[idx2];

                                  /* Test whether the lengths match.  */
                                  if (weights[idx] == len)
                                    {
                                      /* They do.  Now compare the bytes of
                                         the weight.  */
                                      size_t cnt = 0;

                                      while (cnt < len
                                             && (weights[idx + 1 + cnt]
                                                 == weights[idx2 + 1 + cnt]))
                                        ++cnt;

                                      if (cnt == len)
                                        /* They match.  Mark the character as
                                           acceptable.  */
                                        SET_LIST_BIT (ch);
                                    }
                                }
                          }
# endif
                        had_char_class = true;
                      }
                    else
                      {
                        c1++;
                        while (c1--)
                          PATUNFETCH;
                        SET_LIST_BIT ('[');
                        SET_LIST_BIT ('=');
                        range_start = '=';
                        had_char_class = false;
                      }
                  }
                else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == '.')
                  {
                    unsigned char str[128];  /* Should be large enough.  */
# ifdef _LIBC
                    uint32_t nrules =
                      _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
# endif

                    PATFETCH (c);
                    c1 = 0;

                    /* If pattern is `[[.'.
                       */
                    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);

                    for (;;)
                      {
                        PATFETCH (c);
                        if ((c == '.' && *p == ']') || p == pend)
                          break;
                        if (c1 < sizeof (str))
                          str[c1++] = c;
                        else
                          /* This is in any case an invalid class name.  */
                          str[0] = '\0';
                      }
                    str[c1] = '\0';

                    if (c == '.' && *p == ']' && str[0] != '\0')
                      {
                        /* If we have no collation data we use the default
                           collation in which each character is the name
                           for its own class which contains only the one
                           character.  It also means that ASCII is the
                           character set and therefore we cannot have
                           characters with more than one byte in the
                           multibyte representation.  */
# ifdef _LIBC
                        if (nrules == 0)
# endif
                          {
                            if (c1 != 1)
                              FREE_STACK_RETURN (REG_ECOLLATE);

                            /* Throw away the ] at the end of the collating
                               symbol.  */
                            PATFETCH (c);

                            /* Set the bit for the character.  */
                            SET_LIST_BIT (str[0]);
                            range_start = ((const unsigned char *) str)[0];
                          }
# ifdef _LIBC
                        else
                          {
                            /* Try to match the byte sequence in `str' against
                               those known to the collate implementation.
                               First find out whether the bytes in `str' are
                               actually from exactly one character.  */
                            int32_t table_size;
                            const int32_t *symb_table;
                            const unsigned char *extra;
                            int32_t idx;
                            int32_t elem;
                            int32_t second;
                            int32_t hash;

                            table_size =
                              _NL_CURRENT_WORD (LC_COLLATE,
                                                _NL_COLLATE_SYMB_HASH_SIZEMB);
                            symb_table = (const int32_t *)
                              _NL_CURRENT (LC_COLLATE,
                                           _NL_COLLATE_SYMB_TABLEMB);
                            extra = (const unsigned char *)
                              _NL_CURRENT (LC_COLLATE,
                                           _NL_COLLATE_SYMB_EXTRAMB);

                            /* Locate the character in the hashing table.  */
                            hash = elem_hash (str, c1);

                            idx = 0;
                            elem = hash % table_size;
                            second = hash % (table_size - 2);
                            while (symb_table[2 * elem] != 0)
                              {
                                /* First compare the hashing value.  */
                                if (symb_table[2 * elem] == hash
                                    && c1 == extra[symb_table[2 * elem + 1]]
                                    && memcmp (str,
                                               &extra[symb_table[2 * elem + 1]
                                                      + 1],
                                               c1) == 0)
                                  {
                                    /* Yep, this is the entry.  */
                                    idx = symb_table[2 * elem + 1];
                                    idx += 1 + extra[idx];
                                    break;
                                  }

                                /* Next entry.  */
                                elem += second;
                              }

                            if (symb_table[2 * elem] == 0)
                              /* This is not a valid character.
                                 */
                              FREE_STACK_RETURN (REG_ECOLLATE);

                            /* Throw away the ] at the end of the collating
                               symbol.  */
                            PATFETCH (c);

                            /* Now add the multibyte character(s) we found
                               to the accept list.

                               XXX Note that this is not entirely correct.
                               We would have to match multibyte sequences
                               but this is not possible with the current
                               implementation.  Also, we have to match
                               collating symbols, which expand to more than
                               one character, as a whole and not allow the
                               individual bytes.  */
                            c1 = extra[idx++];
                            if (c1 == 1)
                              range_start = extra[idx];
                            while (c1-- > 0)
                              {
                                SET_LIST_BIT (extra[idx]);
                                ++idx;
                              }
                          }
# endif
                        had_char_class = false;
                      }
                    else
                      {
                        c1++;
                        while (c1--)
                          PATUNFETCH;
                        SET_LIST_BIT ('[');
                        SET_LIST_BIT ('.');
                        range_start = '.';
                        had_char_class = false;
                      }
                  }
                else
                  {
                    had_char_class = false;
                    SET_LIST_BIT (c);
                    range_start = c;
                  }
              }

            /* Discard any (non)matching list bytes that are all 0 at the
               end of the map.  Decrease the map-length byte too.  */
            while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
              b[-1]--;
            b += b[-1];
#endif /* MBS_SUPPORT */
          }
          break;

        case '(':
          if (syntax & RE_NO_BK_PARENS)
            goto handle_open;
          else
            goto normal_char;

        case ')':
          if (syntax & RE_NO_BK_PARENS)
            goto handle_close;
          else
            goto normal_char;

        case '\n':
          if (syntax & RE_NEWLINE_ALT)
            goto handle_alt;
          else
            goto normal_char;

        case '|':
          if (syntax & RE_NO_BK_VBAR)
            goto handle_alt;
          else
            goto normal_char;

        case '{':
          if (syntax & RE_INTERVALS && syntax & RE_NO_BK_BRACES)
            goto handle_interval;
          else
            goto normal_char;

        case '\\':
          if (p == pend) FREE_STACK_RETURN (REG_EESCAPE);

          /* Do not translate the character after the \, so that we can
             distinguish, e.g., \B from \b, even if we normally would
             translate, e.g., B to b.
             */
          PATFETCH_RAW (c);

          switch (c)
            {
            case '(':
              if (syntax & RE_NO_BK_PARENS)
                goto normal_backslash;

            handle_open:
              bufp->re_nsub++;
              regnum++;

              if (COMPILE_STACK_FULL)
                {
                  RETALLOC (compile_stack.stack, compile_stack.size << 1,
                            compile_stack_elt_t);
                  if (compile_stack.stack == NULL) return REG_ESPACE;

                  compile_stack.size <<= 1;
                }

              /* These are the values to restore when we hit end of this
                 group.  They are all relative offsets, so that if the
                 whole pattern moves because of realloc, they will still
                 be valid.  */
              COMPILE_STACK_TOP.begalt_offset = begalt - COMPILED_BUFFER_VAR;
              COMPILE_STACK_TOP.fixup_alt_jump
                = fixup_alt_jump ? fixup_alt_jump - COMPILED_BUFFER_VAR + 1 : 0;
              COMPILE_STACK_TOP.laststart_offset = b - COMPILED_BUFFER_VAR;
              COMPILE_STACK_TOP.regnum = regnum;

              /* We will eventually replace the 0 with the number of
                 groups inner to this one.  But do not push a
                 start_memory for groups beyond the last one we can
                 represent in the compiled pattern.  */
              if (regnum <= MAX_REGNUM)
                {
                  COMPILE_STACK_TOP.inner_group_offset = b
                    - COMPILED_BUFFER_VAR + 2;
                  BUF_PUSH_3 (start_memory, regnum, 0);
                }

              compile_stack.avail++;

              fixup_alt_jump = 0;
              laststart = 0;
              begalt = b;
              /* If we've reached MAX_REGNUM groups, then this open
                 won't actually generate any code, so we'll have to
                 clear pending_exact explicitly.  */
              pending_exact = 0;
              break;

            case ')':
              if (syntax & RE_NO_BK_PARENS) goto normal_backslash;

              if (COMPILE_STACK_EMPTY)
                {
                  if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
                    goto normal_backslash;
                  else
                    FREE_STACK_RETURN (REG_ERPAREN);
                }

            handle_close:
              if (fixup_alt_jump)
                { /* Push a dummy failure point at the end of the
                     alternative for a possible future
                     `pop_failure_jump' to pop.  See comments at
                     `push_dummy_failure' in `re_match_2'.  */
                  BUF_PUSH (push_dummy_failure);

                  /* We allocated space for this jump when we assigned
                     to `fixup_alt_jump', in the `handle_alt' case below.  */
                  STORE_JUMP (jump_past_alt, fixup_alt_jump, b - 1);
                }

              /* See similar code for backslashed left paren above.
                 */
              if (COMPILE_STACK_EMPTY)
                {
                  if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
                    goto normal_char;
                  else
                    FREE_STACK_RETURN (REG_ERPAREN);
                }

              /* Since we just checked for an empty stack above, this
                 ``can't happen''.  */
              assert (compile_stack.avail != 0);
              {
                /* We don't just want to restore into `regnum', because
                   later groups should continue to be numbered higher,
                   as in `(ab)c(de)' -- the second group is #2.  */
                regnum_t this_group_regnum;

                compile_stack.avail--;
                begalt = COMPILED_BUFFER_VAR + COMPILE_STACK_TOP.begalt_offset;
                fixup_alt_jump
                  = COMPILE_STACK_TOP.fixup_alt_jump
                    ? COMPILED_BUFFER_VAR + COMPILE_STACK_TOP.fixup_alt_jump - 1
                    : 0;
                laststart = COMPILED_BUFFER_VAR + COMPILE_STACK_TOP.laststart_offset;
                this_group_regnum = COMPILE_STACK_TOP.regnum;
                /* If we've reached MAX_REGNUM groups, then this open
                   won't actually generate any code, so we'll have to
                   clear pending_exact explicitly.  */
                pending_exact = 0;

                /* We're at the end of the group, so now we know how many
                   groups were inside this one.  */
                if (this_group_regnum <= MAX_REGNUM)
                  {
                    US_CHAR_TYPE *inner_group_loc
                      = COMPILED_BUFFER_VAR + COMPILE_STACK_TOP.inner_group_offset;

                    *inner_group_loc = regnum - this_group_regnum;
                    BUF_PUSH_3 (stop_memory, this_group_regnum,
                                regnum - this_group_regnum);
                  }
              }
              break;

            case '|':  /* `\|'.  */
              if (syntax & RE_LIMITED_OPS || syntax & RE_NO_BK_VBAR)
                goto normal_backslash;
            handle_alt:
              if (syntax & RE_LIMITED_OPS)
                goto normal_char;

              /* Insert before the previous alternative a jump which
                 jumps to this alternative if the former fails.  */
              GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);
              INSERT_JUMP (on_failure_jump, begalt,
                           b + 2 + 2 * OFFSET_ADDRESS_SIZE);
              pending_exact = 0;
              b += 1 + OFFSET_ADDRESS_SIZE;

              /* The alternative before this one has a jump after it
                 which gets executed if it gets matched.  Adjust that
                 jump so it will jump to this alternative's analogous
                 jump (put in below, which in turn will jump to the next
                 (if any) alternative's such jump, etc.).
                 The last such jump jumps to the correct final
                 destination.  A picture:
                          _____ _____
                          |   | |   |
                          |   v |   v
                         a | b   | c

                 If we are at `b', then fixup_alt_jump right now points to a
                 three-byte space after `a'.  We'll put in the jump, set
                 fixup_alt_jump to right after `b', and leave behind three
                 bytes which we'll fill in when we get to after `c'.  */

              if (fixup_alt_jump)
                STORE_JUMP (jump_past_alt, fixup_alt_jump, b);

              /* Mark and leave space for a jump after this alternative,
                 to be filled in later either by the next alternative or
                 when we know we're at the end of a series of alternatives.  */
              fixup_alt_jump = b;
              GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);
              b += 1 + OFFSET_ADDRESS_SIZE;

              laststart = 0;
              begalt = b;
              break;

            case '{':
              /* If \{ is a literal.  */
              if (!(syntax & RE_INTERVALS)
                     /* If we're at `\{' and it's not the open-interval
                        operator.  */
                  || (syntax & RE_NO_BK_BRACES))
                goto normal_backslash;

            handle_interval:
              {
                /* If we got here, then the syntax allows intervals.  */

                /* At least (most) this many matches must be made.  */
                int lower_bound = -1, upper_bound = -1;

                /* Place in the uncompiled pattern (i.e., just after
                   the '{') to go back to if the interval is invalid.  */
                const CHAR_TYPE *beg_interval = p;

                if (p == pend)
                  goto invalid_interval;

                GET_UNSIGNED_NUMBER (lower_bound);

                if (c == ',')
                  {
                    GET_UNSIGNED_NUMBER (upper_bound);
                    if (upper_bound < 0)
                      upper_bound = RE_DUP_MAX;
                  }
                else
                  /* Interval such as `{1}' => match exactly once.  */
                  upper_bound = lower_bound;

                if (! (0 <= lower_bound && lower_bound <= upper_bound))
                  goto invalid_interval;

                if (!(syntax & RE_NO_BK_BRACES))
                  {
                    if (c != '\\' || p == pend)
                      goto invalid_interval;
                    PATFETCH (c);
                  }

                if (c != '}')
                  goto invalid_interval;

                /* If it's invalid to have no preceding re.  */
                if (!laststart)
                  {
                    if (syntax & RE_CONTEXT_INVALID_OPS
                        && !(syntax & RE_INVALID_INTERVAL_ORD))
                      FREE_STACK_RETURN (REG_BADRPT);
                    else if (syntax & RE_CONTEXT_INDEP_OPS)
                      laststart = b;
                    else
                      goto unfetch_interval;
                  }

                /* We just parsed a valid interval.
                   */
                if (RE_DUP_MAX < upper_bound)
                  FREE_STACK_RETURN (REG_BADBR);

                /* If the upper bound is zero, we don't want to succeed at
                   all; jump from `laststart' to `b + 3', which will be
                   the end of the buffer after we insert the jump.  */
                /* ifdef MBS_SUPPORT, 'b + 1 + OFFSET_ADDRESS_SIZE'
                   instead of 'b + 3'.  */
                if (upper_bound == 0)
                  {
                    GET_BUFFER_SPACE (1 + OFFSET_ADDRESS_SIZE);
                    INSERT_JUMP (jump, laststart, b + 1
                                 + OFFSET_ADDRESS_SIZE);
                    b += 1 + OFFSET_ADDRESS_SIZE;
                  }

                /* Otherwise, we have a nontrivial interval.  When
                   we're all done, the pattern will look like:
                     set_number_at <jump count> <upper bound>
                     set_number_at <succeed_n count> <lower bound>
                     succeed_n <after jump addr> <succeed_n count>
                     <body of loop>
                     jump_n <succeed_n addr> <jump count>
                   (The upper bound and `jump_n' are omitted if
                   `upper_bound' is 1, though.)  */
                else
                  { /* If the upper bound is > 1, we need to insert
                       more at the end of the loop.  */
                    unsigned nbytes = 2 + 4 * OFFSET_ADDRESS_SIZE +
                      (upper_bound > 1) * (2 + 4 * OFFSET_ADDRESS_SIZE);

                    GET_BUFFER_SPACE (nbytes);

                    /* Initialize lower bound of the `succeed_n', even
                       though it will be set during matching by its
                       attendant `set_number_at' (inserted next),
                       because `re_compile_fastmap' needs to know.
                       Jump to the `jump_n' we might insert below.  */
                    INSERT_JUMP2 (succeed_n, laststart,
                                  b + 1 + 2 * OFFSET_ADDRESS_SIZE
                                  + (upper_bound > 1) * (1 + 2 * OFFSET_ADDRESS_SIZE),
                                  lower_bound);
                    b += 1 + 2 * OFFSET_ADDRESS_SIZE;

                    /* Code to initialize the lower bound.  Insert
                       before the `succeed_n'.  The `5' is the last two
                       bytes of this `set_number_at', plus 3 bytes of
                       the following `succeed_n'.  */
                    /* ifdef MBS_SUPPORT, The '1+2*OFFSET_ADDRESS_SIZE'
                       is the 'set_number_at', plus '1+OFFSET_ADDRESS_SIZE'
                       of the following `succeed_n'.  */
                    insert_op2 (set_number_at, laststart, 1
                                + 2 * OFFSET_ADDRESS_SIZE, lower_bound, b);
                    b += 1 + 2 * OFFSET_ADDRESS_SIZE;

                    if (upper_bound > 1)
                      { /* More than one repetition is allowed, so
                           append a backward jump to the `succeed_n'
                           that starts this interval.
                           When we've reached this during matching,
                           we'll have matched the interval once, so
                           jump back only `upper_bound - 1' times.  */
                        STORE_JUMP2 (jump_n, b, laststart
                                     + 2 * OFFSET_ADDRESS_SIZE + 1,
                                     upper_bound - 1);
                        b += 1 + 2 * OFFSET_ADDRESS_SIZE;

                        /* The location we want to set is the second
                           parameter of the `jump_n'; that is `b-2' as
                           an absolute address.  `laststart' will be
                           the `set_number_at' we're about to insert;
                           `laststart+3' the number to set, the source
                           for the relative address.  But we are
                           inserting into the middle of the pattern --
                           so everything is getting moved up by 5.
                           Conclusion: (b - 2) - (laststart + 3) + 5,
                           i.e., b - laststart.

                           We insert this at the beginning of the loop
                           so that if we fail during matching, we'll
                           reinitialize the bounds.  */
                        insert_op2 (set_number_at, laststart, b - laststart,
                                    upper_bound - 1, b);
                        b += 1 + 2 * OFFSET_ADDRESS_SIZE;
                      }
                  }
                pending_exact = 0;
                break;

              invalid_interval:
                if (!(syntax & RE_INVALID_INTERVAL_ORD))
                  FREE_STACK_RETURN (p == pend ? REG_EBRACE : REG_BADBR);
              unfetch_interval:
                /* Match the characters as literals.  */
                p = beg_interval;
                c = '{';
                if (syntax & RE_NO_BK_BRACES)
                  goto normal_char;
                else
                  goto normal_backslash;
              }

#ifdef emacs
            /* There is no way to specify the before_dot and after_dot
               operators.  rms says this is ok.
               --karl  */
            case '=':
              BUF_PUSH (at_dot);
              break;

            case 's':
              laststart = b;
              PATFETCH (c);
              BUF_PUSH_2 (syntaxspec, syntax_spec_code[c]);
              break;

            case 'S':
              laststart = b;
              PATFETCH (c);
              BUF_PUSH_2 (notsyntaxspec, syntax_spec_code[c]);
              break;
#endif /* emacs */

            case 'w':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              laststart = b;
              BUF_PUSH (wordchar);
              break;

            case 'W':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              laststart = b;
              BUF_PUSH (notwordchar);
              break;

            case '<':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (wordbeg);
              break;

            case '>':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (wordend);
              break;

            case 'b':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (wordbound);
              break;

            case 'B':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (notwordbound);
              break;

            case '`':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (begbuf);
              break;

            case '\'':
              if (syntax & RE_NO_GNU_OPS)
                goto normal_char;
              BUF_PUSH (endbuf);
              break;

            case '1': case '2': case '3': case '4': case '5':
            case '6': case '7': case '8': case '9':
              if (syntax & RE_NO_BK_REFS)
                goto normal_char;

              c1 = c - '0';

              if (c1 > regnum)
                FREE_STACK_RETURN (REG_ESUBREG);

              /* Can't back reference to a subexpression if inside of it.  */
              if (group_in_compile_stack (compile_stack, (regnum_t) c1))
                goto normal_char;

              laststart = b;
              BUF_PUSH_2 (duplicate, c1);
              break;

            case '+':
            case '?':
              if (syntax & RE_BK_PLUS_QM)
                goto handle_plus;
              else
                goto normal_backslash;

            default:
            normal_backslash:
              /* You might think it would be useful for \ to mean
                 not to translate; but if we don't translate it
                 it will never match anything.  */
              c = TRANSLATE (c);
              goto normal_char;
            }
          break;

        default:
        /* Expects the character in `c'.  */
        normal_char:
          /* If no exactn currently being built.  */
          if (!pending_exact
#ifdef MBS_SUPPORT
              /* If the last exactn handles binary (or character) data
                 and the new exactn handles character (or binary) data.  */
              || is_exactn_bin != is_binary[p - 1 - pattern]
#endif /* MBS_SUPPORT */

              /* If last exactn not at current position.
                 */
              || pending_exact + *pending_exact + 1 != b

              /* We have only one byte following the exactn for the count.  */
              || *pending_exact == (1 << BYTEWIDTH) - 1

              /* If followed by a repetition operator.  */
              || *p == '*' || *p == '^'
              || ((syntax & RE_BK_PLUS_QM)
                  ? *p == '\\' && (p[1] == '+' || p[1] == '?')
                  : (*p == '+' || *p == '?'))
              || ((syntax & RE_INTERVALS)
                  && ((syntax & RE_NO_BK_BRACES)
                      ? *p == '{'
                      : (p[0] == '\\' && p[1] == '{'))))
            {
              /* Start building a new exactn.  */

              laststart = b;

#ifdef MBS_SUPPORT
              /* Is this exactn binary data or character? */
              is_exactn_bin = is_binary[p - 1 - pattern];
              if (is_exactn_bin)
                BUF_PUSH_2 (exactn_bin, 0);
              else
                BUF_PUSH_2 (exactn, 0);
#else
              BUF_PUSH_2 (exactn, 0);
#endif /* MBS_SUPPORT */
              pending_exact = b - 1;
            }

          BUF_PUSH (c);
          (*pending_exact)++;
          break;
        } /* switch (c) */
    } /* while p != pend */

  /* Through the pattern now.  */

  if (fixup_alt_jump)
    STORE_JUMP (jump_past_alt, fixup_alt_jump, b);

  if (!COMPILE_STACK_EMPTY)
    FREE_STACK_RETURN (REG_EPAREN);

  /* If we don't want backtracking, force success
     the first time we reach the end of the compiled pattern.  */
  if (syntax & RE_NO_POSIX_BACKTRACKING)
    BUF_PUSH (succeed);

#ifdef MBS_SUPPORT
  free (pattern);
  free (mbs_offset);
  free (is_binary);
#endif
  free (compile_stack.stack);

  /* We have succeeded; set the length of the buffer.  */
#ifdef MBS_SUPPORT
  bufp->used = (uintptr_t) b - (uintptr_t) COMPILED_BUFFER_VAR;
#else
  bufp->used = b - bufp->buffer;
#endif

#ifdef DEBUG
  if (debug)
    {
      DEBUG_PRINT1 ("\nCompiled pattern: \n");
      print_compiled_pattern (bufp);
    }
#endif /* DEBUG */

#ifndef MATCH_MAY_ALLOCATE
  /* Initialize the failure stack to the largest possible stack.  This
     isn't necessary unless we're trying to avoid calling alloca in
     the search and match routines.  */
  {
    int num_regs = bufp->re_nsub + 1;

    /* Since DOUBLE_FAIL_STACK refuses to double only if the current size
       is strictly greater than re_max_failures, the largest possible stack
       is 2 * re_max_failures failure points.
     */
    if (fail_stack.size < (2 * re_max_failures * MAX_FAILURE_ITEMS))
      {
        fail_stack.size = (2 * re_max_failures * MAX_FAILURE_ITEMS);

# ifdef emacs
        if (! fail_stack.stack)
          fail_stack.stack
            = (fail_stack_elt_t *) xmalloc (fail_stack.size
                                            * sizeof (fail_stack_elt_t));
        else
          fail_stack.stack
            = (fail_stack_elt_t *) xrealloc (fail_stack.stack,
                                             (fail_stack.size
                                              * sizeof (fail_stack_elt_t)));
# else /* not emacs */
        if (! fail_stack.stack)
          fail_stack.stack
            = (fail_stack_elt_t *) malloc (fail_stack.size
                                           * sizeof (fail_stack_elt_t));
        else
          fail_stack.stack
            = (fail_stack_elt_t *) realloc (fail_stack.stack,
                                            (fail_stack.size
                                             * sizeof (fail_stack_elt_t)));
# endif /* not emacs */
      }

    regex_grow_registers (num_regs);
  }
#endif /* not MATCH_MAY_ALLOCATE */

  return REG_NOERROR;
} /* regex_compile */

/* Subroutines for `regex_compile'.  */

/* Store OP at LOC followed by two-byte integer parameter ARG.  */
/* ifdef MBS_SUPPORT, integer parameter is 1 wchar_t.  */

static void
store_op1 (op, loc, arg)
    re_opcode_t op;
    US_CHAR_TYPE *loc;
    int arg;
{
  *loc = (US_CHAR_TYPE) op;
  STORE_NUMBER (loc + 1, arg);
}

/* Like `store_op1', but for two two-byte parameters ARG1 and ARG2.  */
/* ifdef MBS_SUPPORT, integer parameter is 1 wchar_t.  */

static void
store_op2 (op, loc, arg1, arg2)
    re_opcode_t op;
    US_CHAR_TYPE *loc;
    int arg1, arg2;
{
  *loc = (US_CHAR_TYPE) op;
  STORE_NUMBER (loc + 1, arg1);
  STORE_NUMBER (loc + 1 + OFFSET_ADDRESS_SIZE, arg2);
}

/* Copy the bytes from LOC to END to open up three bytes of space at LOC
   for OP followed by two-byte integer parameter ARG.  */
/* ifdef MBS_SUPPORT, integer parameter is 1 wchar_t.  */

static void
insert_op1 (op, loc, arg, end)
    re_opcode_t op;
    US_CHAR_TYPE *loc;
    int arg;
    US_CHAR_TYPE *end;
{
  register US_CHAR_TYPE *pfrom = end;
  register US_CHAR_TYPE *pto = end + 1 + OFFSET_ADDRESS_SIZE;

  while (pfrom != loc)
    *--pto = *--pfrom;

  store_op1 (op, loc, arg);
}

/* Like `insert_op1', but for two two-byte parameters ARG1 and ARG2.
   */
/* ifdef MBS_SUPPORT, integer parameter is 1 wchar_t.  */

static void
insert_op2 (op, loc, arg1, arg2, end)
    re_opcode_t op;
    US_CHAR_TYPE *loc;
    int arg1, arg2;
    US_CHAR_TYPE *end;
{
  register US_CHAR_TYPE *pfrom = end;
  register US_CHAR_TYPE *pto = end + 1 + 2 * OFFSET_ADDRESS_SIZE;

  while (pfrom != loc)
    *--pto = *--pfrom;

  store_op2 (op, loc, arg1, arg2);
}

/* P points to just after a ^ in PATTERN.  Return true if that ^ comes
   after an alternative or a begin-subexpression.  We assume there is at
   least one character before the ^.  */

static boolean
at_begline_loc_p (pattern, p, syntax)
    const CHAR_TYPE *pattern, *p;
    reg_syntax_t syntax;
{
  const CHAR_TYPE *prev = p - 2;
  boolean prev_prev_backslash = prev > pattern && prev[-1] == '\\';

  return
       /* After a subexpression?  */
       (*prev == '(' && (syntax & RE_NO_BK_PARENS || prev_prev_backslash))
       /* After an alternative?  */
    || (*prev == '|' && (syntax & RE_NO_BK_VBAR || prev_prev_backslash));
}

/* The dual of at_begline_loc_p.  This one is for $.  We assume there is
   at least one character after the $, i.e., `P < PEND'.  */

static boolean
at_endline_loc_p (p, pend, syntax)
    const CHAR_TYPE *p, *pend;
    reg_syntax_t syntax;
{
  const CHAR_TYPE *next = p;
  boolean next_backslash = *next == '\\';
  const CHAR_TYPE *next_next = p + 1 < pend ? p + 1 : 0;

  return
       /* Before a subexpression?  */
       (syntax & RE_NO_BK_PARENS ? *next == ')'
        : next_backslash && next_next && *next_next == ')')
       /* Before an alternative?  */
    || (syntax & RE_NO_BK_VBAR ? *next == '|'
        : next_backslash && next_next && *next_next == '|');
}

/* Returns true if REGNUM is in one of COMPILE_STACK's elements and
   false if it's not.
   */

static boolean
group_in_compile_stack (compile_stack, regnum)
    compile_stack_type compile_stack;
    regnum_t regnum;
{
  int this_element;

  for (this_element = compile_stack.avail - 1;
       this_element >= 0;
       this_element--)
    if (compile_stack.stack[this_element].regnum == regnum)
      return true;

  return false;
}

#ifdef MBS_SUPPORT
/* Insert NUM elements of space into the pattern at LOC, shifting the
   existing contents toward the end.  END must point to the end of the
   allocated buffer.  */

static void
insert_space (num, loc, end)
     int num;
     CHAR_TYPE *loc;
     CHAR_TYPE *end;
{
  register CHAR_TYPE *pto = end;
  register CHAR_TYPE *pfrom = end - num;

  while (pfrom >= loc)
    *pto-- = *pfrom--;
}
#endif /* MBS_SUPPORT */

#ifdef MBS_SUPPORT
static reg_errcode_t
compile_range (range_start_char, p_ptr, pend, translate, syntax, b,
               char_set)
     CHAR_TYPE range_start_char;
     const CHAR_TYPE **p_ptr, *pend;
     CHAR_TYPE *char_set, *b;
     RE_TRANSLATE_TYPE translate;
     reg_syntax_t syntax;
{
  const CHAR_TYPE *p = *p_ptr;
  CHAR_TYPE range_start, range_end;
  reg_errcode_t ret;
# ifdef _LIBC
  uint32_t nrules;
  uint32_t start_val, end_val;
# endif
  if (p == pend)
    return REG_ERANGE;

# ifdef _LIBC
  nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
  if (nrules != 0)
    {
      const char *collseq = (const char *) _NL_CURRENT(LC_COLLATE,
                                                       _NL_COLLATE_COLLSEQWC);
      const unsigned char *extra = (const unsigned char *)
        _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB);

      if (range_start_char < -1)
        {
          /* range_start is a collating symbol.  */
          int32_t *wextra;
          /* Retrieve the index and get collation sequence value.  */
          wextra = (int32_t*)(extra + char_set[-range_start_char]);
          start_val = wextra[1 + *wextra];
        }
      else
        start_val = collseq_table_lookup(collseq, TRANSLATE(range_start_char));

      end_val = collseq_table_lookup (collseq, TRANSLATE (p[0]));

      /* Report an error if the range is empty and the syntax prohibits
         this.  */
      ret = ((syntax & RE_NO_EMPTY_RANGES)
             && (start_val > end_val))? REG_ERANGE : REG_NOERROR;

      /* Insert space to the end of the char_ranges.
         */
      insert_space(2, b - char_set[5] - 2, b - 1);
      *(b - char_set[5] - 2) = (wchar_t)start_val;
      *(b - char_set[5] - 1) = (wchar_t)end_val;
      char_set[4]++; /* ranges_index */
    }
  else
# endif
    {
      range_start = (range_start_char >= 0)? TRANSLATE (range_start_char):
        range_start_char;
      range_end = TRANSLATE (p[0]);
      /* Report an error if the range is empty and the syntax prohibits
         this.  */
      ret = ((syntax & RE_NO_EMPTY_RANGES)
             && (range_start > range_end))? REG_ERANGE : REG_NOERROR;

      /* Insert space to the end of the char_ranges.  */
      insert_space(2, b - char_set[5] - 2, b - 1);
      *(b - char_set[5] - 2) = range_start;
      *(b - char_set[5] - 1) = range_end;
      char_set[4]++; /* ranges_index */
    }
  /* Have to increment the pointer into the pattern string, so the
     caller isn't still at the ending character.  */
  (*p_ptr)++;

  return ret;
}
#else
/* Read the ending character of a range (in a bracket expression) from the
   uncompiled pattern *P_PTR (which ends at PEND).  We assume the
   starting character is in `P[-2]'.  (`P[-1]' is the character `-'.)
   Then we set the translation of all bits between the starting and
   ending characters (inclusive) in the compiled pattern B.

   Return an error code.

   We use these short variable names so we can use the same macros as
   `regex_compile' itself.  */

static reg_errcode_t
compile_range (range_start_char, p_ptr, pend, translate, syntax, b)
     unsigned int range_start_char;
     const char **p_ptr, *pend;
     RE_TRANSLATE_TYPE translate;
     reg_syntax_t syntax;
     unsigned char *b;
{
  unsigned this_char;
  const char *p = *p_ptr;
  reg_errcode_t ret;
# if _LIBC
  const unsigned char *collseq;
  unsigned int start_colseq;
  unsigned int end_colseq;
# else
  unsigned end_char;
# endif

  if (p == pend)
    return REG_ERANGE;

  /* Have to increment the pointer into the pattern string, so the
     caller isn't still at the ending character.  */
  (*p_ptr)++;

  /* Report an error if the range is empty and the syntax prohibits this.  */
  ret = syntax & RE_NO_EMPTY_RANGES ?
    REG_ERANGE : REG_NOERROR;

# if _LIBC
  collseq = (const unsigned char *) _NL_CURRENT (LC_COLLATE,
                                                 _NL_COLLATE_COLLSEQMB);

  start_colseq = collseq[(unsigned char) TRANSLATE (range_start_char)];
  end_colseq = collseq[(unsigned char) TRANSLATE (p[0])];
  for (this_char = 0; this_char <= (unsigned char) -1; ++this_char)
    {
      unsigned int this_colseq = collseq[(unsigned char) TRANSLATE (this_char)];

      if (start_colseq <= this_colseq && this_colseq <= end_colseq)
        {
          SET_LIST_BIT (TRANSLATE (this_char));
          ret = REG_NOERROR;
        }
    }
# else
  /* Here we see why `this_char' has to be larger than an `unsigned
     char' -- we would otherwise go into an infinite loop, since all
     characters <= 0xff.  */
  range_start_char = TRANSLATE (range_start_char);
  /* TRANSLATE(p[0]) is cast to char (not unsigned char) in TRANSLATE,
     and some compilers then convert it to int implicitly, so the
     following for loop could become an (almost) infinite loop.
     E.g., if translate[p[0]] = 0xff, end_char may equal 0xffffffff.
     To avoid this, we cast p[0] to unsigned int and truncate it.  */
  end_char = ((unsigned)TRANSLATE(p[0]) & ((1 << BYTEWIDTH) - 1));

  for (this_char = range_start_char; this_char <= end_char; ++this_char)
    {
      SET_LIST_BIT (TRANSLATE (this_char));
      ret = REG_NOERROR;
    }
# endif

  return ret;
}
#endif /* MBS_SUPPORT */

/* re_compile_fastmap computes a ``fastmap'' for the compiled pattern in
   BUFP.  A fastmap records which of the (1 << BYTEWIDTH) possible
   characters can start a string that matches the pattern.  This fastmap
   is used by re_search to skip quickly over impossible starting points.

   The caller must supply the address of a (1 << BYTEWIDTH)-byte data
   area as BUFP->fastmap.

   We set the `fastmap', `fastmap_accurate', and `can_be_null' fields in
   the pattern buffer.

   Returns 0 if we succeed, -2 if an internal error.  */

#ifdef MBS_SUPPORT
/* Local function for re_compile_fastmap:
   truncate a wchar_t character to a char.
   */
static unsigned char truncate_wchar (CHAR_TYPE c);

static unsigned char
truncate_wchar (c)
     CHAR_TYPE c;
{
  unsigned char buf[MB_LEN_MAX];
  int retval = wctomb(buf, c);
  return retval > 0 ? buf[0] : (unsigned char)c;
}
#endif /* MBS_SUPPORT */

int
re_compile_fastmap (bufp)
     struct re_pattern_buffer *bufp;
{
  int j, k;
#ifdef MATCH_MAY_ALLOCATE
  fail_stack_type fail_stack;
#endif
#ifndef REGEX_MALLOC
  char *destination;
#endif

  register char *fastmap = bufp->fastmap;

#ifdef MBS_SUPPORT
  /* We need to cast pattern to (wchar_t*), because we cast this compiled
     pattern to (char*) in regex_compile.  */
  US_CHAR_TYPE *pattern = (US_CHAR_TYPE*)bufp->buffer;
  register US_CHAR_TYPE *pend = (US_CHAR_TYPE*) (bufp->buffer + bufp->used);
#else
  US_CHAR_TYPE *pattern = bufp->buffer;
  register US_CHAR_TYPE *pend = pattern + bufp->used;
#endif /* MBS_SUPPORT */
  US_CHAR_TYPE *p = pattern;

#ifdef REL_ALLOC
  /* This holds the pointer to the failure stack, when
     it is allocated relocatably.  */
  fail_stack_elt_t *failure_stack_ptr;
#endif

  /* Assume that each path through the pattern can be null until
     proven otherwise.  We set this false at the bottom of switch
     statement, to which we get only if a particular path doesn't
     match the empty string.  */
  boolean path_can_be_null = true;

  /* We aren't doing a `succeed_n' to begin with.  */
  boolean succeed_n_p = false;

  assert (fastmap != NULL && p != NULL);

  INIT_FAIL_STACK ();
  bzero (fastmap, 1 << BYTEWIDTH);  /* Assume nothing's valid.  */
  bufp->fastmap_accurate = 1;       /* It will be when we're done.  */
  bufp->can_be_null = 0;

  while (1)
    {
      if (p == pend || *p == succeed)
        {
          /* We have reached the (effective) end of pattern.  */
          if (!FAIL_STACK_EMPTY ())
            {
              bufp->can_be_null |= path_can_be_null;

              /* Reset for next path.  */
              path_can_be_null = true;

              p = fail_stack.stack[--fail_stack.avail].pointer;

              continue;
            }
          else
            break;
        }

      /* We should never be about to go beyond the end of the pattern.
*/ assert (p < pend); switch (SWITCH_ENUM_CAST ((re_opcode_t) *p++)) { /* I guess the idea here is to simply not bother with a fastmap if a backreference is used, since it's too hard to figure out the fastmap for the corresponding group. Setting `can_be_null' stops `re_search_2' from using the fastmap, so that is all we do. */ case duplicate: bufp->can_be_null = 1; goto done; /* Following are the cases which match a character. These end with `break'. */ #ifdef MBS_SUPPORT case exactn: fastmap[truncate_wchar(p[1])] = 1; break; case exactn_bin: fastmap[p[1]] = 1; break; #else case exactn: fastmap[p[1]] = 1; break; #endif /* MBS_SUPPORT */ #ifdef MBS_SUPPORT /* It is hard to distinguish fastmap from (multi byte) characters which depends on current locale. */ case charset: case charset_not: case wordchar: case notwordchar: bufp->can_be_null = 1; goto done; #else case charset: for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))) fastmap[j] = 1; break; case charset_not: /* Chars beyond end of map must be allowed. */ for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++) fastmap[j] = 1; for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))) fastmap[j] = 1; break; case wordchar: for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) == Sword) fastmap[j] = 1; break; case notwordchar: for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) != Sword) fastmap[j] = 1; break; #endif case anychar: { int fastmap_newline = fastmap['\n']; /* `.' matches anything ... */ for (j = 0; j < (1 << BYTEWIDTH); j++) fastmap[j] = 1; /* ... except perhaps newline. */ if (!(bufp->syntax & RE_DOT_NEWLINE)) fastmap['\n'] = fastmap_newline; /* Return if we have already set `can_be_null'; if we have, then the fastmap is irrelevant. Something's wrong here. */ else if (bufp->can_be_null) goto done; /* Otherwise, have to check alternative paths. 
*/ break; } #ifdef emacs case syntaxspec: k = *p++; for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) == (enum syntaxcode) k) fastmap[j] = 1; break; case notsyntaxspec: k = *p++; for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) != (enum syntaxcode) k) fastmap[j] = 1; break; /* All cases after this match the empty string. These end with `continue'. */ case before_dot: case at_dot: case after_dot: continue; #endif /* emacs */ case no_op: case begline: case endline: case begbuf: case endbuf: case wordbound: case notwordbound: case wordbeg: case wordend: case push_dummy_failure: continue; case jump_n: case pop_failure_jump: case maybe_pop_jump: case jump: case jump_past_alt: case dummy_failure_jump: EXTRACT_NUMBER_AND_INCR (j, p); p += j; if (j > 0) continue; /* Jump backward implies we just went through the body of a loop and matched nothing. Opcode jumped to should be `on_failure_jump' or `succeed_n'. Just treat it like an ordinary jump. For a * loop, it has pushed its failure point already; if so, discard that as redundant. */ if ((re_opcode_t) *p != on_failure_jump && (re_opcode_t) *p != succeed_n) continue; p++; EXTRACT_NUMBER_AND_INCR (j, p); p += j; /* If what's on the stack is where we are now, pop it. */ if (!FAIL_STACK_EMPTY () && fail_stack.stack[fail_stack.avail - 1].pointer == p) fail_stack.avail--; continue; case on_failure_jump: case on_failure_keep_string_jump: handle_on_failure_jump: EXTRACT_NUMBER_AND_INCR (j, p); /* For some patterns, e.g., `(a?)?', `p+j' here points to the end of the pattern. We don't want to push such a point, since when we restore it above, entering the switch will increment `p' past the end of the pattern. We don't need to push such a point since we obviously won't find any more fastmap entries beyond `pend'. Such a pattern can match the null string, though. 
*/ if (p + j < pend) { if (!PUSH_PATTERN_OP (p + j, fail_stack)) { RESET_FAIL_STACK (); return -2; } } else bufp->can_be_null = 1; if (succeed_n_p) { EXTRACT_NUMBER_AND_INCR (k, p); /* Skip the n. */ succeed_n_p = false; } continue; case succeed_n: /* Get to the number of times to succeed. */ p += OFFSET_ADDRESS_SIZE; /* Increment p past the n for when k != 0. */ EXTRACT_NUMBER_AND_INCR (k, p); if (k == 0) { p -= 2 * OFFSET_ADDRESS_SIZE; succeed_n_p = true; /* Spaghetti code alert. */ goto handle_on_failure_jump; } continue; case set_number_at: p += 2 * OFFSET_ADDRESS_SIZE; continue; case start_memory: case stop_memory: p += 2; continue; default: abort (); /* We have listed all the cases. */ } /* switch *p++ */ /* Getting here means we have found the possible starting characters for one path of the pattern -- and that the empty string does not match. We need not follow this path further. Instead, look at the next alternative (remembered on the stack), or quit if no more. The test at the top of the loop does these things. */ path_can_be_null = false; p = pend; } /* while p */ /* Set `can_be_null' for the last path (also the first path, if the pattern is empty). */ bufp->can_be_null |= path_can_be_null; done: RESET_FAIL_STACK (); return 0; } /* re_compile_fastmap */ #ifdef _LIBC weak_alias (__re_compile_fastmap, re_compile_fastmap) #endif /* Set REGS to hold NUM_REGS registers, storing them in STARTS and ENDS. Subsequent matches using PATTERN_BUFFER and REGS will use this memory for recording register information. STARTS and ENDS must be allocated using the malloc library routine, and must each be at least NUM_REGS * sizeof (regoff_t) bytes long. If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. 
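A minimal sketch, not this library's code, of the register-ownership policy that re_set_registers selects and that the matcher applies when it fills REGS: arrays the matcher allocated itself (or that the caller handed over as malloc'd memory) may be grown with realloc, fixed arrays are never resized, and entries beyond the pattern's registers are set to -1. The names toy_regs, toy_alloc, toy_prepare_regs and TOY_NREGS are hypothetical; the value 30 for TOY_NREGS is an assumption standing in for RE_NREGS.

```c
#include <stdlib.h>

// Hypothetical stand-ins for REGS_UNALLOCATED / REGS_REALLOCATE /
// REGS_FIXED and struct re_registers; not this library's declarations.
enum toy_alloc { TOY_UNALLOCATED, TOY_REALLOCATE, TOY_FIXED };

struct toy_regs {
  unsigned num_regs;
  int *start;
  int *end;
};

#define TOY_NREGS 30 // assumption: plays the role of RE_NREGS

// Ensure REGS can hold NEED registers plus one trailing -1 marker.
// Returns 0 on success, -2 if memory runs out (the "internal error" code).
int toy_prepare_regs(struct toy_regs *regs, enum toy_alloc *state,
                     unsigned need) {
  if (*state == TOY_UNALLOCATED) {
    // First use: allocate, and from now on own, the arrays.
    regs->num_regs = need + 1 > TOY_NREGS ? need + 1 : TOY_NREGS;
    regs->start = malloc(regs->num_regs * sizeof *regs->start);
    regs->end = malloc(regs->num_regs * sizeof *regs->end);
    if (regs->start == NULL || regs->end == NULL) {
      free(regs->start);
      free(regs->end);
      regs->start = regs->end = NULL;
      regs->num_regs = 0;
      return -2;
    }
    *state = TOY_REALLOCATE;
  } else if (*state == TOY_REALLOCATE && regs->num_regs < need + 1) {
    // The arrays came from malloc, so they may be grown with realloc.
    int *s = realloc(regs->start, (need + 1) * sizeof *s);
    if (s == NULL)
      return -2;
    regs->start = s;
    int *e = realloc(regs->end, (need + 1) * sizeof *e);
    if (e == NULL)
      return -2;
    regs->end = e;
    regs->num_regs = need + 1;
  }
  // TOY_FIXED: the caller guarantees enough room, so never resize.
  // Entries past the pattern's registers are marked unused with -1.
  for (unsigned i = need; i < regs->num_regs; i++)
    regs->start[i] = regs->end[i] = -1;
  return 0;
}
```

Starting from TOY_UNALLOCATED with 3 registers, the sketch allocates TOY_NREGS entries and switches to TOY_REALLOCATE; a later request for 40 registers grows both arrays to 41 entries.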
*/ void re_set_registers (bufp, regs, num_regs, starts, ends) struct re_pattern_buffer *bufp; struct re_registers *regs; unsigned num_regs; regoff_t *starts, *ends; { if (num_regs) { bufp->regs_allocated = REGS_REALLOCATE; regs->num_regs = num_regs; regs->start = starts; regs->end = ends; } else { bufp->regs_allocated = REGS_UNALLOCATED; regs->num_regs = 0; regs->start = regs->end = (regoff_t *) 0; } } #ifdef _LIBC weak_alias (__re_set_registers, re_set_registers) #endif /* Searching routines. */ /* Like re_search_2, below, but only one string is specified, and doesn't let you say where to stop matching. */ int re_search (bufp, string, size, startpos, range, regs) struct re_pattern_buffer *bufp; const char *string; int size, startpos, range; struct re_registers *regs; { return re_search_2 (bufp, NULL, 0, string, size, startpos, range, regs, size); } #ifdef _LIBC weak_alias (__re_search, re_search) #endif /* Using the compiled pattern in BUFP->buffer, first tries to match the virtual concatenation of STRING1 and STRING2, starting first at index STARTPOS, then at STARTPOS + 1, and so on. STRING1 and STRING2 have length SIZE1 and SIZE2, respectively. RANGE is how far to scan while trying to match. RANGE = 0 means try only at STARTPOS; in general, the last start tried is STARTPOS + RANGE. In REGS, return the indices of the virtual concatenation of STRING1 and STRING2 that matched the entire BUFP->buffer and its contained subexpressions. Do not consider matching one past the index STOP in the virtual concatenation of STRING1 and STRING2. We return either the position in the strings at which the match was found, -1 if no match, or -2 if error (such as failure stack overflow). 
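For illustration only: a minimal sketch of the search loop described above, assuming the simplest possible pattern, a literal byte string. toy_search_2 and vchar are hypothetical names, not part of this library. The sketch treats STRING1 and STRING2 as one virtual string without copying, clamps RANGE the way re_search_2 does, steps forwards or backwards depending on the sign of RANGE, and uses a 256-entry fastmap (here built only from the pattern's first byte) to skip starting points that cannot match. As in re_search_2, the fastmap is bypassed when the pattern can match the empty string.

```c
#include <string.h>

// Hypothetical helper: byte I of the virtual concatenation S1+S2.
static char vchar(const char *s1, int n1, const char *s2, int i) {
  return i < n1 ? s1[i] : s2[i - n1];
}

// Hypothetical toy: find the literal PAT in S1+S2, trying STARTPOS, then
// STARTPOS+1 (or STARTPOS-1 when RANGE < 0), up to STARTPOS+RANGE.
// Returns the match position, or -1 if there is none.
int toy_search_2(const char *s1, int n1, const char *s2, int n2,
                 const char *pat, int plen, int startpos, int range) {
  int total = n1 + n2;
  unsigned char fastmap[256];

  if (startpos < 0 || startpos > total)
    return -1;
  // Clamp RANGE so the last start tried stays within [0, total].
  if (startpos + range < 0)
    range = -startpos;
  else if (startpos + range > total)
    range = total - startpos;

  // Only the first pattern byte can start a match of a literal.
  memset(fastmap, 0, sizeof fastmap);
  if (plen > 0)
    fastmap[(unsigned char) pat[0]] = 1;

  for (;;) {
    int ok = 0;
    // An empty pattern matches anywhere, so the fastmap is not consulted.
    if (plen == 0 || (startpos < total &&
                      fastmap[(unsigned char) vchar(s1, n1, s2, startpos)])) {
      ok = startpos + plen <= total;
      for (int k = 0; ok && k < plen; k++)
        ok = vchar(s1, n1, s2, startpos + k) == pat[k];
    }
    if (ok)
      return startpos;
    if (range == 0)
      break;
    if (range > 0) {
      range--;
      startpos++;
    } else {
      range++;
      startpos--;
    }
  }
  return -1;
}
```

For example, searching "ab" + "cde" for "bc" from STARTPOS 0 with RANGE 4 returns 1, even though the match straddles the two strings.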
*/ int re_search_2 (bufp, string1, size1, string2, size2, startpos, range, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int size1, size2; int startpos; int range; struct re_registers *regs; int stop; { int val; register char *fastmap = bufp->fastmap; register RE_TRANSLATE_TYPE translate = bufp->translate; int total_size = size1 + size2; int endpos = startpos + range; /* Check for out-of-range STARTPOS. */ if (startpos < 0 || startpos > total_size) return -1; /* Fix up RANGE if it might eventually take us outside the virtual concatenation of STRING1 and STRING2. Make sure we won't move STARTPOS below 0 or above TOTAL_SIZE. */ if (endpos < 0) range = 0 - startpos; else if (endpos > total_size) range = total_size - startpos; /* If the search isn't to be a backwards one, don't waste time in a search for a pattern that must be anchored. */ if (bufp->used > 0 && range > 0 && ((re_opcode_t) bufp->buffer[0] == begbuf /* `begline' is like `begbuf' if it cannot match at newlines. */ || ((re_opcode_t) bufp->buffer[0] == begline && !bufp->newline_anchor))) { if (startpos > 0) return -1; else range = 1; } #ifdef emacs /* In a forward search for something that starts with \=. don't keep searching past point. */ if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == at_dot && range > 0) { range = PT - startpos; if (range <= 0) return -1; } #endif /* emacs */ /* Update the fastmap now if not correct already. */ if (fastmap && !bufp->fastmap_accurate) if (re_compile_fastmap (bufp) == -2) return -2; /* Loop through the string, looking for a place to start matching. */ for (;;) { /* If a fastmap is supplied, skip quickly over characters that cannot be the start of a match. If the pattern can match the null string, however, we don't need to skip characters; we want the first null string. */ if (fastmap && startpos < total_size && !bufp->can_be_null) { if (range > 0) /* Searching forwards. 
*/ { register const char *d; register int lim = 0; int irange = range; if (startpos < size1 && startpos + range >= size1) lim = range - (size1 - startpos); d = (startpos >= size1 ? string2 - size1 : string1) + startpos; /* Written out as an if-else to avoid testing `translate' inside the loop. */ if (translate) while (range > lim && !fastmap[(unsigned char) translate[(unsigned char) *d++]]) range--; else while (range > lim && !fastmap[(unsigned char) *d++]) range--; startpos += irange - range; } else /* Searching backwards. */ { register CHAR_TYPE c = (size1 == 0 || startpos >= size1 ? string2[startpos - size1] : string1[startpos]); if (!fastmap[(unsigned char) TRANSLATE (c)]) goto advance; } } /* If can't match the null string, and that's all we have left, fail. */ if (range >= 0 && startpos == total_size && fastmap && !bufp->can_be_null) return -1; val = re_match_2_internal (bufp, string1, size1, string2, size2, startpos, regs, stop); #ifndef REGEX_MALLOC # ifdef C_ALLOCA alloca (0); # endif #endif if (val >= 0) return startpos; if (val == -2) return -2; advance: if (!range) break; else if (range > 0) { range--; startpos++; } else { range++; startpos--; } } return -1; } /* re_search_2 */ #ifdef _LIBC weak_alias (__re_search_2, re_search_2) #endif #ifdef MBS_SUPPORT /* This converts PTR, a pointer into one of the search wchar_t strings `string1' and `string2' into an multibyte string offset from the beginning of that string. We use mbs_offset to optimize. See convert_mbs_to_wcs. */ # define POINTER_TO_OFFSET(ptr) \ (FIRST_STRING_P (ptr) \ ? ((regoff_t)(mbs_offset1 != NULL? mbs_offset1[(ptr)-string1] : 0)) \ : ((regoff_t)((mbs_offset2 != NULL? mbs_offset2[(ptr)-string2] : 0) \ + csize1))) #else /* This converts PTR, a pointer into one of the search strings `string1' and `string2' into an offset from the beginning of that string. */ # define POINTER_TO_OFFSET(ptr) \ (FIRST_STRING_P (ptr) \ ? 
((regoff_t) ((ptr) - string1)) \ : ((regoff_t) ((ptr) - string2 + size1))) #endif /* MBS_SUPPORT */ /* Macros for dealing with the split strings in re_match_2. */ #define MATCHING_IN_FIRST_STRING (dend == end_match_1) /* Call before fetching a character with *d. This switches over to string2 if necessary. */ #define PREFETCH() \ while (d == dend) \ { \ /* End of string2 => fail. */ \ if (dend == end_match_2) \ goto fail; \ /* End of string1 => advance to string2. */ \ d = string2; \ dend = end_match_2; \ } /* Test if at very beginning or at very end of the virtual concatenation of `string1' and `string2'. If only one string, it's `string2'. */ #define AT_STRINGS_BEG(d) ((d) == (size1 ? string1 : string2) || !size2) #define AT_STRINGS_END(d) ((d) == end2) /* Test if D points to a character which is word-constituent. We have two special cases to check for: if past the end of string1, look at the first character in string2; and if before the beginning of string2, look at the last character in string1. */ #ifdef MBS_SUPPORT /* Use internationalized API instead of SYNTAX. */ # define WORDCHAR_P(d) \ (iswalnum ((wint_t)((d) == end1 ? *string2 \ : (d) == string2 - 1 ? *(end1 - 1) : *(d))) != 0) #else # define WORDCHAR_P(d) \ (SYNTAX ((d) == end1 ? *string2 \ : (d) == string2 - 1 ? *(end1 - 1) : *(d)) \ == Sword) #endif /* MBS_SUPPORT */ /* Disabled due to a compiler bug -- see comment at case wordbound */ #if 0 /* Test if the character before D and the one at D differ with respect to being word-constituent. */ #define AT_WORD_BOUNDARY(d) \ (AT_STRINGS_BEG (d) || AT_STRINGS_END (d) \ || WORDCHAR_P (d - 1) != WORDCHAR_P (d)) #endif /* Free everything we malloc. 
*/ #ifdef MATCH_MAY_ALLOCATE # define FREE_VAR(var) if (var) REGEX_FREE (var); var = NULL # ifdef MBS_SUPPORT # define FREE_VARIABLES() \ do { \ REGEX_FREE_STACK (fail_stack.stack); \ FREE_VAR (regstart); \ FREE_VAR (regend); \ FREE_VAR (old_regstart); \ FREE_VAR (old_regend); \ FREE_VAR (best_regstart); \ FREE_VAR (best_regend); \ FREE_VAR (reg_info); \ FREE_VAR (reg_dummy); \ FREE_VAR (reg_info_dummy); \ FREE_VAR (string1); \ FREE_VAR (string2); \ FREE_VAR (mbs_offset1); \ FREE_VAR (mbs_offset2); \ } while (0) # else /* not MBS_SUPPORT */ # define FREE_VARIABLES() \ do { \ REGEX_FREE_STACK (fail_stack.stack); \ FREE_VAR (regstart); \ FREE_VAR (regend); \ FREE_VAR (old_regstart); \ FREE_VAR (old_regend); \ FREE_VAR (best_regstart); \ FREE_VAR (best_regend); \ FREE_VAR (reg_info); \ FREE_VAR (reg_dummy); \ FREE_VAR (reg_info_dummy); \ } while (0) # endif /* MBS_SUPPORT */ #else # define FREE_VAR(var) if (var) free (var); var = NULL # ifdef MBS_SUPPORT # define FREE_VARIABLES() \ do { \ FREE_VAR (string1); \ FREE_VAR (string2); \ FREE_VAR (mbs_offset1); \ FREE_VAR (mbs_offset2); \ } while (0) # else # define FREE_VARIABLES() ((void)0) /* Do nothing! But inhibit gcc warning. */ # endif /* MBS_SUPPORT */ #endif /* not MATCH_MAY_ALLOCATE */ /* These values must meet several constraints. They must not be valid register values; since we have a limit of 255 registers (because we use only one byte in the pattern for the register number), we can use numbers larger than 255. They must differ by 1, because of NUM_FAILURE_ITEMS above. And the value for the lowest register must be larger than the value for the highest register, so we do not try to actually save any registers when none are active. */ #define NO_HIGHEST_ACTIVE_REG (1 << BYTEWIDTH) #define NO_LOWEST_ACTIVE_REG (NO_HIGHEST_ACTIVE_REG + 1) /* Matching routines. */ #ifndef emacs /* Emacs never uses this. */ /* re_match is like re_match_2 except it takes only a single string. 
*/ int re_match (bufp, string, size, pos, regs) struct re_pattern_buffer *bufp; const char *string; int size, pos; struct re_registers *regs; { int result = re_match_2_internal (bufp, NULL, 0, string, size, pos, regs, size); # ifndef REGEX_MALLOC # ifdef C_ALLOCA alloca (0); # endif # endif return result; } # ifdef _LIBC weak_alias (__re_match, re_match) # endif #endif /* not emacs */ static boolean group_match_null_string_p _RE_ARGS ((US_CHAR_TYPE **p, US_CHAR_TYPE *end, register_info_type *reg_info)); static boolean alt_match_null_string_p _RE_ARGS ((US_CHAR_TYPE *p, US_CHAR_TYPE *end, register_info_type *reg_info)); static boolean common_op_match_null_string_p _RE_ARGS ((US_CHAR_TYPE **p, US_CHAR_TYPE *end, register_info_type *reg_info)); static int bcmp_translate _RE_ARGS ((const CHAR_TYPE *s1, const CHAR_TYPE *s2, int len, char *translate)); /* re_match_2 matches the compiled pattern in BUFP against the (virtual) concatenation of STRING1 and STRING2 (of length SIZE1 and SIZE2, respectively). We start matching at POS, and stop matching at STOP. If REGS is non-null and the `no_sub' field of BUFP is zero, we store offsets for the substring each group matched in REGS. See the documentation for exactly how many groups we fill. We return -1 if no match, -2 if an internal error (such as the failure stack overflowing). Otherwise, we return the length of the matched substring.
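A minimal sketch of the contract just described, assuming a literal pattern with no groups; toy_match_2 is a hypothetical name, not part of this library. Unlike a search, it tries only position POS. It mirrors the matcher's end_match_1/end_match_2 and PREFETCH logic to cross from STRING1 into STRING2 without copying, never reads at or past offset STOP, and returns the match length or -1.

```c
// Hypothetical toy: anchored match of the literal PAT at offset POS of the
// virtual concatenation S1+S2, never reading at or past offset STOP.
// Returns the length of the match, or -1 if there is none.
// S1 may be empty but must not be NULL.
int toy_match_2(const char *s1, int n1, const char *s2, int n2,
                const char *pat, int plen, int pos, int stop) {
  if (pos < 0 || pos > n1 + n2 || stop < pos || stop > n1 + n2)
    return -1;
  // Where scanning must end inside each of the two strings.
  const char *end_match_1 = stop <= n1 ? s1 + stop : s1 + n1;
  const char *end_match_2 = stop <= n1 ? s2 : s2 + (stop - n1);
  const char *d, *dend;
  if (n1 > 0 && pos <= n1) {
    d = s1 + pos;
    dend = end_match_1;
  } else {
    d = s2 + (pos - n1);
    dend = end_match_2;
  }
  for (int k = 0; k < plen; k++) {
    // Like PREFETCH: at the end of string1's region, switch to string2;
    // at the end of string2's region, the match fails.
    while (d == dend) {
      if (dend == end_match_2)
        return -1;
      d = s2;
      dend = end_match_2;
    }
    if (*d++ != pat[k])
      return -1;
  }
  return plen;
}
```

With STRING1 "ab" and STRING2 "cde", matching "bcd" at POS 1 returns 3 when STOP is 5, but -1 when STOP is 3, because the match would have to read past offset 3.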
*/ int re_match_2 (bufp, string1, size1, string2, size2, pos, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int size1, size2; int pos; struct re_registers *regs; int stop; { int result = re_match_2_internal (bufp, string1, size1, string2, size2, pos, regs, stop); #ifndef REGEX_MALLOC # ifdef C_ALLOCA alloca (0); # endif #endif return result; } #ifdef _LIBC weak_alias (__re_match_2, re_match_2) #endif #ifdef MBS_SUPPORT static int count_mbs_length PARAMS ((int *, int)); /* Check the substring (from 0 to length) of the multibyte string to which offset_buffer corresponds, and count how many wchar_t characters the substring occupies. Return -1 if length is negative or does not end on a character boundary. We use offset_buffer as an optimization. See convert_mbs_to_wcs. */ static int count_mbs_length(offset_buffer, length) int *offset_buffer; int length; { int wcs_size; /* Check whether the size is valid. */ if (length < 0) return -1; if (offset_buffer == NULL) return 0; for (wcs_size = 0 ; offset_buffer[wcs_size] != -1 ; wcs_size++) { if (offset_buffer[wcs_size] == length) return wcs_size; if (offset_buffer[wcs_size] > length) /* It is a fragment of a wide character. */ return -1; } /* We reached the sentinel. */ return -1; } #endif /* MBS_SUPPORT */ /* This is a separate function so that we can force an alloca cleanup afterwards. */ static int #ifdef MBS_SUPPORT re_match_2_internal (bufp, cstring1, csize1, cstring2, csize2, pos, regs, stop) struct re_pattern_buffer *bufp; const char *cstring1, *cstring2; int csize1, csize2; #else re_match_2_internal (bufp, string1, size1, string2, size2, pos, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int size1, size2; #endif int pos; struct re_registers *regs; int stop; { /* General temporaries. */ int mcnt; US_CHAR_TYPE *p1; #ifdef MBS_SUPPORT /* We need wchar_t* buffers corresponding to string1, string2. 
*/ CHAR_TYPE *string1 = NULL, *string2 = NULL; /* We need the sizes of the wchar_t buffers corresponding to csize1, csize2.
*/ int size1 = 0, size2 = 0; /* Offset buffers for optimization. See convert_mbs_to_wcs. */ int *mbs_offset1 = NULL, *mbs_offset2 = NULL; /* They hold whether each wchar_t is binary data or not. */ char *is_binary = NULL; #endif /* MBS_SUPPORT */ /* Just past the end of the corresponding string. */ const CHAR_TYPE *end1, *end2; /* Pointers into string1 and string2, just past the last characters in each to consider matching. */ const CHAR_TYPE *end_match_1, *end_match_2; /* Where we are in the data, and the end of the current string. */ const CHAR_TYPE *d, *dend; /* Where we are in the pattern, and the end of the pattern. */ #ifdef MBS_SUPPORT US_CHAR_TYPE *pattern, *p; register US_CHAR_TYPE *pend; #else US_CHAR_TYPE *p = bufp->buffer; register US_CHAR_TYPE *pend = p + bufp->used; #endif /* MBS_SUPPORT */ /* Mark the opcode just after a start_memory, so we can test for an empty subpattern when we get to the stop_memory. */ US_CHAR_TYPE *just_past_start_mem = 0; /* We use this to map every character in the string. */ RE_TRANSLATE_TYPE translate = bufp->translate; /* Failure point stack. Each place that can handle a failure further down the line pushes a failure point on this stack. It consists of regstart, regend, and reg_info for all registers corresponding to the subexpressions we're currently inside, plus the number of such registers, and, finally, two char *'s. The first char * is where to resume scanning the pattern; the second one is where to resume scanning the strings. If the latter is zero, the failure point is a ``dummy''; if a failure happens and the failure point is a dummy, it gets discarded and the next one is tried. */ #ifdef MATCH_MAY_ALLOCATE /* otherwise, this is global. */ fail_stack_type fail_stack; #endif #ifdef DEBUG static unsigned failure_id; unsigned nfailure_points_pushed = 0, nfailure_points_popped = 0; #endif #ifdef REL_ALLOC /* This holds the pointer to the failure stack, when it is allocated relocatably.
*/ fail_stack_elt_t *failure_stack_ptr; #endif /* We fill all the registers internally, independent of what we return, for use in backreferences. The number here includes an element for register zero. */ size_t num_regs = bufp->re_nsub + 1; /* The currently active registers. */ active_reg_t lowest_active_reg = NO_LOWEST_ACTIVE_REG; active_reg_t highest_active_reg = NO_HIGHEST_ACTIVE_REG; /* Information on the contents of registers. These are pointers into the input strings; they record just what was matched (on this attempt) by a subexpression part of the pattern, that is, the regnum-th regstart pointer points to where in the string we began matching and the regnum-th regend points to right after where we stopped matching the regnum-th subexpression. (The zeroth register keeps track of what the whole pattern matches.) */ #ifdef MATCH_MAY_ALLOCATE /* otherwise, these are global. */ const CHAR_TYPE **regstart, **regend; #endif /* If a group that's operated upon by a repetition operator fails to match anything, then the register for its start will need to be restored because it will have been set to wherever in the string we are when we last see its open-group operator. Similarly for a register's end. */ #ifdef MATCH_MAY_ALLOCATE /* otherwise, these are global. */ const CHAR_TYPE **old_regstart, **old_regend; #endif /* The is_active field of reg_info helps us keep track of which (possibly nested) subexpressions we are currently in. The matched_something field of reg_info[reg_num] helps us tell whether or not we have matched any of the pattern so far this time through the reg_num-th subexpression. These two fields get reset each time through any loop their register is in. */ #ifdef MATCH_MAY_ALLOCATE /* otherwise, this is global. */ register_info_type *reg_info; #endif /* The following record the register info as found in the above variables when we find a match better than any we've seen before.
This happens as we backtrack through the failure points, which in turn happens only if we have not yet matched the entire string. */ unsigned best_regs_set = false; #ifdef MATCH_MAY_ALLOCATE /* otherwise, these are global. */ const CHAR_TYPE **best_regstart, **best_regend; #endif /* Logically, this is `best_regend[0]'. But we don't want to have to allocate space for that if we're not allocating space for anything else (see below). Also, we never need info about register 0 for any of the other register vectors, and it seems rather a kludge to treat `best_regend' differently than the rest. So we keep track of the end of the best match so far in a separate variable. We initialize this to NULL so that when we backtrack the first time and need to test it, it's not garbage. */ const CHAR_TYPE *match_end = NULL; /* This helps SET_REGS_MATCHED avoid doing redundant work. */ int set_regs_matched_done = 0; /* Used when we pop values we don't care about. */ #ifdef MATCH_MAY_ALLOCATE /* otherwise, these are global. */ const CHAR_TYPE **reg_dummy; register_info_type *reg_info_dummy; #endif #ifdef DEBUG /* Counts the total number of registers pushed. */ unsigned num_regs_pushed = 0; #endif DEBUG_PRINT1 ("\n\nEntering re_match_2.\n"); INIT_FAIL_STACK (); #ifdef MATCH_MAY_ALLOCATE /* Do not bother to initialize all the register variables if there are no groups in the pattern, as it takes a fair amount of time. If there are groups, we include space for register 0 (the whole pattern), even though we never use it, since it simplifies the array indexing. We should fix this. 
*/ if (bufp->re_nsub) { regstart = REGEX_TALLOC (num_regs, const CHAR_TYPE *); regend = REGEX_TALLOC (num_regs, const CHAR_TYPE *); old_regstart = REGEX_TALLOC (num_regs, const CHAR_TYPE *); old_regend = REGEX_TALLOC (num_regs, const CHAR_TYPE *); best_regstart = REGEX_TALLOC (num_regs, const CHAR_TYPE *); best_regend = REGEX_TALLOC (num_regs, const CHAR_TYPE *); reg_info = REGEX_TALLOC (num_regs, register_info_type); reg_dummy = REGEX_TALLOC (num_regs, const CHAR_TYPE *); reg_info_dummy = REGEX_TALLOC (num_regs, register_info_type); if (!(regstart && regend && old_regstart && old_regend && reg_info && best_regstart && best_regend && reg_dummy && reg_info_dummy)) { FREE_VARIABLES (); return -2; } } else { /* We must initialize all our variables to NULL, so that `FREE_VARIABLES' doesn't try to free them. */ regstart = regend = old_regstart = old_regend = best_regstart = best_regend = reg_dummy = NULL; reg_info = reg_info_dummy = (register_info_type *) NULL; } #endif /* MATCH_MAY_ALLOCATE */ /* The starting position is bogus. */ #ifdef MBS_SUPPORT if (pos < 0 || pos > csize1 + csize2) #else if (pos < 0 || pos > size1 + size2) #endif { FREE_VARIABLES (); return -1; } #ifdef MBS_SUPPORT /* Allocate wchar_t array for string1 and string2 and fill them with converted string. 
*/ if (csize1 != 0) { string1 = REGEX_TALLOC (csize1 + 1, CHAR_TYPE); mbs_offset1 = REGEX_TALLOC (csize1 + 1, int); is_binary = REGEX_TALLOC (csize1 + 1, char); if (!string1 || !mbs_offset1 || !is_binary) { FREE_VAR (string1); FREE_VAR (mbs_offset1); FREE_VAR (is_binary); return -2; } size1 = convert_mbs_to_wcs(string1, cstring1, csize1, mbs_offset1, is_binary); string1[size1] = L'\0'; /* for a sentinel */ FREE_VAR (is_binary); } if (csize2 != 0) { string2 = REGEX_TALLOC (csize2 + 1, CHAR_TYPE); mbs_offset2 = REGEX_TALLOC (csize2 + 1, int); is_binary = REGEX_TALLOC (csize2 + 1, char); if (!string2 || !mbs_offset2 || !is_binary) { FREE_VAR (string1); FREE_VAR (mbs_offset1); FREE_VAR (string2); FREE_VAR (mbs_offset2); FREE_VAR (is_binary); return -2; } size2 = convert_mbs_to_wcs(string2, cstring2, csize2, mbs_offset2, is_binary); string2[size2] = L'\0'; /* for a sentinel */ FREE_VAR (is_binary); } /* We need to cast pattern to (wchar_t*), because we casted this compiled pattern to (char*) in regex_compile. */ p = pattern = (CHAR_TYPE*)bufp->buffer; pend = (CHAR_TYPE*)(bufp->buffer + bufp->used); #endif /* MBS_SUPPORT */ /* Initialize subexpression text positions to -1 to mark ones that no start_memory/stop_memory has been seen for. Also initialize the register information struct. */ for (mcnt = 1; (unsigned) mcnt < num_regs; mcnt++) { regstart[mcnt] = regend[mcnt] = old_regstart[mcnt] = old_regend[mcnt] = REG_UNSET_VALUE; REG_MATCH_NULL_STRING_P (reg_info[mcnt]) = MATCH_NULL_UNSET_VALUE; IS_ACTIVE (reg_info[mcnt]) = 0; MATCHED_SOMETHING (reg_info[mcnt]) = 0; EVER_MATCHED_SOMETHING (reg_info[mcnt]) = 0; } /* We move `string1' into `string2' if the latter's empty -- but not if `string1' is null. */ if (size2 == 0 && string1 != NULL) { string2 = string1; size2 = size1; string1 = 0; size1 = 0; } end1 = string1 + size1; end2 = string2 + size2; /* Compute where to stop matching, within the two strings. 
*/ #ifdef MBS_SUPPORT if (stop <= csize1) { mcnt = count_mbs_length(mbs_offset1, stop); end_match_1 = string1 + mcnt; end_match_2 = string2; } else { end_match_1 = end1; mcnt = count_mbs_length(mbs_offset2, stop-csize1); end_match_2 = string2 + mcnt; } if (mcnt < 0) { /* count_mbs_length return error. */ FREE_VARIABLES (); return -1; } #else if (stop <= size1) { end_match_1 = string1 + stop; end_match_2 = string2; } else { end_match_1 = end1; end_match_2 = string2 + stop - size1; } #endif /* MBS_SUPPORT */ /* `p' scans through the pattern as `d' scans through the data. `dend' is the end of the input string that `d' points within. `d' is advanced into the following input string whenever necessary, but this happens before fetching; therefore, at the beginning of the loop, `d' can be pointing at the end of a string, but it cannot equal `string2'. */ #ifdef MBS_SUPPORT if (size1 > 0 && pos <= csize1) { mcnt = count_mbs_length(mbs_offset1, pos); d = string1 + mcnt; dend = end_match_1; } else { mcnt = count_mbs_length(mbs_offset2, pos-csize1); d = string2 + mcnt; dend = end_match_2; } if (mcnt < 0) { /* count_mbs_length return error. */ FREE_VARIABLES (); return -1; } #else if (size1 > 0 && pos <= size1) { d = string1 + pos; dend = end_match_1; } else { d = string2 + pos - size1; dend = end_match_2; } #endif /* MBS_SUPPORT */ DEBUG_PRINT1 ("The compiled pattern is:\n"); DEBUG_PRINT_COMPILED_PATTERN (bufp, p, pend); DEBUG_PRINT1 ("The string to match is: `"); DEBUG_PRINT_DOUBLE_STRING (d, string1, size1, string2, size2); DEBUG_PRINT1 ("'\n"); /* This loops over pattern commands. It exits by returning from the function if the match is complete, or it drops through if the match fails at this starting point in the input data. */ for (;;) { #ifdef _LIBC DEBUG_PRINT2 ("\n%p: ", p); #else DEBUG_PRINT2 ("\n0x%x: ", p); #endif if (p == pend) { /* End of pattern means we might have succeeded. */ DEBUG_PRINT1 ("end of pattern ... 
"); /* If we haven't matched the entire string, and we want the longest match, try backtracking. */ if (d != end_match_2) { /* 1 if this match ends in the same string (string1 or string2) as the best previous match. */ boolean same_str_p = (FIRST_STRING_P (match_end) == MATCHING_IN_FIRST_STRING); /* 1 if this match is the best seen so far. */ boolean best_match_p; /* AIX compiler got confused when this was combined with the previous declaration. */ if (same_str_p) best_match_p = d > match_end; else best_match_p = !MATCHING_IN_FIRST_STRING; DEBUG_PRINT1 ("backtracking.\n"); if (!FAIL_STACK_EMPTY ()) { /* More failure points to try. */ /* If exceeds best match so far, save it. */ if (!best_regs_set || best_match_p) { best_regs_set = true; match_end = d; DEBUG_PRINT1 ("\nSAVING match as best so far.\n"); for (mcnt = 1; (unsigned) mcnt < num_regs; mcnt++) { best_regstart[mcnt] = regstart[mcnt]; best_regend[mcnt] = regend[mcnt]; } } goto fail; } /* If no failure points, don't restore garbage. And if last match is real best match, don't restore second best one. */ else if (best_regs_set && !best_match_p) { restore_best_regs: /* Restore best match. It may happen that `dend == end_match_1' while the restored d is in string2. For example, the pattern `x.*y.*z' against the strings `x-' and `y-z-', if the two strings are not consecutive in memory. */ DEBUG_PRINT1 ("Restoring best registers.\n"); d = match_end; dend = ((d >= string1 && d <= end1) ? end_match_1 : end_match_2); for (mcnt = 1; (unsigned) mcnt < num_regs; mcnt++) { regstart[mcnt] = best_regstart[mcnt]; regend[mcnt] = best_regend[mcnt]; } } } /* d != end_match_2 */ succeed_label: DEBUG_PRINT1 ("Accepting match.\n"); /* If caller wants register contents data back, do it. */ if (regs && !bufp->no_sub) { /* Have the register data arrays been allocated? */ if (bufp->regs_allocated == REGS_UNALLOCATED) { /* No. So allocate them with malloc. We need one extra element beyond `num_regs' for the `-1' marker GNU code uses. 
*/ regs->num_regs = MAX (RE_NREGS, num_regs + 1); regs->start = TALLOC (regs->num_regs, regoff_t); regs->end = TALLOC (regs->num_regs, regoff_t); if (regs->start == NULL || regs->end == NULL) { FREE_VARIABLES (); return -2; } bufp->regs_allocated = REGS_REALLOCATE; } else if (bufp->regs_allocated == REGS_REALLOCATE) { /* Yes. If we need more elements than were already allocated, reallocate them. If we need fewer, just leave it alone. */ if (regs->num_regs < num_regs + 1) { regs->num_regs = num_regs + 1; RETALLOC (regs->start, regs->num_regs, regoff_t); RETALLOC (regs->end, regs->num_regs, regoff_t); if (regs->start == NULL || regs->end == NULL) { FREE_VARIABLES (); return -2; } } } else { /* These braces fend off a "empty body in an else-statement" warning under GCC when assert expands to nothing. */ assert (bufp->regs_allocated == REGS_FIXED); } /* Convert the pointer data in `regstart' and `regend' to indices. Register zero has to be set differently, since we haven't kept track of any info for it. */ if (regs->num_regs > 0) { regs->start[0] = pos; #ifdef MBS_SUPPORT if (MATCHING_IN_FIRST_STRING) regs->end[0] = mbs_offset1 != NULL ? mbs_offset1[d-string1] : 0; else regs->end[0] = csize1 + (mbs_offset2 != NULL ? mbs_offset2[d-string2] : 0); #else regs->end[0] = (MATCHING_IN_FIRST_STRING ? ((regoff_t) (d - string1)) : ((regoff_t) (d - string2 + size1))); #endif /* MBS_SUPPORT */ } /* Go through the first `min (num_regs, regs->num_regs)' registers, since that is all we initialized. */ for (mcnt = 1; (unsigned) mcnt < MIN (num_regs, regs->num_regs); mcnt++) { if (REG_UNSET (regstart[mcnt]) || REG_UNSET (regend[mcnt])) regs->start[mcnt] = regs->end[mcnt] = -1; else { regs->start[mcnt] = (regoff_t) POINTER_TO_OFFSET (regstart[mcnt]); regs->end[mcnt] = (regoff_t) POINTER_TO_OFFSET (regend[mcnt]); } } /* If the regs structure we return has more elements than were in the pattern, set the extra elements to -1. 
                 If we (re)allocated the registers, this is the case,
                 because we always allocate enough to have at least one
                 -1 at the end.  */
              for (mcnt = num_regs; (unsigned) mcnt < regs->num_regs; mcnt++)
                regs->start[mcnt] = regs->end[mcnt] = -1;
            } /* regs && !bufp->no_sub */

          DEBUG_PRINT4 ("%u failure points pushed, %u popped (%u remain).\n",
                        nfailure_points_pushed, nfailure_points_popped,
                        nfailure_points_pushed - nfailure_points_popped);
          DEBUG_PRINT2 ("%u registers pushed.\n", num_regs_pushed);

#ifdef MBS_SUPPORT
          if (MATCHING_IN_FIRST_STRING)
            mcnt = mbs_offset1 != NULL ? mbs_offset1[d-string1] : 0;
          else
            mcnt = (mbs_offset2 != NULL ? mbs_offset2[d-string2] : 0) +
              csize1;
          mcnt -= pos;
#else
          mcnt = d - pos - (MATCHING_IN_FIRST_STRING
                            ? string1
                            : string2 - size1);
#endif /* MBS_SUPPORT */

          DEBUG_PRINT2 ("Returning %d from re_match_2.\n", mcnt);

          FREE_VARIABLES ();
          return mcnt;
        }

      /* Otherwise match next pattern command.  */
      switch (SWITCH_ENUM_CAST ((re_opcode_t) *p++))
        {
        /* Ignore these.  Used to ignore the n of succeed_n's which
           currently have n == 0.  */
        case no_op:
          DEBUG_PRINT1 ("EXECUTING no_op.\n");
          break;

        case succeed:
          DEBUG_PRINT1 ("EXECUTING succeed.\n");
          goto succeed_label;

        /* Match the next n pattern characters exactly.  The following
           byte in the pattern defines n, and the n bytes after that
           are the characters to match.  */
        case exactn:
#ifdef MBS_SUPPORT
        case exactn_bin:
#endif
          mcnt = *p++;
          DEBUG_PRINT2 ("EXECUTING exactn %d.\n", mcnt);

          /* This is written out as an if-else so we don't waste time
             testing `translate' inside the loop.
             */
          if (translate)
            {
              do
                {
                  PREFETCH ();
#ifdef MBS_SUPPORT
                  if (*d <= 0xff)
                    {
                      if ((US_CHAR_TYPE) translate[(unsigned char) *d++]
                          != (US_CHAR_TYPE) *p++)
                        goto fail;
                    }
                  else
                    {
                      if (*d++ != (CHAR_TYPE) *p++)
                        goto fail;
                    }
#else
                  if ((US_CHAR_TYPE) translate[(unsigned char) *d++]
                      != (US_CHAR_TYPE) *p++)
                    goto fail;
#endif /* MBS_SUPPORT */
                }
              while (--mcnt);
            }
          else
            {
              do
                {
                  PREFETCH ();
                  if (*d++ != (CHAR_TYPE) *p++) goto fail;
                }
              while (--mcnt);
            }
          SET_REGS_MATCHED ();
          break;


        /* Match any character except possibly a newline or a null.  */
        case anychar:
          DEBUG_PRINT1 ("EXECUTING anychar.\n");

          PREFETCH ();

          if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n')
              || (bufp->syntax & RE_DOT_NOT_NULL && TRANSLATE (*d) == '\000'))
            goto fail;

          SET_REGS_MATCHED ();
          DEBUG_PRINT2 (" Matched `%ld'.\n", (long int) *d);
          d++;
          break;


        case charset:
        case charset_not:
          {
            register US_CHAR_TYPE c;
#ifdef MBS_SUPPORT
            unsigned int i, char_class_length, coll_symbol_length,
              equiv_class_length, ranges_length, chars_length, length;
            CHAR_TYPE *workp, *workp2, *charset_top;
#define WORK_BUFFER_SIZE 128
            CHAR_TYPE str_buf[WORK_BUFFER_SIZE];
# ifdef _LIBC
            uint32_t nrules;
# endif /* _LIBC */
#endif /* MBS_SUPPORT */
            boolean not = (re_opcode_t) *(p - 1) == charset_not;

            DEBUG_PRINT2 ("EXECUTING charset%s.\n", not ? "_not" : "");
            PREFETCH ();
            c = TRANSLATE (*d); /* The character to match.  */
#ifdef MBS_SUPPORT
# ifdef _LIBC
            nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
# endif /* _LIBC */
            charset_top = p - 1;
            char_class_length = *p++;
            coll_symbol_length = *p++;
            equiv_class_length = *p++;
            ranges_length = *p++;
            chars_length = *p++;
            /* p points to charset[6], so the address of the next
               instruction (charset[l+m+n+2o+k+p']) equals
               p[l+m+n+2*o+p'], where l=length of char_classes,
               m=length of collating_symbol, n=length of
               equivalence_class, o=length of char_range,
               p'=length of character.  */
            workp = p;
            /* Update p to indicate the next instruction.
               */
            p += char_class_length + coll_symbol_length + equiv_class_length +
              2*ranges_length + chars_length;

            /* match with char_class?  */
            for (i = 0; i < char_class_length ; i += CHAR_CLASS_SIZE)
              {
                wctype_t wctype;
                uintptr_t alignedp = ((uintptr_t)workp
                                      + __alignof__(wctype_t) - 1)
                                     & ~(uintptr_t)(__alignof__(wctype_t) - 1);
                wctype = *((wctype_t*)alignedp);
                workp += CHAR_CLASS_SIZE;
                if (iswctype((wint_t)c, wctype))
                  goto char_set_matched;
              }

            /* match with collating_symbol?  */
# ifdef _LIBC
            if (nrules != 0)
              {
                const unsigned char *extra = (const unsigned char *)
                  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB);

                for (workp2 = workp + coll_symbol_length ; workp < workp2 ;
                     workp++)
                  {
                    int32_t *wextra;
                    wextra = (int32_t*)(extra + *workp++);
                    for (i = 0; i < *wextra; ++i)
                      if (TRANSLATE(d[i]) != wextra[1 + i])
                        break;

                    if (i == *wextra)
                      {
                        /* Update d.  Since d will be incremented at
                           char_set_matched:, we decrement it here.  */
                        d += i - 1;
                        goto char_set_matched;
                      }
                  }
              }
            else /* (nrules == 0) */
# endif
              /* If we can't look up collation data, we use wcscoll
                 instead.  */
              {
                for (workp2 = workp + coll_symbol_length ; workp < workp2 ;)
                  {
                    const CHAR_TYPE *backup_d = d, *backup_dend = dend;
                    length = wcslen(workp);

                    /* If wcscoll(the collating symbol, whole string) > 0,
                       no substring of the string can match the
                       collating symbol.  */
                    if (wcscoll(workp, d) > 0)
                      {
                        workp += length + 1;
                        continue;
                      }

                    /* First, we compare the collating symbol with
                       the first character of the string.
                       If it doesn't match, we add the next character to
                       the compare buffer in turn.  */
                    for (i = 0 ; i < WORK_BUFFER_SIZE-1 ; i++, d++)
                      {
                        int match;
                        if (d == dend)
                          {
                            if (dend == end_match_2)
                              break;
                            d = string2;
                            dend = end_match_2;
                          }

                        /* add next character to the compare buffer.  */
                        str_buf[i] = TRANSLATE(*d);
                        str_buf[i+1] = '\0';

                        match = wcscoll(workp, str_buf);
                        if (match == 0)
                          goto char_set_matched;

                        if (match < 0)
                          /* (str_buf > workp) indicates (str_buf + X > workp),
                             because for all X (str_buf + X > str_buf).
                             So we don't need to continue this loop.  */
                          break;

                        /* Otherwise (str_buf < workp),
                           (str_buf+next_character) may equal (workp).
                           So we continue this loop.  */
                      }
                    /* not matched */
                    d = backup_d;
                    dend = backup_dend;
                    workp += length + 1;
                  }
              }
            /* match with equivalence_class?  */
# ifdef _LIBC
            if (nrules != 0)
              {
                const CHAR_TYPE *backup_d = d, *backup_dend = dend;
                /* Try to match the equivalence class against
                   those known to the collate implementation.  */
                const int32_t *table;
                const int32_t *weights;
                const int32_t *extra;
                const int32_t *indirect;
                int32_t idx, idx2;
                wint_t *cp;
                size_t len;

                /* This #include defines a local function!  */
# include <locale/weightwc.h>

                table = (const int32_t *)
                  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEWC);
                weights = (const wint_t *)
                  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_WEIGHTWC);
                extra = (const wint_t *)
                  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAWC);
                indirect = (const int32_t *)
                  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTWC);

                /* Write 1 collating element to str_buf, and
                   get its index.  */
                idx2 = 0;

                for (i = 0 ; idx2 == 0 && i < WORK_BUFFER_SIZE - 1; i++)
                  {
                    cp = (wint_t*)str_buf;
                    if (d == dend)
                      {
                        if (dend == end_match_2)
                          break;
                        d = string2;
                        dend = end_match_2;
                      }
                    str_buf[i] = TRANSLATE(*(d+i));
                    str_buf[i+1] = '\0'; /* sentinel */
                    idx2 = findidx ((const wint_t**)&cp);
                  }

                /* Update d.  Since d will be incremented at
                   char_set_matched:, we decrement it here.  */
                d = backup_d + ((wchar_t*)cp - (wchar_t*)str_buf - 1);
                if (d >= dend)
                  {
                    if (dend == end_match_2)
                      d = dend;
                    else
                      {
                        d = string2;
                        dend = end_match_2;
                      }
                  }

                len = weights[idx2];

                for (workp2 = workp + equiv_class_length ; workp < workp2 ;
                     workp++)
                  {
                    idx = (int32_t)*workp;
                    /* We already checked idx != 0 in regex_compile.
                       */
                    if (idx2 != 0 && len == weights[idx])
                      {
                        int cnt = 0;
                        while (cnt < len && (weights[idx + 1 + cnt]
                                             == weights[idx2 + 1 + cnt]))
                          ++cnt;

                        if (cnt == len)
                          goto char_set_matched;
                      }
                  }
                /* not matched */
                d = backup_d;
                dend = backup_dend;
              }
            else /* (nrules == 0) */
# endif
              /* If we can't look up collation data, we use wcscoll
                 instead.  */
              {
                for (workp2 = workp + equiv_class_length ; workp < workp2 ;)
                  {
                    const CHAR_TYPE *backup_d = d, *backup_dend = dend;
                    length = wcslen(workp);

                    /* If wcscoll(the equivalence class, whole string) > 0,
                       no substring of the string can match the
                       equivalence class; skip to the next one.  */
                    if (wcscoll(workp, d) > 0)
                      {
                        workp += length + 1;
                        continue;
                      }

                    /* First, we compare the equivalence class with
                       the first character of the string.
                       If it doesn't match, we add the next character to
                       the compare buffer in turn.  */
                    for (i = 0 ; i < WORK_BUFFER_SIZE - 1 ; i++, d++)
                      {
                        int match;
                        if (d == dend)
                          {
                            if (dend == end_match_2)
                              break;
                            d = string2;
                            dend = end_match_2;
                          }

                        /* add next character to the compare buffer.  */
                        str_buf[i] = TRANSLATE(*d);
                        str_buf[i+1] = '\0';

                        match = wcscoll(workp, str_buf);

                        if (match == 0)
                          goto char_set_matched;

                        if (match < 0)
                          /* (str_buf > workp) indicates (str_buf + X > workp),
                             because for all X (str_buf + X > str_buf).
                             So we don't need to continue this loop.  */
                          break;

                        /* Otherwise (str_buf < workp),
                           (str_buf+next_character) may equal (workp).
                           So we continue this loop.  */
                      }
                    /* not matched */
                    d = backup_d;
                    dend = backup_dend;
                    workp += length + 1;
                  }
              }

            /* match with char_range?  */
#ifdef _LIBC
            if (nrules != 0)
              {
                uint32_t collseqval;
                const char *collseq = (const char *)
                  _NL_CURRENT(LC_COLLATE, _NL_COLLATE_COLLSEQWC);

                collseqval = collseq_table_lookup (collseq, c);

                for (; workp < p - chars_length ;)
                  {
                    uint32_t start_val, end_val;

                    /* We already computed the collation sequence value
                       of the characters (or collating symbols).
                       */
                    start_val = (uint32_t) *workp++; /* range_start */
                    end_val = (uint32_t) *workp++; /* range_end */

                    if (start_val <= collseqval && collseqval <= end_val)
                      goto char_set_matched;
                  }
              }
            else
#endif
              {
                /* We set range_start_char at str_buf[0], range_end_char
                   at str_buf[4], and compared char at str_buf[2].  */
                str_buf[1] = 0;
                str_buf[2] = c;
                str_buf[3] = 0;
                str_buf[5] = 0;
                for (; workp < p - chars_length ;)
                  {
                    wchar_t *range_start_char, *range_end_char;

                    /* match if (range_start_char <= c <= range_end_char).  */

                    /* If range_start(or end) < 0, we assume -range_start(end)
                       is the offset of the collating symbol which is specified
                       as the character of the range start(end).  */

                    /* range_start */
                    if (*workp < 0)
                      range_start_char = charset_top - (*workp++);
                    else
                      {
                        str_buf[0] = *workp++;
                        range_start_char = str_buf;
                      }

                    /* range_end */
                    if (*workp < 0)
                      range_end_char = charset_top - (*workp++);
                    else
                      {
                        str_buf[4] = *workp++;
                        range_end_char = str_buf + 4;
                      }

                    if (wcscoll(range_start_char, str_buf+2) <= 0
                        && wcscoll(str_buf+2, range_end_char) <= 0)
                      goto char_set_matched;
                  }
              }

            /* match with char?  */
            for (; workp < p ; workp++)
              if (c == *workp)
                goto char_set_matched;

            not = !not;

          char_set_matched:
            if (not) goto fail;
#else
            /* Cast to `unsigned' instead of `unsigned char' in case the
               bit list is a full 32 bytes long.  */
            if (c < (unsigned) (*p * BYTEWIDTH)
                && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
              not = !not;

            p += 1 + *p;

            if (!not) goto fail;
#undef WORK_BUFFER_SIZE
#endif /* MBS_SUPPORT */
            SET_REGS_MATCHED ();
            d++;
            break;
          }


        /* The beginning of a group is represented by start_memory.
           The arguments are the register number in the next byte, and the
           number of groups inner to this one in the next.  The text
           matched within the group is recorded (in the internal
           registers data structure) under the register number.  */
        case start_memory:
          DEBUG_PRINT3 ("EXECUTING start_memory %ld (%ld):\n",
                        (long int) *p, (long int) p[1]);

          /* Find out if this group can match the empty string.
             */
          p1 = p;               /* To send to group_match_null_string_p.  */

          if (REG_MATCH_NULL_STRING_P (reg_info[*p]) == MATCH_NULL_UNSET_VALUE)
            REG_MATCH_NULL_STRING_P (reg_info[*p])
              = group_match_null_string_p (&p1, pend, reg_info);

          /* Save the position in the string where we were the last time
             we were at this open-group operator in case the group is
             operated upon by a repetition operator, e.g., with `(a*)*b'
             against `ab'; then we want to ignore where we are now in
             the string in case this attempt to match fails.  */
          old_regstart[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p])
                             ? REG_UNSET (regstart[*p]) ? d : regstart[*p]
                             : regstart[*p];
          DEBUG_PRINT2 (" old_regstart: %d\n",
                        POINTER_TO_OFFSET (old_regstart[*p]));

          regstart[*p] = d;
          DEBUG_PRINT2 (" regstart: %d\n", POINTER_TO_OFFSET (regstart[*p]));

          IS_ACTIVE (reg_info[*p]) = 1;
          MATCHED_SOMETHING (reg_info[*p]) = 0;

          /* Clear this whenever we change the register activity status.  */
          set_regs_matched_done = 0;

          /* This is the new highest active register.  */
          highest_active_reg = *p;

          /* If nothing was active before, this is the new lowest active
             register.  */
          if (lowest_active_reg == NO_LOWEST_ACTIVE_REG)
            lowest_active_reg = *p;

          /* Move past the register number and inner group count.  */
          p += 2;
          just_past_start_mem = p;

          break;


        /* The stop_memory opcode represents the end of a group.  Its
           arguments are the same as start_memory's: the register
           number, and the number of inner groups.  */
        case stop_memory:
          DEBUG_PRINT3 ("EXECUTING stop_memory %ld (%ld):\n",
                        (long int) *p, (long int) p[1]);

          /* We need to save the string position the last time we were at
             this close-group operator in case the group is operated
             upon by a repetition operator, e.g., with `((a*)*(b*)*)*'
             against `aba'; then we want to ignore where we are now in
             the string in case this attempt to match fails.  */
          old_regend[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p])
                           ? REG_UNSET (regend[*p]) ?
                             d : regend[*p]
                           : regend[*p];
          DEBUG_PRINT2 (" old_regend: %d\n",
                        POINTER_TO_OFFSET (old_regend[*p]));

          regend[*p] = d;
          DEBUG_PRINT2 (" regend: %d\n", POINTER_TO_OFFSET (regend[*p]));

          /* This register isn't active anymore.  */
          IS_ACTIVE (reg_info[*p]) = 0;

          /* Clear this whenever we change the register activity status.  */
          set_regs_matched_done = 0;

          /* If this was the only register active, nothing is active
             anymore.  */
          if (lowest_active_reg == highest_active_reg)
            {
              lowest_active_reg = NO_LOWEST_ACTIVE_REG;
              highest_active_reg = NO_HIGHEST_ACTIVE_REG;
            }
          else
            { /* We must scan for the new highest active register, since
                 it isn't necessarily one less than now: consider
                 (a(b)c(d(e)f)g).  When group 3 ends, after the f), the
                 new highest active register is 1.  */
              US_CHAR_TYPE r = *p - 1;
              while (r > 0 && !IS_ACTIVE (reg_info[r]))
                r--;

              /* If we end up at register zero, that means that we saved
                 the registers as the result of an `on_failure_jump', not
                 a `start_memory', and we jumped past the innermost
                 `stop_memory'.  For example, in ((.)*) we save
                 registers 1 and 2 as a result of the *, but when we pop
                 back to the second ), we are at the stop_memory 1.
                 Thus, nothing is active.  */
              if (r == 0)
                {
                  lowest_active_reg = NO_LOWEST_ACTIVE_REG;
                  highest_active_reg = NO_HIGHEST_ACTIVE_REG;
                }
              else
                highest_active_reg = r;
            }

          /* If we just failed to match something this time around with a
             group that's operated on by a repetition operator, try to
             force exit from the ``loop'', and restore the register
             information for this group that we had before trying this
             last match.
             */
          if ((!MATCHED_SOMETHING (reg_info[*p])
               || just_past_start_mem == p - 1)
              && (p + 2) < pend)
            {
              boolean is_a_jump_n = false;

              p1 = p + 2;
              mcnt = 0;
              switch ((re_opcode_t) *p1++)
                {
                case jump_n:
                  is_a_jump_n = true;
                case pop_failure_jump:
                case maybe_pop_jump:
                case jump:
                case dummy_failure_jump:
                  EXTRACT_NUMBER_AND_INCR (mcnt, p1);
                  if (is_a_jump_n)
                    p1 += OFFSET_ADDRESS_SIZE;
                  break;

                default:
                  /* do nothing */ ;
                }
              p1 += mcnt;

              /* If the next operation is a jump backwards in the pattern
                 to an on_failure_jump right before the start_memory
                 corresponding to this stop_memory, exit from the loop
                 by forcing a failure after pushing on the stack the
                 on_failure_jump's jump in the pattern, and d.  */
              if (mcnt < 0 && (re_opcode_t) *p1 == on_failure_jump
                  && (re_opcode_t) p1[1+OFFSET_ADDRESS_SIZE] == start_memory
                  && p1[2+OFFSET_ADDRESS_SIZE] == *p)
                {
                  /* If this group ever matched anything, then restore
                     what its registers were before trying this last
                     failed match, e.g., with `(a*)*b' against `ab' for
                     regstart[1], and, e.g., with `((a*)*(b*)*)*'
                     against `aba' for regend[3].

                     Also restore the registers for inner groups for,
                     e.g., `((a*)(b*))*' against `aba' (register 3 would
                     otherwise get trashed).  */

                  if (EVER_MATCHED_SOMETHING (reg_info[*p]))
                    {
                      unsigned r;

                      EVER_MATCHED_SOMETHING (reg_info[*p]) = 0;

                      /* Restore this and inner groups' (if any) registers.  */
                      for (r = *p; r < (unsigned) *p + (unsigned) *(p + 1);
                           r++)
                        {
                          regstart[r] = old_regstart[r];

                          /* xx why this test?  */
                          if (old_regend[r] >= regstart[r])
                            regend[r] = old_regend[r];
                        }
                    }
                  p1++;
                  EXTRACT_NUMBER_AND_INCR (mcnt, p1);
                  PUSH_FAILURE_POINT (p1 + mcnt, d, -2);

                  goto fail;
                }
            }

          /* Move past the register number and the inner group count.  */
          p += 2;
          break;


        /* \<digit> has been turned into a `duplicate' command which is
           followed by the numeric value of <digit> as the register number.  */
        case duplicate:
          {
            register const CHAR_TYPE *d2, *dend2;
            int regno = *p++;   /* Get which register to match against.
                                   */
            DEBUG_PRINT2 ("EXECUTING duplicate %d.\n", regno);

            /* Can't back reference a group which we've never matched.  */
            if (REG_UNSET (regstart[regno]) || REG_UNSET (regend[regno]))
              goto fail;

            /* Where in input to try to start matching.  */
            d2 = regstart[regno];

            /* Where to stop matching; if both the place to start and
               the place to stop matching are in the same string, then
               set to the place to stop, otherwise, for now have to use
               the end of the first string.  */

            dend2 = ((FIRST_STRING_P (regstart[regno])
                      == FIRST_STRING_P (regend[regno]))
                     ? regend[regno] : end_match_1);
            for (;;)
              {
                /* If necessary, advance to next segment in register
                   contents.  */
                while (d2 == dend2)
                  {
                    if (dend2 == end_match_2) break;
                    if (dend2 == regend[regno]) break;

                    /* End of string1 => advance to string2.  */
                    d2 = string2;
                    dend2 = regend[regno];
                  }
                /* At end of register contents => success */
                if (d2 == dend2) break;

                /* If necessary, advance to next segment in data.  */
                PREFETCH ();

                /* How many characters left in this segment to match.  */
                mcnt = dend - d;

                /* Want how many consecutive characters we can match in
                   one shot, so, if necessary, adjust the count.  */
                if (mcnt > dend2 - d2)
                  mcnt = dend2 - d2;

                /* Compare that many; failure if mismatch, else move
                   past them.  */
                if (translate
                    ? bcmp_translate (d, d2, mcnt, translate)
                    : memcmp (d, d2, mcnt*sizeof(US_CHAR_TYPE)))
                  goto fail;
                d += mcnt, d2 += mcnt;

                /* Do this because we've matched some characters.  */
                SET_REGS_MATCHED ();
              }
          }
          break;


        /* begline matches the empty string at the beginning of the string
           (unless `not_bol' is set in `bufp'), and, if
           `newline_anchor' is set, after newlines.  */
        case begline:
          DEBUG_PRINT1 ("EXECUTING begline.\n");

          if (AT_STRINGS_BEG (d))
            {
              if (!bufp->not_bol) break;
            }
          else if (d[-1] == '\n' && bufp->newline_anchor)
            {
              break;
            }
          /* In all other cases, we fail.  */
          goto fail;


        /* endline is the dual of begline.
           */
        case endline:
          DEBUG_PRINT1 ("EXECUTING endline.\n");

          if (AT_STRINGS_END (d))
            {
              if (!bufp->not_eol) break;
            }

          /* We have to ``prefetch'' the next character.  */
          else if ((d == end1 ? *string2 : *d) == '\n'
                   && bufp->newline_anchor)
            {
              break;
            }
          goto fail;


        /* Match at the very beginning of the data.  */
        case begbuf:
          DEBUG_PRINT1 ("EXECUTING begbuf.\n");
          if (AT_STRINGS_BEG (d))
            break;
          goto fail;


        /* Match at the very end of the data.  */
        case endbuf:
          DEBUG_PRINT1 ("EXECUTING endbuf.\n");
          if (AT_STRINGS_END (d))
            break;
          goto fail;


        /* on_failure_keep_string_jump is used to optimize `.*\n'.  It
           pushes NULL as the value for the string on the stack.  Then
           `pop_failure_point' will keep the current value for the
           string, instead of restoring it.  To see why, consider
           matching `foo\nbar' against `.*\n'.  The .* matches the foo;
           then the . fails against the \n.  But the next thing we want
           to do is match the \n against the \n; if we restored the
           string value, we would be back at the foo.

           Because this is used only in specific cases, we don't need to
           check all the things that `on_failure_jump' does, to make
           sure the right things get saved on the stack.  Hence we don't
           share its code.  The only reason to push anything on the
           stack at all is that otherwise we would have to change
           `anychar's code to do something besides goto fail in this
           case; that seems worse than this.  */
        case on_failure_keep_string_jump:
          DEBUG_PRINT1 ("EXECUTING on_failure_keep_string_jump");

          EXTRACT_NUMBER_AND_INCR (mcnt, p);
#ifdef _LIBC
          DEBUG_PRINT3 (" %d (to %p):\n", mcnt, p + mcnt);
#else
          DEBUG_PRINT3 (" %d (to 0x%x):\n", mcnt, p + mcnt);
#endif

          PUSH_FAILURE_POINT (p + mcnt, NULL, -2);
          break;


        /* Uses of on_failure_jump:

           Each alternative starts with an on_failure_jump that points
           to the beginning of the next alternative.  Each alternative
           except the last ends with a jump that in effect jumps past
           the rest of the alternatives.
           (They really jump to the ending jump of the following
           alternative, because tensioning these jumps is a hassle.)

           Repeats start with an on_failure_jump that points past both
           the repetition text and either the following jump or
           pop_failure_jump back to this on_failure_jump.  */
        case on_failure_jump:
        on_failure:
          DEBUG_PRINT1 ("EXECUTING on_failure_jump");

          EXTRACT_NUMBER_AND_INCR (mcnt, p);
#ifdef _LIBC
          DEBUG_PRINT3 (" %d (to %p)", mcnt, p + mcnt);
#else
          DEBUG_PRINT3 (" %d (to 0x%x)", mcnt, p + mcnt);
#endif

          /* If this on_failure_jump comes right before a group (i.e.,
             the original * applied to a group), save the information
             for that group and all inner ones, so that if we fail back
             to this point, the group's information will be correct.
             For example, in \(a*\)*\1, we need the preceding group,
             and in \(zz\(a*\)b*\)\2, we need the inner group.  */

          /* We can't use `p' to check ahead because we push
             a failure point to `p + mcnt' after we do this.  */
          p1 = p;

          /* We need to skip no_op's before we look for the
             start_memory in case this on_failure_jump is happening as
             the result of a completed succeed_n, as in \(a\)\{1,3\}b\1
             against aba.  */
          while (p1 < pend && (re_opcode_t) *p1 == no_op)
            p1++;

          if (p1 < pend && (re_opcode_t) *p1 == start_memory)
            {
              /* We have a new highest active register now.  This will
                 get reset at the start_memory we are about to get to,
                 but we will have saved all the registers relevant to
                 this repetition op, as described above.  */
              highest_active_reg = *(p1 + 1) + *(p1 + 2);
              if (lowest_active_reg == NO_LOWEST_ACTIVE_REG)
                lowest_active_reg = *(p1 + 1);
            }

          DEBUG_PRINT1 (":\n");
          PUSH_FAILURE_POINT (p + mcnt, d, -2);
          break;


        /* A smart repeat ends with `maybe_pop_jump'.
           We change it to either `pop_failure_jump' or `jump'.  */
        case maybe_pop_jump:
          EXTRACT_NUMBER_AND_INCR (mcnt, p);
          DEBUG_PRINT2 ("EXECUTING maybe_pop_jump %d.\n", mcnt);
          {
            register US_CHAR_TYPE *p2 = p;

            /* Compare the beginning of the repeat with what in the
               pattern follows its end.
               If we can establish that there is nothing that they would
               both match, i.e., that we would have to backtrack because of
               (as in, e.g., `a*a') then we can change to pop_failure_jump,
               because we'll never have to backtrack.

               This is not true in the case of alternatives: in
               `(a|ab)*' we do need to backtrack to the `ab' alternative
               (e.g., if the string was `ab').  But instead of trying to
               detect that here, the alternative has put on a dummy
               failure point which is what we will end up popping.  */

            /* Skip over open/close-group commands.
               If what follows this loop is a ...+ construct,
               look at what begins its body, since we will have to
               match at least one of that.  */
            while (1)
              {
                if (p2 + 2 < pend
                    && ((re_opcode_t) *p2 == stop_memory
                        || (re_opcode_t) *p2 == start_memory))
                  p2 += 3;
                else if (p2 + 2 + 2 * OFFSET_ADDRESS_SIZE < pend
                         && (re_opcode_t) *p2 == dummy_failure_jump)
                  p2 += 2 + 2 * OFFSET_ADDRESS_SIZE;
                else
                  break;
              }

            p1 = p + mcnt;
            /* p1[0] ... p1[2] are the `on_failure_jump' corresponding
               to the `maybe_finalize_jump' of this case.  Examine what
               follows.  */

            /* If we're at the end of the pattern, we can change.  */
            if (p2 == pend)
              {
                /* Consider what happens when matching ":\(.*\)"
                   against ":/".  I don't really understand this code
                   yet.  */
                p[-(1+OFFSET_ADDRESS_SIZE)] = (US_CHAR_TYPE)
                  pop_failure_jump;
                DEBUG_PRINT1
                  (" End of pattern: change to `pop_failure_jump'.\n");
              }

            else if ((re_opcode_t) *p2 == exactn
#ifdef MBS_SUPPORT
                     || (re_opcode_t) *p2 == exactn_bin
#endif
                     || (bufp->newline_anchor && (re_opcode_t) *p2 == endline))
              {
                register US_CHAR_TYPE c
                  = *p2 == (US_CHAR_TYPE) endline ?
                    '\n' : p2[2];
                if (((re_opcode_t) p1[1+OFFSET_ADDRESS_SIZE] == exactn
#ifdef MBS_SUPPORT
                     || (re_opcode_t) p1[1+OFFSET_ADDRESS_SIZE] == exactn_bin
#endif
                    ) && p1[3+OFFSET_ADDRESS_SIZE] != c)
                  {
                    p[-(1+OFFSET_ADDRESS_SIZE)] = (US_CHAR_TYPE)
                      pop_failure_jump;
#ifdef MBS_SUPPORT
                    if (MB_CUR_MAX != 1)
                      DEBUG_PRINT3 (" %C != %C => pop_failure_jump.\n",
                                    (wint_t) c,
                                    (wint_t) p1[3+OFFSET_ADDRESS_SIZE]);
                    else
#endif
                      DEBUG_PRINT3 (" %c != %c => pop_failure_jump.\n",
                                    (char) c,
                                    (char) p1[3+OFFSET_ADDRESS_SIZE]);
                  }

#ifndef MBS_SUPPORT
                else if ((re_opcode_t) p1[3] == charset
                         || (re_opcode_t) p1[3] == charset_not)
                  {
                    int not = (re_opcode_t) p1[3] == charset_not;

                    if (c < (unsigned) (p1[4] * BYTEWIDTH)
                        && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
                      not = !not;

                    /* `not' is equal to 1 if c would match, which means
                       that we can't change to pop_failure_jump.  */
                    if (!not)
                      {
                        p[-3] = (unsigned char) pop_failure_jump;
                        DEBUG_PRINT1 (" No match => pop_failure_jump.\n");
                      }
                  }
#endif /* not MBS_SUPPORT */
              }
#ifndef MBS_SUPPORT
            else if ((re_opcode_t) *p2 == charset)
              {
                /* We win if the first character of the loop is not part
                   of the charset.  */
                if ((re_opcode_t) p1[3] == exactn
                    && ! ((int) p2[1] * BYTEWIDTH > (int) p1[5]
                          && (p2[2 + p1[5] / BYTEWIDTH]
                              & (1 << (p1[5] % BYTEWIDTH)))))
                  {
                    p[-3] = (unsigned char) pop_failure_jump;
                    DEBUG_PRINT1 (" No match => pop_failure_jump.\n");
                  }

                else if ((re_opcode_t) p1[3] == charset_not)
                  {
                    int idx;
                    /* We win if the charset_not inside the loop
                       lists every character listed in the charset after.  */
                    for (idx = 0; idx < (int) p2[1]; idx++)
                      if (! (p2[2 + idx] == 0
                             || (idx < (int) p1[4]
                                 && ((p2[2 + idx] & ~ p1[5 + idx]) == 0))))
                        break;

                    if (idx == p2[1])
                      {
                        p[-3] = (unsigned char) pop_failure_jump;
                        DEBUG_PRINT1 (" No match => pop_failure_jump.\n");
                      }
                  }
                else if ((re_opcode_t) p1[3] == charset)
                  {
                    int idx;
                    /* We win if the charset inside the loop
                       has no overlap with the one after the loop.
                       */
                    for (idx = 0;
                         idx < (int) p2[1] && idx < (int) p1[4];
                         idx++)
                      if ((p2[2 + idx] & p1[5 + idx]) != 0)
                        break;

                    if (idx == p2[1] || idx == p1[4])
                      {
                        p[-3] = (unsigned char) pop_failure_jump;
                        DEBUG_PRINT1 (" No match => pop_failure_jump.\n");
                      }
                  }
              }
#endif /* not MBS_SUPPORT */
          }
          p -= OFFSET_ADDRESS_SIZE;     /* Point at relative address again.  */
          if ((re_opcode_t) p[-1] != pop_failure_jump)
            {
              p[-1] = (US_CHAR_TYPE) jump;
              DEBUG_PRINT1 (" Match => jump.\n");
              goto unconditional_jump;
            }
        /* Note fall through.  */


        /* The end of a simple repeat has a pop_failure_jump back to
           its matching on_failure_jump, where the latter will push a
           failure point.  The pop_failure_jump takes off failure
           points put on by this pop_failure_jump's matching
           on_failure_jump; we got through the pattern to here from the
           matching on_failure_jump, so didn't fail.  */
        case pop_failure_jump:
          {
            /* We need to pass separate storage for the lowest and
               highest registers, even though we don't care about the
               actual values.  Otherwise, we will restore only one
               register from the stack, since lowest will == highest in
               `pop_failure_point'.  */
            active_reg_t dummy_low_reg, dummy_high_reg;
            US_CHAR_TYPE *pdummy = NULL;
            const CHAR_TYPE *sdummy = NULL;

            DEBUG_PRINT1 ("EXECUTING pop_failure_jump.\n");
            POP_FAILURE_POINT (sdummy, pdummy,
                               dummy_low_reg, dummy_high_reg,
                               reg_dummy, reg_dummy, reg_info_dummy);
          }
          /* Note fall through.  */

        unconditional_jump:
#ifdef _LIBC
          DEBUG_PRINT2 ("\n%p: ", p);
#else
          DEBUG_PRINT2 ("\n0x%x: ", p);
#endif
          /* Note fall through.  */

        /* Unconditionally jump (without popping any failure points).  */
        case jump:
          EXTRACT_NUMBER_AND_INCR (mcnt, p);    /* Get the amount to jump.  */
          DEBUG_PRINT2 ("EXECUTING jump %d ", mcnt);
          p += mcnt;                            /* Do the jump.  */
#ifdef _LIBC
          DEBUG_PRINT2 ("(to %p).\n", p);
#else
          DEBUG_PRINT2 ("(to 0x%x).\n", p);
#endif
          break;


        /* We need this opcode so we can detect where alternatives end
           in `group_match_null_string_p' et al.
           */
        case jump_past_alt:
          DEBUG_PRINT1 ("EXECUTING jump_past_alt.\n");
          goto unconditional_jump;


        /* Normally, the on_failure_jump pushes a failure point, which
           then gets popped at pop_failure_jump.  We will end up at
           pop_failure_jump, also, and with a pattern of, say, `a+', we
           are skipping over the on_failure_jump, so we have to push
           something meaningless for pop_failure_jump to pop.  */
        case dummy_failure_jump:
          DEBUG_PRINT1 ("EXECUTING dummy_failure_jump.\n");
          /* It doesn't matter what we push for the string here.  What
             the code at `fail' tests is the value for the pattern.  */
          PUSH_FAILURE_POINT (NULL, NULL, -2);
          goto unconditional_jump;


        /* At the end of an alternative, we need to push a dummy failure
           point in case we are followed by a `pop_failure_jump', because
           we don't want the failure point for the alternative to be
           popped.  For example, matching `(a|ab)*' against `aab'
           requires that we match the `ab' alternative.  */
        case push_dummy_failure:
          DEBUG_PRINT1 ("EXECUTING push_dummy_failure.\n");
          /* See comments just above at `dummy_failure_jump' about the
             two zeroes.  */
          PUSH_FAILURE_POINT (NULL, NULL, -2);
          break;

        /* Have to succeed matching what follows at least n times.
           After that, handle like `on_failure_jump'.  */
        case succeed_n:
          EXTRACT_NUMBER (mcnt, p + OFFSET_ADDRESS_SIZE);
          DEBUG_PRINT2 ("EXECUTING succeed_n %d.\n", mcnt);

          assert (mcnt >= 0);
          /* Originally, this is how many times we HAVE to succeed.
             */
          if (mcnt > 0)
            {
              mcnt--;
              p += OFFSET_ADDRESS_SIZE;
              STORE_NUMBER_AND_INCR (p, mcnt);
#ifdef _LIBC
              DEBUG_PRINT3 (" Setting %p to %d.\n", p - OFFSET_ADDRESS_SIZE,
                            mcnt);
#else
              DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p - OFFSET_ADDRESS_SIZE,
                            mcnt);
#endif
            }
          else if (mcnt == 0)
            {
#ifdef _LIBC
              DEBUG_PRINT2 (" Setting two bytes from %p to no_op.\n",
                            p + OFFSET_ADDRESS_SIZE);
#else
              DEBUG_PRINT2 (" Setting two bytes from 0x%x to no_op.\n",
                            p + OFFSET_ADDRESS_SIZE);
#endif /* _LIBC */

#ifdef MBS_SUPPORT
              p[1] = (US_CHAR_TYPE) no_op;
#else
              p[2] = (US_CHAR_TYPE) no_op;
              p[3] = (US_CHAR_TYPE) no_op;
#endif /* MBS_SUPPORT */
              goto on_failure;
            }
          break;

        case jump_n:
          EXTRACT_NUMBER (mcnt, p + OFFSET_ADDRESS_SIZE);
          DEBUG_PRINT2 ("EXECUTING jump_n %d.\n", mcnt);

          /* Originally, this is how many times we CAN jump.  */
          if (mcnt)
            {
              mcnt--;
              STORE_NUMBER (p + OFFSET_ADDRESS_SIZE, mcnt);

#ifdef _LIBC
              DEBUG_PRINT3 (" Setting %p to %d.\n", p + OFFSET_ADDRESS_SIZE,
                            mcnt);
#else
              DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p + OFFSET_ADDRESS_SIZE,
                            mcnt);
#endif /* _LIBC */
              goto unconditional_jump;
            }
          /* If we don't have to jump any more, skip over the rest of
             the command.  */
          else
            p += 2 * OFFSET_ADDRESS_SIZE;
          break;

        case set_number_at:
          {
            DEBUG_PRINT1 ("EXECUTING set_number_at.\n");

            EXTRACT_NUMBER_AND_INCR (mcnt, p);
            p1 = p + mcnt;
            EXTRACT_NUMBER_AND_INCR (mcnt, p);
#ifdef _LIBC
            DEBUG_PRINT3 (" Setting %p to %d.\n", p1, mcnt);
#else
            DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p1, mcnt);
#endif
            STORE_NUMBER (p1, mcnt);
            break;
          }

#if 0
        /* The DEC Alpha C compiler 3.x generates incorrect code for the
           test WORDCHAR_P (d - 1) != WORDCHAR_P (d) in the expansion of
           AT_WORD_BOUNDARY, so this code is disabled.  Expanding the
           macro and introducing temporary variables works around the bug.
           */
        case wordbound:
          DEBUG_PRINT1 ("EXECUTING wordbound.\n");
          if (AT_WORD_BOUNDARY (d))
            break;
          goto fail;

        case notwordbound:
          DEBUG_PRINT1 ("EXECUTING notwordbound.\n");
          if (AT_WORD_BOUNDARY (d))
            goto fail;
          break;
#else
        case wordbound:
          {
            boolean prevchar, thischar;

            DEBUG_PRINT1 ("EXECUTING wordbound.\n");
            if (AT_STRINGS_BEG (d) || AT_STRINGS_END (d))
              break;

            prevchar = WORDCHAR_P (d - 1);
            thischar = WORDCHAR_P (d);
            if (prevchar != thischar)
              break;
            goto fail;
          }

        case notwordbound:
          {
            boolean prevchar, thischar;

            DEBUG_PRINT1 ("EXECUTING notwordbound.\n");
            if (AT_STRINGS_BEG (d) || AT_STRINGS_END (d))
              goto fail;

            prevchar = WORDCHAR_P (d - 1);
            thischar = WORDCHAR_P (d);
            if (prevchar != thischar)
              goto fail;
            break;
          }
#endif

        case wordbeg:
          DEBUG_PRINT1 ("EXECUTING wordbeg.\n");
          if (WORDCHAR_P (d) && (AT_STRINGS_BEG (d) || !WORDCHAR_P (d - 1)))
            break;
          goto fail;

        case wordend:
          DEBUG_PRINT1 ("EXECUTING wordend.\n");
          if (!AT_STRINGS_BEG (d) && WORDCHAR_P (d - 1)
              && (!WORDCHAR_P (d) || AT_STRINGS_END (d)))
            break;
          goto fail;

#ifdef emacs
        case before_dot:
          DEBUG_PRINT1 ("EXECUTING before_dot.\n");
          if (PTR_CHAR_POS ((unsigned char *) d) >= point)
            goto fail;
          break;

        case at_dot:
          DEBUG_PRINT1 ("EXECUTING at_dot.\n");
          if (PTR_CHAR_POS ((unsigned char *) d) != point)
            goto fail;
          break;

        case after_dot:
          DEBUG_PRINT1 ("EXECUTING after_dot.\n");
          if (PTR_CHAR_POS ((unsigned char *) d) <= point)
            goto fail;
          break;

        case syntaxspec:
          DEBUG_PRINT2 ("EXECUTING syntaxspec %d.\n", mcnt);
          mcnt = *p++;
          goto matchsyntax;

        case wordchar:
          DEBUG_PRINT1 ("EXECUTING Emacs wordchar.\n");
          mcnt = (int) Sword;
        matchsyntax:
          PREFETCH ();
          /* Can't use *d++ here; SYNTAX may be an unsafe macro.
*/
          d++;
          if (SYNTAX (d[-1]) != (enum syntaxcode) mcnt)
            goto fail;
          SET_REGS_MATCHED ();
          break;

        case notsyntaxspec:
          DEBUG_PRINT2 ("EXECUTING notsyntaxspec %d.\n", mcnt);
          mcnt = *p++;
          goto matchnotsyntax;

        case notwordchar:
          DEBUG_PRINT1 ("EXECUTING Emacs notwordchar.\n");
          mcnt = (int) Sword;
        matchnotsyntax:
          PREFETCH ();
          /* Can't use *d++ here; SYNTAX may be an unsafe macro.  */
          d++;
          if (SYNTAX (d[-1]) == (enum syntaxcode) mcnt)
            goto fail;
          SET_REGS_MATCHED ();
          break;

#else /* not emacs */
        case wordchar:
          DEBUG_PRINT1 ("EXECUTING non-Emacs wordchar.\n");
          PREFETCH ();
          if (!WORDCHAR_P (d))
            goto fail;
          SET_REGS_MATCHED ();
          d++;
          break;

        case notwordchar:
          DEBUG_PRINT1 ("EXECUTING non-Emacs notwordchar.\n");
          PREFETCH ();
          if (WORDCHAR_P (d))
            goto fail;
          SET_REGS_MATCHED ();
          d++;
          break;
#endif /* not emacs */

        default:
          abort ();
        }
      continue;  /* Successfully executed one pattern command; keep going.  */


    /* We goto here if a matching operation fails.  */
    fail:
      if (!FAIL_STACK_EMPTY ())
        { /* A restart point is known.  Restore to that state.  */
          DEBUG_PRINT1 ("\nFAIL:\n");
          POP_FAILURE_POINT (d, p,
                             lowest_active_reg, highest_active_reg,
                             regstart, regend, reg_info);

          /* If this failure point is a dummy, try the next one.  */
          if (!p)
            goto fail;

          /* If we failed to the end of the pattern, don't examine *p.  */
          assert (p <= pend);
          if (p < pend)
            {
              boolean is_a_jump_n = false;

              /* If we failed to a backwards jump that's part of a repetition
                 loop, we need to pop this failure point and use the next
                 one.  */
              switch ((re_opcode_t) *p)
                {
                case jump_n:
                  is_a_jump_n = true;
                  /* Fall through.  */
                case maybe_pop_jump:
                case pop_failure_jump:
                case jump:
                  p1 = p + 1;
                  EXTRACT_NUMBER_AND_INCR (mcnt, p1);
                  p1 += mcnt;

                  if ((is_a_jump_n && (re_opcode_t) *p1 == succeed_n)
                      || (!is_a_jump_n
                          && (re_opcode_t) *p1 == on_failure_jump))
                    goto fail;
                  break;
                default:
                  /* do nothing */ ;
                }
            }

          if (d >= string1 && d <= end1)
            dend = end_match_1;
        }
      else
        break;   /* Matching at this starting point really fails.
*/
    } /* for (;;) */

  if (best_regs_set)
    goto restore_best_regs;

  FREE_VARIABLES ();

  return -1;                            /* Failure to match.  */
} /* re_match_2 */

/* Subroutine definitions for re_match_2.  */


/* We are passed P pointing to a register number after a start_memory.

   Return true if the pattern up to the corresponding stop_memory can
   match the empty string, and false otherwise.

   If we find the matching stop_memory, sets P to point to one past its number.
   Otherwise, sets P to an undefined byte less than or equal to END.

   We don't handle duplicates properly (yet).  */

static boolean
group_match_null_string_p (p, end, reg_info)
    US_CHAR_TYPE **p, *end;
    register_info_type *reg_info;
{
  int mcnt;
  /* Point to after the args to the start_memory.  */
  US_CHAR_TYPE *p1 = *p + 2;

  while (p1 < end)
    {
      /* Skip over opcodes that can match nothing, and return true or
         false, as appropriate, when we get to one that can't, or to the
         matching stop_memory.  */

      switch ((re_opcode_t) *p1)
        {
        /* Could be either a loop or a series of alternatives.  */
        case on_failure_jump:
          p1++;
          EXTRACT_NUMBER_AND_INCR (mcnt, p1);

          /* If the next operation is not a jump backwards in the
             pattern.  */

          if (mcnt >= 0)
            {
              /* Go through the on_failure_jumps of the alternatives,
                 seeing if any of the alternatives cannot match nothing.
                 The last alternative starts with only a jump,
                 whereas the rest start with on_failure_jump and end
                 with a jump, e.g., here is the pattern for `a|b|c':

                 /on_failure_jump/0/6/exactn/1/a/jump_past_alt/0/6
                 /on_failure_jump/0/6/exactn/1/b/jump_past_alt/0/3
                 /exactn/1/c

                 So, we have to first go through the first (n-1)
                 alternatives and then deal with the last one separately.  */


              /* Deal with the first (n-1) alternatives, which start
                 with an on_failure_jump (see above) that jumps to right
                 past a jump_past_alt.  */

              while ((re_opcode_t) p1[mcnt-(1+OFFSET_ADDRESS_SIZE)] ==
                     jump_past_alt)
                {
                  /* `mcnt' holds how many bytes long the alternative
                     is, including the ending `jump_past_alt' and
                     its number.
*/

                  if (!alt_match_null_string_p (p1, p1 + mcnt -
                                                (1 + OFFSET_ADDRESS_SIZE),
                                                reg_info))
                    return false;

                  /* Move to right after this alternative, including the
                     jump_past_alt.  */
                  p1 += mcnt;

                  /* Break if it's the beginning of an n-th alternative
                     that doesn't begin with an on_failure_jump.  */
                  if ((re_opcode_t) *p1 != on_failure_jump)
                    break;

                  /* Still have to check that it's not an n-th
                     alternative that starts with an on_failure_jump.  */
                  p1++;
                  EXTRACT_NUMBER_AND_INCR (mcnt, p1);
                  if ((re_opcode_t) p1[mcnt-(1+OFFSET_ADDRESS_SIZE)] !=
                      jump_past_alt)
                    {
                      /* Get to the beginning of the n-th alternative.  */
                      p1 -= 1 + OFFSET_ADDRESS_SIZE;
                      break;
                    }
                }

              /* Deal with the last alternative: go back and get number
                 of the `jump_past_alt' just before it.  `mcnt' contains
                 the length of the alternative.  */
              EXTRACT_NUMBER (mcnt, p1 - OFFSET_ADDRESS_SIZE);

              if (!alt_match_null_string_p (p1, p1 + mcnt, reg_info))
                return false;

              p1 += mcnt;       /* Get past the n-th alternative.  */
            } /* if mcnt >= 0 */
          break;


        case stop_memory:
          assert (p1[1] == **p);
          *p = p1 + 2;
          return true;


        default:
          if (!common_op_match_null_string_p (&p1, end, reg_info))
            return false;
        }
    } /* while p1 < end */

  return false;
} /* group_match_null_string_p */


/* Similar to group_match_null_string_p, but doesn't deal with alternatives:
   It expects P to be the first byte of a single alternative and END one
   byte past the last.  The alternative can contain groups.  */

static boolean
alt_match_null_string_p (p, end, reg_info)
    US_CHAR_TYPE *p, *end;
    register_info_type *reg_info;
{
  int mcnt;
  US_CHAR_TYPE *p1 = p;

  while (p1 < end)
    {
      /* Skip over opcodes that can match nothing, and break when we get
         to one that can't.  */

      switch ((re_opcode_t) *p1)
        {
        /* It's a loop.
*/
        case on_failure_jump:
          p1++;
          EXTRACT_NUMBER_AND_INCR (mcnt, p1);
          p1 += mcnt;
          break;

        default:
          if (!common_op_match_null_string_p (&p1, end, reg_info))
            return false;
        }
    }  /* while p1 < end */

  return true;
} /* alt_match_null_string_p */


/* Deals with the ops common to group_match_null_string_p and
   alt_match_null_string_p.

   Sets P to one after the op and its arguments, if any.  */

static boolean
common_op_match_null_string_p (p, end, reg_info)
    US_CHAR_TYPE **p, *end;
    register_info_type *reg_info;
{
  int mcnt;
  boolean ret;
  int reg_no;
  US_CHAR_TYPE *p1 = *p;

  switch ((re_opcode_t) *p1++)
    {
    case no_op:
    case begline:
    case endline:
    case begbuf:
    case endbuf:
    case wordbeg:
    case wordend:
    case wordbound:
    case notwordbound:
#ifdef emacs
    case before_dot:
    case at_dot:
    case after_dot:
#endif
      break;

    case start_memory:
      reg_no = *p1;
      assert (reg_no > 0 && reg_no <= MAX_REGNUM);
      ret = group_match_null_string_p (&p1, end, reg_info);

      /* Have to set this here in case we're checking a group which
         contains a group and a back reference to it.  */

      if (REG_MATCH_NULL_STRING_P (reg_info[reg_no]) == MATCH_NULL_UNSET_VALUE)
        REG_MATCH_NULL_STRING_P (reg_info[reg_no]) = ret;

      if (!ret)
        return false;
      break;

    /* If this is an optimized succeed_n for zero times, make the jump.  */
    case jump:
      EXTRACT_NUMBER_AND_INCR (mcnt, p1);
      if (mcnt >= 0)
        p1 += mcnt;
      else
        return false;
      break;

    case succeed_n:
      /* Get to the number of times to succeed.  */
      p1 += OFFSET_ADDRESS_SIZE;
      EXTRACT_NUMBER_AND_INCR (mcnt, p1);

      if (mcnt == 0)
        {
          p1 -= 2 * OFFSET_ADDRESS_SIZE;
          EXTRACT_NUMBER_AND_INCR (mcnt, p1);
          p1 += mcnt;
        }
      else
        return false;
      break;

    case duplicate:
      if (!REG_MATCH_NULL_STRING_P (reg_info[*p1]))
        return false;
      break;

    case set_number_at:
      p1 += 2 * OFFSET_ADDRESS_SIZE;
      /* Fall through.  */

    default:
      /* All other opcodes mean we cannot match the empty string.  */
      return false;
    }

  *p = p1;
  return true;
} /* common_op_match_null_string_p */


/* Return zero if TRANSLATE[S1] and TRANSLATE[S2] are identical for LEN
   bytes; nonzero otherwise.
*/

static int
bcmp_translate (s1, s2, len, translate)
     const CHAR_TYPE *s1, *s2;
     register int len;
     RE_TRANSLATE_TYPE translate;
{
  register const US_CHAR_TYPE *p1 = (const US_CHAR_TYPE *) s1;
  register const US_CHAR_TYPE *p2 = (const US_CHAR_TYPE *) s2;
  while (len)
    {
#ifdef MBS_SUPPORT
      if (((*p1<=0xff)?translate[*p1++]:*p1++)
          != ((*p2<=0xff)?translate[*p2++]:*p2++))
        return 1;
#else
      if (translate[*p1++] != translate[*p2++]) return 1;
#endif /* MBS_SUPPORT */
      len--;
    }
  return 0;
}

/* Entry points for GNU code.  */

/* re_compile_pattern is the GNU regular expression compiler: it
   compiles PATTERN (of length LENGTH) and puts the result in BUFP.
   Returns NULL if the pattern was valid, otherwise an error string.

   Assumes the `allocated' (and perhaps `buffer') and `translate' fields
   are set in BUFP on entry.

   We call regex_compile to do the actual compilation.  */

const char *
re_compile_pattern (pattern, length, bufp)
     const char *pattern;
     size_t length;
     struct re_pattern_buffer *bufp;
{
  reg_errcode_t ret;

  /* GNU code is written to assume at least RE_NREGS registers will be set
     (and at least one extra will be -1).  */
  bufp->regs_allocated = REGS_UNALLOCATED;

  /* And GNU code determines whether or not to get register information
     by passing null for the REGS argument to re_match, etc., not by
     setting no_sub.  */
  bufp->no_sub = 0;

  /* Match anchors at newline.  */
  bufp->newline_anchor = 1;

  ret = regex_compile (pattern, length, re_syntax_options, bufp);

  if (!ret)
    return NULL;
  return gettext (re_error_msgid + re_error_msgid_idx[(int) ret]);
}
#ifdef _LIBC
weak_alias (__re_compile_pattern, re_compile_pattern)
#endif

/* Entry points compatible with 4.2 BSD regex library.  We don't define
   them unless specifically requested.  */

#if defined _REGEX_RE_COMP || defined _LIBC

/* BSD has one and only one pattern buffer.
*/
static struct re_pattern_buffer re_comp_buf;

char *
#ifdef _LIBC
/* Make these definitions weak in libc, so POSIX programs can redefine
   these names if they don't use our functions, and still use
   regcomp/regexec below without link errors.  */
weak_function
#endif
re_comp (s)
    const char *s;
{
  reg_errcode_t ret;

  if (!s)
    {
      if (!re_comp_buf.buffer)
        return gettext ("No previous regular expression");
      return 0;
    }

  if (!re_comp_buf.buffer)
    {
      re_comp_buf.buffer = (unsigned char *) malloc (200);
      if (re_comp_buf.buffer == NULL)
        return (char *) gettext (re_error_msgid
                                 + re_error_msgid_idx[(int) REG_ESPACE]);
      re_comp_buf.allocated = 200;

      re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH);
      if (re_comp_buf.fastmap == NULL)
        return (char *) gettext (re_error_msgid
                                 + re_error_msgid_idx[(int) REG_ESPACE]);
    }

  /* Since `re_exec' always passes NULL for the `regs' argument, we
     don't need to initialize the pattern buffer fields which affect it.  */

  /* Match anchors at newlines.  */
  re_comp_buf.newline_anchor = 1;

  ret = regex_compile (s, strlen (s), re_syntax_options, &re_comp_buf);

  if (!ret)
    return NULL;

  /* Yes, we're discarding `const' here if !HAVE_LIBINTL.  */
  return (char *) gettext (re_error_msgid + re_error_msgid_idx[(int) ret]);
}


int
#ifdef _LIBC
weak_function
#endif
re_exec (s)
    const char *s;
{
  const int len = strlen (s);
  return
    0 <= re_search (&re_comp_buf, s, len, 0, len, (struct re_registers *) 0);
}

#endif /* _REGEX_RE_COMP */

/* POSIX.2 functions.  Don't define these for Emacs.  */

#ifndef emacs

/* regcomp takes a regular expression as a string and compiles it.

   PREG is a regex_t *.  We do not expect any fields to be initialized,
   since POSIX says we shouldn't.
   Thus, we set

     `buffer' to the compiled pattern;
     `used' to the length of the compiled pattern;
     `syntax' to RE_SYNTAX_POSIX_EXTENDED if the
       REG_EXTENDED bit in CFLAGS is set; otherwise, to
       RE_SYNTAX_POSIX_BASIC;
     `newline_anchor' to REG_NEWLINE being set in CFLAGS;
     `fastmap' to an allocated space for the fastmap;
     `fastmap_accurate' to zero;
     `re_nsub' to the number of subexpressions in PATTERN.

   PATTERN is the address of the pattern string.

   CFLAGS is a series of bits which affect compilation.

     If REG_EXTENDED is set, we use POSIX extended syntax; otherwise, we
     use POSIX basic syntax.

     If REG_NEWLINE is set, then . and [^...] don't match newline.
     Also, regexec will try a match beginning after every newline.

     If REG_ICASE is set, then we consider upper- and lowercase
     versions of letters to be equivalent when matching.

     If REG_NOSUB is set, then when PREG is passed to regexec, that
     routine will report only success or failure, and nothing about the
     registers.

   It returns 0 if it succeeds, nonzero if it doesn't.  (See regex.h for
   the return codes and their meanings.)  */

int
regcomp (preg, pattern, cflags)
    regex_t *preg;
    const char *pattern;
    int cflags;
{
  reg_errcode_t ret;
  reg_syntax_t syntax
    = (cflags & REG_EXTENDED) ?
      RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC;

  /* regex_compile will allocate the space for the compiled pattern.  */
  preg->buffer = 0;
  preg->allocated = 0;
  preg->used = 0;

  /* Try to allocate space for the fastmap.  */
  preg->fastmap = (char *) malloc (1 << BYTEWIDTH);

  if (cflags & REG_ICASE)
    {
      unsigned i;

      preg->translate
        = (RE_TRANSLATE_TYPE) malloc (CHAR_SET_SIZE
                                      * sizeof (*(RE_TRANSLATE_TYPE)0));
      if (preg->translate == NULL)
        return (int) REG_ESPACE;

      /* Map uppercase characters to corresponding lowercase ones.  */
      for (i = 0; i < CHAR_SET_SIZE; i++)
        preg->translate[i] = ISUPPER (i) ? TOLOWER (i) : i;
    }
  else
    preg->translate = NULL;

  /* If REG_NEWLINE is set, newlines are treated differently.  */
  if (cflags & REG_NEWLINE)
    {
      /* REG_NEWLINE implies neither . nor [^...]
         match newline.  */
      syntax &= ~RE_DOT_NEWLINE;
      syntax |= RE_HAT_LISTS_NOT_NEWLINE;
      /* It also changes the matching behavior.  */
      preg->newline_anchor = 1;
    }
  else
    preg->newline_anchor = 0;

  preg->no_sub = !!(cflags & REG_NOSUB);

  /* POSIX says a null character in the pattern terminates it, so we
     can use strlen here in compiling the pattern.  */
  ret = regex_compile (pattern, strlen (pattern), syntax, preg);

  /* POSIX doesn't distinguish between an unmatched open-group and an
     unmatched close-group: both are REG_EPAREN.  */
  if (ret == REG_ERPAREN) ret = REG_EPAREN;

  if (ret == REG_NOERROR && preg->fastmap)
    {
      /* Compute the fastmap now, since regexec cannot modify the pattern
         buffer.  */
      if (re_compile_fastmap (preg) == -2)
        {
          /* Some error occurred while computing the fastmap; just forget
             about it.  */
          free (preg->fastmap);
          preg->fastmap = NULL;
        }
    }

  return (int) ret;
}
#ifdef _LIBC
weak_alias (__regcomp, regcomp)
#endif


/* regexec searches for a given pattern, specified by PREG, in the
   string STRING.

   If NMATCH is zero or REG_NOSUB was set in the cflags argument to
   `regcomp', we ignore PMATCH.  Otherwise, we assume PMATCH has at
   least NMATCH elements, and we set them to the offsets of the
   corresponding matched substrings.

   EFLAGS specifies `execution flags' which affect matching: if
   REG_NOTBOL is set, then ^ does not match at the beginning of the
   string; if REG_NOTEOL is set, then $ does not match at the end.

   We return 0 if we find a match and REG_NOMATCH if not.  */

int
regexec (preg, string, nmatch, pmatch, eflags)
    const regex_t *preg;
    const char *string;
    size_t nmatch;
    regmatch_t pmatch[];
    int eflags;
{
  int ret;
  struct re_registers regs;
  regex_t private_preg;
  int len = strlen (string);
  boolean want_reg_info = !preg->no_sub && nmatch > 0;

  private_preg = *preg;

  private_preg.not_bol = !!(eflags & REG_NOTBOL);
  private_preg.not_eol = !!(eflags & REG_NOTEOL);

  /* The user has told us exactly how many registers to return
     information about, via `nmatch'.
     We have to pass that on to the matching routines.  */
  private_preg.regs_allocated = REGS_FIXED;

  if (want_reg_info)
    {
      regs.num_regs = nmatch;
      regs.start = TALLOC (nmatch * 2, regoff_t);
      if (regs.start == NULL)
        return (int) REG_NOMATCH;
      regs.end = regs.start + nmatch;
    }

  /* Perform the searching operation.  */
  ret = re_search (&private_preg, string, len,
                   /* start: */ 0, /* range: */ len,
                   want_reg_info ? &regs : (struct re_registers *) 0);

  /* Copy the register information to the POSIX structure.  */
  if (want_reg_info)
    {
      if (ret >= 0)
        {
          unsigned r;

          for (r = 0; r < nmatch; r++)
            {
              pmatch[r].rm_so = regs.start[r];
              pmatch[r].rm_eo = regs.end[r];
            }
        }

      /* If we needed the temporary register info, free the space now.  */
      free (regs.start);
    }

  /* We want zero return to mean success, unlike `re_search'.  */
  return ret >= 0 ? (int) REG_NOERROR : (int) REG_NOMATCH;
}
#ifdef _LIBC
weak_alias (__regexec, regexec)
#endif


/* Returns a message corresponding to an error code, ERRCODE, returned
   from either regcomp or regexec.  We don't use PREG here.  */

size_t
regerror (errcode, preg, errbuf, errbuf_size)
    int errcode;
    const regex_t *preg;
    char *errbuf;
    size_t errbuf_size;
{
  const char *msg;
  size_t msg_size;

  if (errcode < 0
      || errcode >= (int) (sizeof (re_error_msgid_idx)
                           / sizeof (re_error_msgid_idx[0])))
    /* Only error codes returned by the rest of the code should be passed
       to this routine.  If we are given anything else, or if other regex
       code generates an invalid error code, then the program has a bug.
       Dump core so we can fix it.  */
    abort ();

  msg = gettext (re_error_msgid + re_error_msgid_idx[errcode]);

  msg_size = strlen (msg) + 1; /* Includes the null.
*/

  if (errbuf_size != 0)
    {
      if (msg_size > errbuf_size)
        {
#if defined HAVE_MEMPCPY || defined _LIBC
          *((char *) __mempcpy (errbuf, msg, errbuf_size - 1)) = '\0';
#else
          memcpy (errbuf, msg, errbuf_size - 1);
          errbuf[errbuf_size - 1] = 0;
#endif
        }
      else
        memcpy (errbuf, msg, msg_size);
    }

  return msg_size;
}
#ifdef _LIBC
weak_alias (__regerror, regerror)
#endif


/* Free dynamically allocated space used by PREG.  */

void
regfree (preg)
    regex_t *preg;
{
  if (preg->buffer != NULL)
    free (preg->buffer);
  preg->buffer = NULL;

  preg->allocated = 0;
  preg->used = 0;

  if (preg->fastmap != NULL)
    free (preg->fastmap);
  preg->fastmap = NULL;
  preg->fastmap_accurate = 0;

  if (preg->translate != NULL)
    free (preg->translate);
  preg->translate = NULL;
}
#ifdef _LIBC
weak_alias (__regfree, regfree)
#endif

#endif /* not emacs */

devtodo-master/util/Strings.cc

#include <cctype>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include "Strings.h"

namespace str {

void wraptext(ostream &out, string const &in, unsigned indent, unsigned initial, unsigned width)
{
    char const *p = in.c_str();
    int ind = indent - initial, wid = width - initial;

    if (initial < indent) wid = width - indent;
    while (*p)
    {
        int count;
        bool newlines = false;

        if (*p && ind > 0)
        {
            out << string(ind, ' ');
            ind = indent;
        }
        for
            (count = 0; p[count] != 0 && p[count] != '\n' && count < wid; count++)
            ;
        if (p[count] == '\n') newlines = true;
        // move back to the last space separator
        while (p[count] && !isspace(p[count]) && count) count--;
        if (count == 0)
            for (count = 0; p[count] != 0 && p[count] != '\n' && count < wid; count++)
                ;
        for (int i = 0; i < count; i++)
            out << *p++;
        // skip spaces
        while (*p != 0 && isspace(*p))
        {
            if (*p == '\n') out << endl;
            p++;
        }
        if (*p && !newlines)
            out << endl;
        initial = 0;
        ind = indent;
        wid = width - ind;
    }
}

string ucfirst(string const &str)
{
    string out;
    bool newword = true;

    for (string::const_iterator i = str.begin(); i != str.end(); i++)
        if (newword && isalpha(*i))
        {
            out += toupper(*i);
            newword = false;
        }
        else
        {
            if (isspace(*i) || ispunct(*i))
                newword = true;
            out += *i;
        }
    return out;
}

string exec(string const &command)
{
    FILE *pipe = popen(command.c_str(), "r");
    char buffer[1024];
    string out;

    if (!pipe)
        throw runtime_error("exec(): failed, " + string(strerror(errno)));
    while (fgets(buffer, sizeof(buffer), pipe))
        out += buffer;
    // a stream opened with popen() must be closed with pclose(), not fclose()
    pclose(pipe);
    return out;
}

string join(string const &delim, vector<string> const &components)
{
    string out;

    // decrementing end() of an empty vector is undefined, so return early
    if (components.empty())
        return out;

    vector<string>::const_iterator end = components.end();
    end--;
    for (vector<string>::const_iterator i = components.begin(); i != components.end(); i++)
    {
        out += (*i);
        if (i != end) out += delim;
    }
    return out;
}

vector<string> split(string const &delim, string const &text)
{
    unsigned start = 0;
    vector<string> out;

    for (unsigned i = start; i < text.size(); i++)
        if (!strncmp(text.c_str() + i, delim.c_str(), delim.size()))
        {
            out.push_back(text.substr(start, i - start));
            start = i + delim.size();
            i += delim.size() - 1;
        }
    out.push_back(text.substr(start));
    return out;
}

string replace(string const &match, string const &repl, string const &in)
{
    string out;
    string::size_type found = 0, lastfound = 0;

    // an empty match would be found at the same position forever
    if (match.empty())
        return in;

    while ((found = in.find(match, found)) != string::npos)
    {
        out += in.substr(lastfound, found - lastfound);
        out += repl;
        found += match.size();
        lastfound = found;
    }
    out += in.substr(lastfound);
    return out;
}

string htmlify(string const &_str)
{
    string out;
    char const *str = _str.c_str();

    for (unsigned i = 0; i < _str.size(); i++)
        switch (str[i])
        {
            case '<' : out += "&lt;"; break;
            case '>' : out += "&gt;"; break;
            case '&' : out += "&amp;"; break;
            default : out += str[i]; break;
        }
    return out;
}

string unhtmlify(string const &_str)
{
    string out;
    char const *str = _str.c_str();

    for (unsigned i = 0; i < _str.size(); i++)
        if (str[i] == '&')
        {
            ++i;
            // i now points just past '&'; advance it to the entity's
            // terminating ';' and let the loop increment step past it
            if (!strncmp(str + i, "amp;", 4))
            {
                i += 3;
                out += '&';
            }
            else if (!strncmp(str + i, "gt;", 3))
            {
                i += 2;
                out += '>';
            }
            else if (!strncmp(str + i, "lt;", 3))
            {
                i += 2;
                out += '<';
            }
            else
            {
                out += '&';
                out += str[i];
            }
        }
        else
            out += str[i];
    return out;
}

string addcslashes(string const &_str)
{
    string out;
    char const *str = _str.c_str();

    for (unsigned i = 0; i < _str.size(); ++i)
        switch (str[i])
        {
            case '\n' : out += "\\n"; break;
            case '\r' : out += "\\r"; break;
            case '\t' : out += "\\t"; break;
            case '"' : out += "\\\""; break;
            case '\\' : out += "\\\\"; break;
            default : out += str[i]; break;
        }
    return out;
}

string stripcslashes(string const &_str)
{
    string out;
    char const *str = _str.c_str();

    for (unsigned i = 0; i < _str.size(); ++i)
        if (str[i] == '\\')
        {
            ++i;
            switch (str[i])
            {
                case 'n' : out += '\n'; break;
                case 'r' : out += '\r'; break;
                case 't' : out += '\t'; break;
                case '"' : out += '"'; break;
                case '\\' : out += '\\'; break;
                default : out += str[i]; break;
            }
        }
        else
            out += str[i];
    return out;
}

}
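The field semantics of `str::split` and `str::join` above are easy to misread: every delimiter ends a field, so adjacent or trailing delimiters produce empty fields. The following is a minimal standalone sketch of the same behaviour using `std::string::find`. The names `splitOn` and `joinWith` are hypothetical and not part of devtodo, and the empty-delimiter guard is an added assumption that the original code does not make.

```cpp
#include <string>
#include <vector>

// Hypothetical standalone equivalent of str::split: every occurrence of
// `delim` ends a field, so "a,,b" yields {"a", "", "b"} and a trailing
// delimiter yields a final empty field.
std::vector<std::string> splitOn(const std::string &delim, const std::string &text)
{
    std::vector<std::string> out;
    if (delim.empty()) {
        // An empty delimiter would never advance; treat the text as one field.
        out.push_back(text);
        return out;
    }
    std::string::size_type start = 0, pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        out.push_back(text.substr(start, pos - start));
        start = pos + delim.size();
    }
    out.push_back(text.substr(start));  // remainder after the last delimiter
    return out;
}

// Hypothetical standalone equivalent of str::join that is also safe on an
// empty vector (it simply returns an empty string).
std::string joinWith(const std::string &delim, const std::vector<std::string> &parts)
{
    std::string out;
    for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i) {
        if (i != 0)
            out += delim;
        out += parts[i];
    }
    return out;
}
```

Because both functions agree on where fields begin and end, `joinWith(d, splitOn(d, s)) == s` holds for any non-empty delimiter `d`.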

devtodo-master/util/Lexer.cc

#include "Lexer.h"

Lexer::Lexer() : initialised(false)
{
}

Lexer::~Lexer()
{
}

Lexer &Lexer::addPattern(unsigned index, char const *in, bool ignore)
{
    // Initialising lazily here allows statically allocated Lexers. If
    // regexps were initialised globally, they could be constructed before
    // the Regex::cache was initialised, which caused segfaults.
    if (!initialised)
    {
        Pattern &p = pattern[Character];
        try
        {
            p.rx = ".";
            p.ignore = false;
            p.enabled = true;
        }
        catch (Regex::exception &e)
        {
            throw runtime_error("Lexer::Lexer: index " + str::stringify(index) + ", " + string(e.what()));
        }
        initialised = true;
    }

    if (index < 256)
        throw runtime_error("pattern indices under 256 are reserved, relevant pattern is '" + string(in) + "'");
    Pattern &p = pattern[index];
    try
    {
        p.rx = in;
        p.ignore = ignore;
        p.enabled = true;
    }
    catch (Regex::exception &e)
    {
        throw runtime_error("Lexer::addPattern: index " + str::stringify(index) + ", " + string(e.what()));
    }
    return *this;
}

bool Lexer::get(iterator &it, char const *&in, unsigned &line)
{
    if (*in == 0) return false;

    bool ignore;

    do
    {
        ignore = false;
        for (map<int, Pattern>::iterator i = pattern.begin(); i != pattern.end(); ++i)
            if ((*i).second.enabled)
            {
                Pattern &p = (*i).second;
                int length;

                if ((length = p.rx.matchStart(in)) != -1)
                {
                    for (int j = 0; j < length; ++j)
                        if (in[j] == '\n') line++;
                    if (p.ignore)
                    {
                        in += length;
                        ignore = true;
                        if (*in == 0) return false;
                        break;
                    }
                    else
                    {
                        it._value = string(in, length);
                        it._line = line;
                        // Handle individual characters
                        if ((*i).first == Character)
                            it._type = *in;
                        else
                            it._type = (*i).first;
                        it.in = in;
                        in += length;
                        return true;
                    }
                }
            }
    } while (ignore);
    return false;
}

void Lexer::ignore(unsigned index, bool state)
{
    if (pattern.find(index) == pattern.end())
        throw exception("tried to ignore/unignore an unknown token", 0);
    pattern[index].ignore = state;
}

void Lexer::enable(unsigned index, bool state)
{
    if (pattern.find(index) == pattern.end())
        throw exception("tried to enable/disable an unknown token", 0);
    pattern[index].enabled = state;
}
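`Lexer::get` picks the first enabled pattern, in ascending index order, that matches at the current position, rather than the longest match, and silently consumes patterns marked as ignored. The sketch below reproduces that control flow under stated assumptions: `std::regex` with `match_continuous` stands in for devtodo's `Regex::matchStart`, `Token`, `Rule` and `nextToken` are hypothetical names, and the per-pattern `enabled` flag and the single-character `Character` fallback are left out.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical token record; devtodo's Lexer::iterator keeps comparable
// fields (_type, _value, _line), but this struct is illustrative only.
struct Token {
    int type;
    std::string value;
    unsigned line;
};

// One lexer rule: std::regex stands in for devtodo's own Regex class.
struct Rule {
    std::regex rx;
    bool ignore;
};

// Rules are tried in ascending key order and the first one that matches at
// the current position wins (first match, not longest match). Ignored
// matches are skipped and the scan restarts from the first rule. Newlines
// inside consumed text advance `line` before the token's line is recorded.
bool nextToken(const std::map<int, Rule> &rules, const char *&in,
               unsigned &line, Token &tok)
{
    while (*in) {
        bool skipped = false;
        for (std::map<int, Rule>::const_iterator i = rules.begin(); i != rules.end(); ++i) {
            std::cmatch m;
            if (!std::regex_search(in, m, i->second.rx,
                                   std::regex_constants::match_continuous)
                || m.length(0) == 0)
                continue;  // no non-empty match anchored at `in`
            std::string text = m.str(0);
            for (std::string::size_type j = 0; j < text.size(); ++j)
                if (text[j] == '\n')
                    ++line;
            in += text.size();
            if (i->second.ignore) {
                skipped = true;
                break;  // restart from the first rule
            }
            tok.type = i->first;
            tok.value = text;
            tok.line = line;
            return true;
        }
        if (!skipped)
            return false;  // nothing matched at this position
    }
    return false;
}
```

Because the rule order decides ties, a keyword rule must be given a lower index than a general identifier rule if both can match the same text.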

devtodo-master/util/CommandArgs.cc

#include "CommandArgs.h"

CommandArgs::CommandArgs()
{
}

CommandArgs::~CommandArgs()
{
}

void CommandArgs::addArgument(int shortarg, string const &longarg, Parameter argument, string const &help)
{
    Arg arg;

    arg.shortarg = shortarg;
    arg.argument = argument;
    arg.longarg = longarg;
    arg.help = help;
    args.push_back(arg);
}

void CommandArgs::setHelp(int shortarg, string const &help)
{
    for (vector<CommandArgs::Arg>::iterator i = args.begin(); i != args.end(); i++)
        if ((*i).shortarg == shortarg)
        {
            (*i).help = help;
            return;
        }
    throw exception(string("couldn't set help for non-existent option '") + (char)shortarg + "'");
}

bool CommandArgs::iterator::get()
{
    if (!argv || arg >= argc) return false;

    _type = Unknown;
    _argument = 0;
    value = 0;

    // continuing existing short parameter list
    if (inshort)
    {
        char c = argv[arg][inshort];

        for (vector<CommandArgs::Arg>::iterator i = cmdarg->args.begin(); i != cmdarg->args.end(); i++)
        {
            CommandArgs::Arg const &a = (*i);

            if (a.shortarg == c)
            {
                _type = Argument;
                // option requires a parameter and more characters follow it
                // in this cluster: the rest of the cluster is the parameter
                if (a.argument == Required && argv[arg][inshort + 1] != 0)
                {
                    value = argv[arg] + inshort + 1;
                    inshort = 0;
                    _argument = c;
                    arg++;
                    return true;
                }
                else
                {
                    inshort++;
                    // next argument?
                    if (argv[arg][inshort] == 0)
                    {
                        inshort = 0;
                        arg++;
                        // optional argument present?
                        if (arg < argc && a.argument == Optional && argv[arg][0] != '-')
                        {
                            value = argv[arg];
                            arg++;
                        }
                    }
                    _argument = c;
                    if (a.argument == Required)
                    {
                        if (arg >= argc)
                            throw exception(string("expected parameter to argument '-") + c + "'");
                        value = argv[arg++];
                    }
                    _type = Argument;
                    return true;
                }
            }
        }
        value = argv[arg++];
        return true;
    }

    // check options
    if (argv[arg][0] == '-')
    {
        for (vector<CommandArgs::Arg>::iterator i = cmdarg->args.begin(); i != cmdarg->args.end(); i++)
        {
            CommandArgs::Arg a = (*i);

            // long arg?
            if (argv[arg][1] == '-')
            {
                if (!a.longarg.compare(argv[arg] + 2))
                {
                    _argument = a.shortarg;
                    arg++;
                    if (a.argument == Required)
                    {
                        if (arg >= argc)
                            throw exception(string("expected parameter to argument '--") + a.longarg + "'");
                        value = argv[arg++];
                    }
                    else
                        // optional argument present?
                        if (a.argument == Optional && arg < argc && argv[arg][0] != '-')
                        {
                            value = argv[arg];
                            arg++;
                        }
                    _type = Argument;
                    return true;
                }
            }
            // short arg?
            else
            {
                if (a.shortarg == argv[arg][1])
                {
                    char c = argv[arg][1];

                    if (a.argument == Required && argv[arg][2] != 0)
                    {
                        _argument = c;
                        _type = Argument;
                        value = argv[arg] + 2;
                        inshort = 0;
                        arg++;
                        return true;
                    }
                    else
                    {
                        inshort = 2;
                        // next argument?
                        if (argv[arg][inshort] == 0)
                        {
                            inshort = 0;
                            arg++;
                            // optional argument present?
                            if (arg < argc && a.argument == Optional && argv[arg][0] != '-')
                            {
                                value = argv[arg];
                                arg++;
                            }
                        }
                        _argument = c;
                        if (a.argument == Required)
                        {
                            if (arg >= argc)
                                throw exception(string("expected parameter to argument '") + c + "'");
                            value = argv[arg++];
                        }
                        _type = Argument;
                        return true;
                    }
                }
            }
        }
        value = argv[arg++];
        return true;
    }
    else
    {
        _type = Unknown;
        value = argv[arg++];
        _argument = 0;
    }
    return true;
}

void CommandArgs::displayHelp(ostream &out, int termwidth)
{
    unsigned max = 0;

    // calculate widest argument
    for (vector<CommandArgs::Arg>::iterator i = args.begin(); i != args.end(); i++)
    {
        Arg &arg = *i;
        string tmp;

        if (arg.help == "") continue;
        if (arg.shortarg > 0)
        {
            tmp += '-';
            tmp += (char)arg.shortarg;
        }
        if (arg.longarg != "")
        {
            if (tmp != "") tmp += ", ";
            tmp += "--" + arg.longarg;
        }
        if (arg.argument == Required)
        {
            tmp += " ARG";
        }
        else if (arg.argument == Optional)
        {
            tmp += " [ARG]";
        }
        if (max < tmp.size()) max = tmp.size();
    }

    for (vector<CommandArgs::Arg>::iterator i = args.begin(); i != args.end(); i++)
    {
        Arg &arg = *i;
        string tmp;

        if (arg.help == "") continue;
        if (arg.shortarg > 0)
        {
            tmp += '-';
            tmp += (char)arg.shortarg;
        }
        if (arg.longarg != "")
        {
            if (tmp != "") tmp += ", ";
            tmp += "--" + arg.longarg;
        }
        if (arg.argument == Required)
        {
            tmp += " ARG";
        }
        else if (arg.argument == Optional)
        {
            tmp += " [ARG]";
        }
        out << tmp;
        str::wraptext(out, arg.help, max + 2, tmp.size(), termwidth);
        out << endl;
    }
}

string const &CommandArgs::iterator::longOption() const
{
    assert(cmdarg);
    for (vector<CommandArgs::Arg>::iterator i = cmdarg->args.begin(); !(i == cmdarg->args.end()); ++i)
        if (_argument == (*i).shortarg)
            return (*i).longarg;
    throw exception("unknown option index '" + str::stringify(_argument) + "'");
}
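The short-option handling in `CommandArgs::iterator::get` is spread across the `inshort` state, which makes the rules hard to see. Here is a minimal standalone sketch of the same rules for clustered short options, with hypothetical names (`Param`, `parseShort`) and without optional parameters or long options: flags in a cluster such as `-vx` are reported one by one, and an option that requires a parameter takes the rest of its cluster (`-ofile`) or, failing that, the next argument (`-o file`).

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CommandArgs::Parameter (Optional is omitted).
enum Param { NoArg, RequiredArg };

// Expands clustered short options into (option, parameter) pairs.
std::vector<std::pair<char, std::string> >
parseShort(const std::map<char, Param> &spec, const std::vector<std::string> &argv)
{
    std::vector<std::pair<char, std::string> > out;
    for (std::size_t a = 0; a < argv.size(); ++a) {
        const std::string &s = argv[a];
        if (s.size() < 2 || s[0] != '-' || s[1] == '-')
            continue;  // positional arguments and long options are out of scope
        for (std::size_t k = 1; k < s.size(); ++k) {
            std::map<char, Param>::const_iterator it = spec.find(s[k]);
            if (it == spec.end())
                throw std::runtime_error(std::string("unknown option '-") + s[k] + "'");
            if (it->second == NoArg) {
                out.push_back(std::make_pair(s[k], std::string()));
                continue;  // next flag in the same cluster
            }
            // Required parameter: the rest of this cluster, else the next element.
            if (k + 1 < s.size()) {
                out.push_back(std::make_pair(s[k], s.substr(k + 1)));
            } else {
                if (a + 1 >= argv.size())
                    throw std::runtime_error(std::string("expected parameter to argument '-") + s[k] + "'");
                out.push_back(std::make_pair(s[k], argv[++a]));
            }
            break;  // whatever followed in the cluster was the parameter
        }
    }
    return out;
}
```

For example, with `v` and `x` as flags and `o` requiring a parameter, the arguments `-vx -ofile -o out.txt` yield `v`, `x`, `o`=`file` and `o`=`out.txt`.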

devtodo-master/util/CommandArgs.h

#ifndef CRASH_COMMANDARGS
#define CRASH_COMMANDARGS

#include <stdexcept>
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
#include "Strings.h"

using namespace std;

/*  CommandArgs is a class to parse command line arguments.

    04/02/01  Initial creation
*/

class CommandArgs {
    public :
        class exception : public runtime_error {
            public :
                exception(string const &what) : runtime_error(what.c_str()) {}
        };

        enum Parameter {
            None,
            Required,
            Optional
        };

        CommandArgs();
        ~CommandArgs();

        void addArgument(int shortarg, string const &longarg = "", Parameter parameter = None, string const &help = "");
        void setHelp(int shortarg, string const &help);
        void displayHelp(ostream &out, int termwidth = 80);

        class iterator {
            public :
                enum Type {
                    Argument,
                    Unknown
                };

                iterator() : cmdarg(0), inshort(0), _type(Unknown), value(0), arg(0), argc(0), _argument(-1), argv(0) {}

                int operator == (iterator const &other) const { return argv == other.argv && argc == other.argc; }
                iterator operator ++ () { if (!get()) { argc = -1; argv = 0; } return *this; }
                iterator operator ++ (int) { iterator j = *this; if (!get()) { argc = -1; argv = 0; } return j; }

                Type type() { return _type; }
                int option() const { return _argument; }
                string const &longOption() const;
                char const *parameter() const { return value; }
            private :
                iterator(CommandArgs
						*cmdarg, int argc, char const **argv) : cmdarg(cmdarg), inshort(0), _type(Unknown), value(0), arg(0), argc(argc), _argument(-1), argv(argv) { get(); }

				bool get();

				CommandArgs *cmdarg;
				int inshort;
				Type _type;
				char const *value;
				int arg, argc, _argument;
				char const **argv;

				friend class CommandArgs;
		};

		iterator begin(int argc, char const **argv) { return iterator(this, argc - 1, argv + 1); }
		iterator end() { return iterator(this, -1, 0); }
	private :
		friend class iterator;

		// Private, so object can't be default copied.
		CommandArgs(CommandArgs const &copy) {}
		CommandArgs &operator = (CommandArgs const &copy) { return *this; }

		struct Arg {
			int shortarg;
			string longarg, help;
			int argument;
		};
		vector<Arg> args;
};

#endif

devtodo-master/util/Terminal.cc
#include "Terminal.h"

namespace term {

// this is SUCH a dodgy hack
static bool forcecolour = false;

void forceColour(bool state) {
	forcecolour = state;
}

string title(string const &str) {
	return string("\033]0;") + str + "\007";
}

string background(Colour colour) {
	return string("\033[4") + str::stringify(colour) + "m";
}

string foreground(Colour colour) {
	return string("\033[3") + str::stringify(colour) + "m";
}

string colour(Colour colour) {
	return string("\033[3") + str::stringify(colour) + "m";
}

string attribute(Attribute attribute) {
	return string("\033[") +
		str::stringify(attribute) + "m";
}

ostream &black(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &red(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &green(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &yellow(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &blue(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &magenta(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &cyan(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &white(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &normal(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &bold(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &halfbright(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &underline(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &blink(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

ostream &reverse(ostream &os) {
	if (&cout == &os && (forcecolour || isatty(1))) os << "";
	return os;
}

}
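Every manipulator in `Terminal.cc` applies the same guard. It writes its escape sequence only when the target stream is `cout` and either `forceColour(true)` has been called or file descriptor 1 is a terminal according to `isatty(1)`. As a result, redirected standard output gets no escape codes unless colour is forced, and any other stream never gets them. Below is a minimal standalone sketch of that guard. It assumes the standard ANSI SGR sequences `\033[31m` (red) and `\033[0m` (reset), and the `sketch` namespace and `wantColour()` helper are illustrative, not part of devtodo.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <unistd.h>  // isatty()

namespace sketch {

// Mirrors term::forceColour(): when true, the isatty() check is skipped.
static bool forcecolour = false;

void forceColour(bool state) { forcecolour = state; }

// Hypothetical helper: colour only std::cout, and only if forced or on a terminal.
bool wantColour(const std::ostream &os) {
    return &os == &std::cout && (forcecolour || isatty(1));
}

// Manipulators: usable as `stream << sketch::red`.
std::ostream &red(std::ostream &os) {
    if (wantColour(os)) os << "\033[31m";
    return os;
}

std::ostream &normal(std::ostream &os) {
    if (wantColour(os)) os << "\033[0m";
    return os;
}

}  // namespace sketch
```

With this guard, `std::cout << sketch::red << "overdue" << sketch::normal << std::endl;` is coloured on a terminal. The same expression written to a `std::ostringstream` produces plain `overdue`, even after `sketch::forceColour(true)`, because only `std::cout` is ever coloured.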
devtodo-master/util/Terminal.h
#ifndef CRASH_TERMINAL
#define CRASH_TERMINAL

#include <unistd.h>
#include <stdexcept>
#include <iostream>
#include "Strings.h"

using namespace std;

namespace term {
	enum Colour {
		Black,
		Red,
		Green,
		Brown,
		Blue,
		Magenta,
		Cyan,
		White
	};

	enum Attribute {
		Normal,
		Bold,
		HalfBright,
		Underline = 4,
		Blink,
		Reverse = 7,
	};

	void forceColour(bool state);

	// Used for getting the string representation of terminal attributes
	string background(Colour colour);
	string foreground(Colour colour);
	string colour(Colour colour);
	string attribute(Attribute attribute);
	string title(string const &str);

	// Stream-oriented terminal attributes
	ostream &black(ostream &os);
	ostream &red(ostream &os);
	ostream &green(ostream &os);
	ostream &yellow(ostream &os);
	ostream &blue(ostream &os);
	ostream &magenta(ostream &os);
	ostream &cyan(ostream &os);
	ostream &white(ostream &os);
	ostream &normal(ostream &os);
	ostream &bold(ostream &os);
	ostream &halfbright(ostream &os);
	ostream &underline(ostream &os);
	ostream &blink(ostream &os);
	ostream &reverse(ostream &os);
}

#endif
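The `Colour` and `Attribute` enumerators are numbered so that their values drop straight into ANSI SGR codes. `foreground()` prefixes a colour with `3` (30 to 37), `background()` prefixes it with `4` (40 to 47), and `attribute()` uses the value unchanged, which is why `Underline` and `Reverse` are pinned to 4 and 7. Below is a minimal standalone sketch of that mapping. It assumes the standard `\033[` (ESC [) prefix, uses `std::to_string` in place of `str::stringify`, and is not devtodo's own code.

```cpp
#include <string>

namespace sketch {

// Same numbering as term::Colour and term::Attribute in Terminal.h.
enum Colour { Black, Red, Green, Brown, Blue, Magenta, Cyan, White };
enum Attribute { Normal, Bold, HalfBright, Underline = 4, Blink, Reverse = 7 };

// "\033[" is ESC [ (the Control Sequence Introducer); a trailing "m" ends an SGR code.
std::string foreground(Colour c) { return "\033[3" + std::to_string(c) + "m"; }
std::string background(Colour c) { return "\033[4" + std::to_string(c) + "m"; }
std::string attribute(Attribute a) { return "\033[" + std::to_string(a) + "m"; }

}  // namespace sketch
```

On an ANSI terminal, for example, `std::cout << sketch::attribute(sketch::Bold) << sketch::foreground(sketch::Red) << "late" << sketch::attribute(sketch::Normal);` prints bold red text and then resets the attributes.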
devtodo-master/util/XML.h
#ifndef CRASH_XML
#define CRASH_XML

#include <algorithm>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <map>
#include <string>
#include "Strings.h"
#include "Lexer.h"

using namespace std;

class XML {
	public :
		enum Type {
			Element = 256,
			Body,
			Data
		};

#ifdef CRASH_SIGNAL
		static Signal2<string const &, map<string, string> const &> onElementBegin;
		static Signal1<string const &> onElementEnd;
		static Signal2<XML const &, string const &> onBody;
		static Signal2<XML const &, string const &> onData;
#endif

		class exception : public runtime_error {
			public :
				exception(string const &what, int line) : runtime_error(what.c_str()), _line(line) {}
				int line() const { return _line; }
			private :
				int _line;
		};

		XML();
		XML(char const *input);
		virtual ~XML();

		void parse(char const *input);

		Type type() const { return _type; }
		string const &name() const { return _data; }
		string const &body() const { return _data; }
		string const &data() const { return _data; }
		vector<XML*> const &child() const { return _child; }
		map<string, string> const &attrib() const { return _attrib; }
	protected :
		XML(Type type, XML *parent, Lexer::iterator &token);

		void init();
		void skip(Lexer::iterator &token);
		void next(Lexer::iterator &token);
		void parseElement(Lexer::iterator &token);
		void parseBody(Lexer::iterator &token);
		void parseData(Lexer::iterator &token);

		// Lexer constants
		enum { XmlDecl = 256, XmlCommentBegin, XmlBegin, XmlEnd, XmlDataBegin, XmlContent };
		enum { ElementWS = 256, ElementValue, ElementKey, ElementAssignment, ElementTerminator };
		enum { CommentEnd = 256, CommentBody };
		enum { DataEnd = 256, DataBody };
		enum { ProcessBegin = 256, ProcessBody, ProcessEnd };

		static bool initialised;
		static Lexer xmlScan, tagScan, commentScan, dataScan, processScan;

		XML *_parent;
		Type _type;
		string _data;
		map<string, string> _attrib;
		vector<XML*> _child;
};

#endif

devtodo-master/util/Strings.h
#ifndef CRASH_STRINGS
#define CRASH_STRINGS

#include <cctype>
#include <cstdio>
#include <cstring>
#include <string>
#include <cerrno>
#include <iostream>
#include <vector>
#include <sstream>
#include <stdexcept>

using namespace std;

namespace str {
	string join(string const &delim, vector<string> const &components);
	vector<string> split(string const &delim, string const &text);
	string replace(string const &match, string const &repl, string const &in);

	inline int count(string const &in, string const &delim = " \t\n\r") {
	int count = 1;

		for (unsigned i = 0; i < in.size(); i++)
			if (delim.find(in[i]) !=
					string::npos) {
				count++;
				i += delim.size() - 1;
			}
		return count;
	}

	inline string dirname(string const &filename) {
	int slash = filename.rfind('/');

		if (slash == -1) return ".";
		return filename.substr(0, slash);
	}

	string exec(string const &command);
	string ucfirst(string const &str);

	inline string reverse(string const &str) {
	string out;

		for (string::const_reverse_iterator i = str.rbegin(); i != str.rend(); i++)
			out += *i;
		return out;
	}

	inline string uppercase(string const &str) {
	string out;

		for (string::const_iterator i = str.begin(); i != str.end(); i++)
			out += toupper(*i);
		return out;
	}

	inline string lowercase(string const &str) {
	string out;

		for (string::const_iterator i = str.begin(); i != str.end(); i++)
			out += tolower(*i);
		return out;
	}

	inline string invertcase(string const &str) {
	string out;

		for (string::const_iterator i = str.begin(); i != str.end(); i++)
			out += islower(*i) ? toupper(*i) : tolower(*i);
		return out;
	}

	inline string ltrim(string const &str) {
		for (unsigned i = 0; i < str.size(); i++)
			if (!isspace(str[i])) return str.substr(i);
		return "";
	}

	inline string rtrim(string const &str) {
		for (int i = str.size() - 1; i >= 0; i--)
			if (!isspace(str[i])) return str.substr(0, i + 1);
		return "";
	}

	inline string trim(string const &str) {
		return rtrim(ltrim(str));
	}

	/// Convert all HTML-able characters, i.e. >, < and &
	string htmlify(string const &str);
	/// Convert HTML entities back to their character forms.
	string unhtmlify(string const &str);
	/// Convert all C escapable characters to their escaped form.
	string addcslashes(string const &str);
	/// Convert escaped characters to their original forms.
	string stripcslashes(string const &str);

	void wraptext(ostream &out, string const &in, unsigned indent = 0, unsigned initialindent = 0, unsigned width = 80);

	inline string basename(string const &filename) {
		return filename.substr(filename.rfind('/') + 1);
	}

	/**
		Convert a type to a string. A simple, but convenient, wrapper around
		ostringstream.
		If a type has an ostream << operator associated with it, this function
		will work.

		@parameter t Value to convert.
	*/
	template <typename T> string stringify(T const &t) {
	ostringstream os;

		os << t;
		return os.str();
	}

	/**
		Convert a string to a type. Requires that the type have an istream >>
		operator associated with it.

		e.g.
			string s = "10";
			int i;
			i = destringify<int>(s);

		@parameter str String to convert to type T.
	*/
	template <typename T> T destringify(string const &str) {
	istringstream os(str);
	T t;

		os >> t;
		if (!os.good() && !os.eof()) throw runtime_error("can't destringify '" + str + "'");
		return t;
	}
}

#endif

devtodo-master/util/README
These source files can be regenerated if you have libCrash.
This can be done with:

	make rebuild

devtodo-master/util/c_regex.h
/* Definitions for data structures and routines for the regular
   expression library, version 0.12.
   Copyright (C) 1985,1989-1993,1995-1998, 2000 Free Software Foundation, Inc.
   This file is part of the GNU C Library. Its master source is NOT part of
   the C library, however. The master source lives in /gd/gnu/lib.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.
   If not, write to the Free Software Foundation, Inc., 59 Temple Place -
   Suite 330, Boston, MA 02111-1307, USA. */

#ifndef _REGEX_H
#define _REGEX_H 1

/* Allow the use in C++ code. */
#ifdef __cplusplus
extern "C" {
#endif

/* POSIX says that <sys/types.h> must be included (by the caller) before
   <regex.h>. */

#if !defined _POSIX_C_SOURCE && !defined _POSIX_SOURCE && defined VMS
/* VMS doesn't have `size_t' in <sys/types.h>, even though POSIX says it
   should be there. */
# include <stddef.h>
#endif

/* The following two types have to be signed and unsigned integer type
   wide enough to hold a value of a pointer. For most ANSI compilers
   ptrdiff_t and size_t should be likely OK. Still size of these two
   types is 2 for Microsoft C. Ugh... */
typedef long int s_reg_t;
typedef unsigned long int active_reg_t;

/* The following bits are used to determine the regexp syntax we
   recognize. The set/not-set meanings are chosen so that Emacs syntax
   remains the value 0. The bits are given in alphabetical order, and
   the definitions shifted by one from the previous bit; thus, when we
   add or remove a bit, only one other definition need change. */
typedef unsigned long int reg_syntax_t;

/* If this bit is not set, then \ inside a bracket expression is literal.
   If set, then such a \ quotes the following character. */
#define RE_BACKSLASH_ESCAPE_IN_LISTS ((unsigned long int) 1)

/* If this bit is not set, then + and ? are operators, and \+ and \? are
     literals.
   If set, then \+ and \? are operators and + and ? are literals. */
#define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)

/* If this bit is set, then character classes are supported. They are:
     [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
     [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
   If not set, then character classes are not supported. */
#define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1)

/* If this bit is set, then ^ and $ are always anchors (outside bracket
     expressions, of course).
   If this bit is not set, then it depends:
        ^ is an anchor if it is at the beginning of a regular
          expression or after an open-group or an alternation operator;
        $ is an anchor if it is at the end of a regular expression, or
          before a close-group or an alternation operator.

   This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
   POSIX draft 11.2 says that * etc. in leading positions is undefined.
   We already implemented a previous draft which made those constructs
   invalid, though, so we haven't changed the code back. */
#define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1)

/* If this bit is set, then special characters are always special
     regardless of where they are in the pattern.
   If this bit is not set, then special characters are special only in
     some contexts; otherwise they are ordinary. Specifically,
     * + ? and intervals are only special when not after the beginning,
     open-group, or alternation operator. */
#define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1)

/* If this bit is set, then *, +, ?, and { cannot be first in an re or
     immediately after an alternation or begin-group operator. */
#define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1)

/* If this bit is set, then . matches newline.
   If not set, then it doesn't. */
#define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1)

/* If this bit is set, then . doesn't match NUL.
   If not set, then it does. */
#define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1)

/* If this bit is set, nonmatching lists [^...] do not match newline.
   If not set, they do. */
#define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1)

/* If this bit is set, either \{...\} or {...} defines an
     interval, depending on RE_NO_BK_BRACES.
   If not set, \{, \}, {, and } are literals. */
#define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1)

/* If this bit is set, +, ? and | aren't recognized as operators.
   If not set, they are. */
#define RE_LIMITED_OPS (RE_INTERVALS << 1)

/* If this bit is set, newline is an alternation operator.
   If not set, newline is literal. */
#define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1)

/* If this bit is set, then `{...}' defines an interval, and \{ and \}
     are literals.
   If not set, then `\{...\}' defines an interval. */
#define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1)

/* If this bit is set, (...) defines a group, and \( and \) are literals.
   If not set, \(...\) defines a group, and ( and ) are literals. */
#define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1)

/* If this bit is set, then \<digit> matches <digit>.
   If not set, then \<digit> is a back-reference. */
#define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1)

/* If this bit is set, then | is an alternation operator, and \| is literal.
   If not set, then \| is an alternation operator, and | is literal. */
#define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1)

/* If this bit is set, then an ending range point collating higher
     than the starting range point, as in [z-a], is invalid.
   If not set, then when ending range point collates higher than the
     starting range point, the range is ignored. */
#define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1)

/* If this bit is set, then an unmatched ) is ordinary.
   If not set, then an unmatched ) is invalid. */
#define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1)

/* If this bit is set, succeed as soon as we match the whole pattern,
   without further backtracking. */
#define RE_NO_POSIX_BACKTRACKING (RE_UNMATCHED_RIGHT_PAREN_ORD << 1)

/* If this bit is set, do not process the GNU regex operators.
   If not set, then the GNU regex operators are recognized. */
#define RE_NO_GNU_OPS (RE_NO_POSIX_BACKTRACKING << 1)

/* If this bit is set, turn on internal regex debugging.
   If not set, and debugging was on, turn it off.
   This only works if regex.c is compiled -DDEBUG.
   We define this bit always, so that all that's needed to turn on
   debugging is to recompile regex.c; the calling code can always have
   this bit set, and it won't affect anything in the normal case.
   */
#define RE_DEBUG (RE_NO_GNU_OPS << 1)

/* If this bit is set, a syntactically invalid interval is treated as
   a string of ordinary characters. For example, the ERE 'a{1' is
   treated as 'a\{1'. */
#define RE_INVALID_INTERVAL_ORD (RE_DEBUG << 1)

/* This global variable defines the particular regexp syntax to use (for
   some interfaces). When a regexp is compiled, the syntax used is
   stored in the pattern buffer, so changing this does not affect
   already-compiled regexps. */
extern reg_syntax_t re_syntax_options;

/* Define combinations of the above bits for the standard possibilities.
   (The [[[ comments delimit what gets put into the Texinfo file, so
   don't delete them!) */
/* [[[begin syntaxes]]] */
#define RE_SYNTAX_EMACS 0

#define RE_SYNTAX_AWK \
  (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
   | RE_DOT_NEWLINE | RE_CONTEXT_INDEP_ANCHORS \
   | RE_UNMATCHED_RIGHT_PAREN_ORD | RE_NO_GNU_OPS)

#define RE_SYNTAX_GNU_AWK \
  ((RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DEBUG) \
   & ~(RE_DOT_NOT_NULL | RE_INTERVALS | RE_CONTEXT_INDEP_OPS))

#define RE_SYNTAX_POSIX_AWK \
  (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS \
   | RE_INTERVALS | RE_NO_GNU_OPS)

#define RE_SYNTAX_GREP \
  (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
   | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
   | RE_NEWLINE_ALT)

#define RE_SYNTAX_EGREP \
  (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
   | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
   | RE_NO_BK_VBAR)

#define RE_SYNTAX_POSIX_EGREP \
  (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES \
   | RE_INVALID_INTERVAL_ORD)

/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

/* Syntax bits common to both basic and extended POSIX regex syntax.
   */
#define _RE_SYNTAX_POSIX_COMMON \
  (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
   | RE_INTERVALS | RE_NO_EMPTY_RANGES)

#define RE_SYNTAX_POSIX_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
   RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
   isn't minimal, since other operators, such as \`, aren't disabled. */
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

#define RE_SYNTAX_POSIX_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
   | RE_CONTEXT_INVALID_OPS | RE_UNMATCHED_RIGHT_PAREN_ORD)

/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INDEP_OPS is
   removed and RE_NO_BK_REFS is added. */
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
/* [[[end syntaxes]]] */

/* Maximum number of duplicates an interval can allow. Some systems
   (erroneously) define this in other header files, but we want our
   value, so remove any previous define. */
#ifdef RE_DUP_MAX
# undef RE_DUP_MAX
#endif
/* If sizeof(int) == 2, then ((1 << 15) - 1) overflows. */
#define RE_DUP_MAX (0x7fff)


/* POSIX `cflags' bits (i.e., information for `regcomp'). */

/* If this bit is set, then use extended regular expression syntax.
   If not set, then use basic regular expression syntax. */
#define REG_EXTENDED 1

/* If this bit is set, then ignore case when matching.
   If not set, then case is significant. */
#define REG_ICASE (REG_EXTENDED << 1)

/* If this bit is set, then anchors do not match at newline
     characters in the string.
   If not set, then anchors do match at newlines. */
#define REG_NEWLINE (REG_ICASE << 1)

/* If this bit is set, then report only success or fail in regexec.
   If not set, then returns differ between not matching and errors. */
#define REG_NOSUB (REG_NEWLINE << 1)


/* POSIX `eflags' bits (i.e., information for regexec). */

/* If this bit is set, then the beginning-of-line operator doesn't match
     the beginning of the string (presumably because it's not the
     beginning of a line).
   If not set, then the beginning-of-line operator does match the
     beginning of the string. */
#define REG_NOTBOL 1

/* Like REG_NOTBOL, except for the end-of-line. */
#define REG_NOTEOL (1 << 1)


/* If any error codes are removed, changed, or added, update the
   `re_error_msg' table in regex.c. */
typedef enum
{
#ifdef _XOPEN_SOURCE
  REG_ENOSYS = -1,  /* This will never happen for this implementation. */
#endif

  REG_NOERROR = 0,  /* Success. */
  REG_NOMATCH,      /* Didn't find a match (for regexec). */

  /* POSIX regcomp return error codes. (In the order listed in the
     standard.) */
  REG_BADPAT,       /* Invalid pattern. */
  REG_ECOLLATE,     /* Not implemented. */
  REG_ECTYPE,       /* Invalid character class name. */
  REG_EESCAPE,      /* Trailing backslash. */
  REG_ESUBREG,      /* Invalid back reference. */
  REG_EBRACK,       /* Unmatched left bracket. */
  REG_EPAREN,       /* Parenthesis imbalance. */
  REG_EBRACE,       /* Unmatched \{. */
  REG_BADBR,        /* Invalid contents of \{\}. */
  REG_ERANGE,       /* Invalid range end. */
  REG_ESPACE,       /* Ran out of memory. */
  REG_BADRPT,       /* No preceding re for repetition op. */

  /* Error codes we've added. */
  REG_EEND,         /* Premature end. */
  REG_ESIZE,        /* Compiled pattern bigger than 2^16 bytes. */
  REG_ERPAREN       /* Unmatched ) or \); not returned from regcomp. */
} reg_errcode_t;

/* This data structure represents a compiled pattern. Before calling
   the pattern compiler, the fields `buffer', `allocated', `fastmap',
   `translate', and `no_sub' can be set. After the pattern has been
   compiled, the `re_nsub' field is available. All other fields are
   private to the regex routines.
   */

#ifndef RE_TRANSLATE_TYPE
# define RE_TRANSLATE_TYPE char *
#endif

struct re_pattern_buffer
{
/* [[[begin pattern_buffer]]] */
  /* Space that holds the compiled pattern. It is declared as
     `unsigned char *' because its elements are
     sometimes used as array indexes. */
  unsigned char *buffer;

  /* Number of bytes to which `buffer' points. */
  unsigned long int allocated;

  /* Number of bytes actually used in `buffer'. */
  unsigned long int used;

  /* Syntax setting with which the pattern was compiled. */
  reg_syntax_t syntax;

  /* Pointer to a fastmap, if any, otherwise zero. re_search uses
     the fastmap, if there is one, to skip over impossible
     starting points for matches. */
  char *fastmap;

  /* Either a translate table to apply to all characters before
     comparing them, or zero for no translation. The translation
     is applied to a pattern when it is compiled and to a string
     when it is matched. */
  RE_TRANSLATE_TYPE translate;

  /* Number of subexpressions found by the compiler. */
  size_t re_nsub;

  /* Zero if this pattern cannot match the empty string, one else.
     Well, in truth it's used only in `re_search_2', to see
     whether or not we should use the fastmap, so we don't set
     this absolutely perfectly; see `re_compile_fastmap' (the
     `duplicate' case). */
  unsigned can_be_null : 1;

  /* If REGS_UNALLOCATED, allocate space in the `regs' structure
       for `max (RE_NREGS, re_nsub + 1)' groups.
     If REGS_REALLOCATE, reallocate space if necessary.
     If REGS_FIXED, use what's there. */
#define REGS_UNALLOCATED 0
#define REGS_REALLOCATE 1
#define REGS_FIXED 2
  unsigned regs_allocated : 2;

  /* Set to zero when `regex_compile' compiles a pattern; set to one
     by `re_compile_fastmap' if it updates the fastmap. */
  unsigned fastmap_accurate : 1;

  /* If set, `re_match_2' does not return information about
     subexpressions. */
  unsigned no_sub : 1;

  /* If set, a beginning-of-line anchor doesn't match at the
     beginning of the string. */
  unsigned not_bol : 1;

  /* Similarly for an end-of-line anchor.
     */
  unsigned not_eol : 1;

  /* If true, an anchor at a newline matches. */
  unsigned newline_anchor : 1;

/* [[[end pattern_buffer]]] */
};

typedef struct re_pattern_buffer regex_t;

/* Type for byte offsets within the string. POSIX mandates this. */
typedef int regoff_t;


/* This is the structure we store register match data in. See
   regex.texinfo for a full description of what registers match. */
struct re_registers
{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
};


/* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer,
   `re_match_2' returns information about at least this many registers
   the first time a `regs' structure is passed. */
#ifndef RE_NREGS
# define RE_NREGS 30
#endif


/* POSIX specification for registers. Aside from the different names than
   `re_registers', POSIX uses an array of structures, instead of a
   structure of arrays. */
typedef struct
{
  regoff_t rm_so;  /* Byte offset from string's start to substring's start. */
  regoff_t rm_eo;  /* Byte offset from string's start to substring's end. */
} regmatch_t;

/* Declarations for routines. */

/* To avoid duplicating every routine declaration -- once with a
   prototype (if we are ANSI), and once without (if we aren't) -- we
   use the following macro to declare argument types. This
   unfortunately clutters up the declarations a bit, but I think it's
   worth it. */

#if __STDC__

# define _RE_ARGS(args) args

#else /* not __STDC__ */

# define _RE_ARGS(args) ()

#endif /* not __STDC__ */

/* Sets the current default syntax to SYNTAX, and return the old syntax.
   You can also simply assign to the `re_syntax_options' variable. */
extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax));

/* Compile the regular expression PATTERN, with length LENGTH
   and syntax given by the global `re_syntax_options', into the buffer
   BUFFER. Return NULL if successful, and an error string if not.
   */
extern const char *re_compile_pattern
  _RE_ARGS ((const char *pattern, size_t length,
             struct re_pattern_buffer *buffer));


/* Compile a fastmap for the compiled pattern in BUFFER; used to
   accelerate searches. Return 0 if successful and -2 if was an
   internal error. */
extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer));


/* Search in the string STRING (with length LENGTH) for the pattern
   compiled into BUFFER. Start searching at position START, for RANGE
   characters. Return the starting position of the match, -1 for no
   match, or -2 for an internal error. Also return register
   information in REGS (if REGS and BUFFER->no_sub are nonzero). */
extern int re_search
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
             int length, int start, int range, struct re_registers *regs));


/* Like `re_search', but search in the concatenation of STRING1 and
   STRING2. Also, stop searching at index START + STOP. */
extern int re_search_2
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
             int length1, const char *string2, int length2,
             int start, int range, struct re_registers *regs, int stop));


/* Like `re_search', but return how many characters in STRING the regexp
   in BUFFER matched, starting at position START. */
extern int re_match
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
             int length, int start, struct re_registers *regs));


/* Relates to `re_match' as `re_search_2' relates to `re_search'. */
extern int re_match_2
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
             int length1, const char *string2, int length2,
             int start, struct re_registers *regs, int stop));


/* Set REGS to hold NUM_REGS registers, storing them in STARTS and
   ENDS. Subsequent matches using BUFFER and REGS will use this memory
   for recording register information. STARTS and ENDS must be
   allocated with malloc, and must each be at least `NUM_REGS * sizeof
   (regoff_t)' bytes long.
If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. */ extern void re_set_registers _RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs, unsigned num_regs, regoff_t *starts, regoff_t *ends)); #if defined _REGEX_RE_COMP || defined _LIBC # ifndef _CRAY /* 4.2 bsd compatibility. */ extern char *re_comp _RE_ARGS ((const char *)); extern int re_exec _RE_ARGS ((const char *)); # endif #endif /* GCC 2.95 and later have "__restrict"; C99 compilers have "restrict", and "configure" may have defined "restrict". */ #ifndef __restrict # if ! (2 < __GNUC__ || (2 == __GNUC__ && 95 <= __GNUC_MINOR__)) # if defined restrict || 199901L <= __STDC_VERSION__ # define __restrict restrict # else # define __restrict # endif # endif #endif /* For now unconditionally define __restrict_arr to expand to nothing. Ideally we would have a test for the compiler which allows defining it to restrict. */ #define __restrict_arr /* POSIX compatibility. 
*/
extern int regcomp _RE_ARGS ((regex_t *__restrict __preg,
                              const char *__restrict __pattern,
                              int __cflags));

extern int regexec _RE_ARGS ((const regex_t *__restrict __preg,
                              const char *__restrict __string, size_t __nmatch,
                              regmatch_t __pmatch[__restrict_arr],
                              int __eflags));

extern size_t regerror _RE_ARGS ((int __errcode, const regex_t *__preg,
                                  char *__errbuf, size_t __errbuf_size));

extern void regfree _RE_ARGS ((regex_t *__preg));


#ifdef __cplusplus
}
#endif /* C++ */

#endif /* regex.h */

/*
Local variables:
make-backup-files: t
version-control: t
trim-versions-without-asking: nil
End:
*/
devtodo-master/util/Lexer.h0000644000000000000000000000570113722700563014726 0ustar rootroot
#ifndef CRASH_LEXER
#define CRASH_LEXER

#include <stdexcept>
#include <string>
#include <map>
#include <iterator>
#include "Strings.h"
#include "Regex.h"

using namespace std;

/**	Lexer is a lexical analyser.

	02/01/01 Fixed the bug where ignored tokens caused it to die horribly.
	20/09/00 Restarted after I couldn't find an annoying bug. The interface
		needed to be cleaned up anyway.
	24/08/00 Fixed a bug where the list of tokens wasn't being cleared in
		between calls to the scanner.
	13/08/00 Created.
*/
class Lexer {
	public :
		class exception : public runtime_error {
			public :
				exception(string const &str, unsigned line) : runtime_error(str), _line(line) {}

				unsigned line() { return _line; }

			private :
				unsigned _line;
		};

		class iterator {
			public :
				iterator() : tokeniser(0), in(0), _line(1) {}

				operator string const & () const { return _value; }
				iterator operator ++ (int) { iterator i = *this; get(); return i; }
				iterator &operator ++ () { get(); return *this; }
				int operator != (iterator const &other) const { return in != other.in || tokeniser != other.tokeniser; }
				int operator == (iterator const &other) const { return in == other.in && tokeniser == other.tokeniser; }
				int operator [] (unsigned index) const { return _value[index]; }

				unsigned type() const { return _type; }
				unsigned line() const { return _line; }
				unsigned size() const { return _value.size(); }
				string const &value() const { return _value; }
				char const *source() const { return in; }

				void skip(unsigned skip) {
					for (unsigned i = 0; i < skip; i++)
						if (*in) {
							if (*in == '\n') _line++;
							in++;
						} else
							break;
				}
			private :
				void get() {
					assert(in && tokeniser);
					if (!tokeniser->get(*this, in, _line))
						in = 0;
				}

				iterator(Lexer *t, char const *in) : tokeniser(t), in(in), _line(1) { if (in) get(); }

				friend class Lexer;

				Lexer *tokeniser;
				string _value;
				char const *in;
				unsigned _type;
				unsigned _line;
				struct {
					int start, end;
				} match[50];
				int matches;
		};

		friend class iterator;

		Lexer();
		virtual ~Lexer();

		iterator begin(char const *in) { return iterator(this, in); }
		iterator end() { return iterator(this, 0); }

		Lexer &ignorePattern(unsigned index, char const *pattern) { return addPattern(index, pattern, true); }
		Lexer &addPattern(unsigned index, char const *pattern, bool ignore = false);
		void ignore(unsigned index, bool state = true);
		void enable(unsigned index, bool state = true);
	protected :
		enum {
			Character = 1000000
		};
		struct Pattern {
			Regex rx;
			bool ignore, enabled;
		};
		map<int, Pattern> pattern;
		bool initialised;

		virtual bool get(iterator &it, char const *&in, unsigned &line);
};

inline ostream &operator << (ostream &out, Lexer::iterator const &it) {
	out << it.value();
	return out;
}

#endif
devtodo-master/util/Makefile.am0000644000000000000000000000062313722700563015526 0ustar rootroot
noinst_LTLIBRARIES=libutil.la
libutil_la_SOURCES=Terminal.cc Terminal.h Lexer.cc Lexer.h \
	Regex.cc Regex.h XML.cc XML.h Strings.cc Strings.h CommandArgs.cc CommandArgs.h

rebuild:
	crash-module -veto --exclude Signal --module XML Strings Terminal CommandArgs --join 50,lib
devtodo-master/util/Regex.h0000644000000000000000000000371313722700563014722 0ustar rootroot
#ifndef CRASH_REGEX
#define CRASH_REGEX

#include <cassert>
#include <cstring>
#include <string>
#include <map>
#include <utility>
#include <stdexcept>
#include <sys/types.h>
#include <regex.h>

#ifndef CRASH_REGEX_CACHE_THRESHOLD
#define CRASH_REGEX_CACHE_THRESHOLD 128
#endif

using namespace std;

/**	Regex is a C++ wrapper around the POSIX Regex library.

	13/00/01 Added cache. This speeds up general rx construction considerably.
	14/08/00 Created.
*/
class Regex {
	public :
		struct exception : public runtime_error {
			exception(string const &what) : runtime_error(what) {}
		};
		struct out_of_range : public exception {
			out_of_range(string const &what) : exception(what) {}
		};
		struct no_match : public exception {
			no_match(string const &what) : exception(what) {}
		};

		Regex();
		Regex(char const *regex);
		Regex(Regex const &copy);
		~Regex();

		Regex &operator = (Regex const &copy);
		Regex &operator = (char const *regex);

		string const &source() const { return inrx; }

		/*	Regex regex("'([^']*)'");
			string out;
			out = regex.transform("'alec thomas'", "(\\1)");
			// outputs: (alec thomas)
		*/
		string transform(string const &in, string const &mask);

		int match(char const *str);
		int operator == (char const *str) { return match(str); }
		int matchStart(char const *str);
		int operator <= (char const *str) { return matchStart(str); }

		int substrings() {
			for (int i = 0; i < 50; i++)
				if (matches[i].rm_so == -1) return i;
			return 50;
		}
		int subStart(unsigned index) {
			assert(index < 50 && matches[index].rm_so != -1);
			return matches[index].rm_so;
		}
		// rm_eo is the offset one past the end of the sub-match
		int subEnd(unsigned index) {
			assert(index < 50 && matches[index].rm_so != -1);
			return matches[index].rm_eo;
		}
	private :
		string inrx;
		regex_t regex;
		regmatch_t matches[50];

		struct Cache {
			Cache() : hits(0), instances(0) {}

			regex_t rx;
			int hits, instances;
		};
		friend struct Cache;
//		static map<string, Cache> cache;
};

#endif
devtodo-master/util/XML.cc0000644000000000000000000001342513722700563014447 0ustar rootroot
#include "XML.h"

bool XML::initialised = false;
Lexer XML::xmlScan, XML::tagScan, XML::commentScan, XML::dataScan, XML::processScan;
#ifdef CRASH_SIGNAL
Signal2<string const &, map<string, string> const &> XML::onElementBegin;
Signal1<string const &> XML::onElementEnd;
Signal2<XML const &, string const &> XML::onBody;
Signal2<XML const &, string const &> XML::onData;
#endif

XML::XML(Type type, XML *parent, Lexer::iterator &token) : _parent(parent), _type(type) {
	switch (type) {
		case Element : parseElement(token); break;
		case Body : parseBody(token); break;
		case Data : parseData(token); break;
	}
}

XML::XML() : _parent(0), _type(Element) {
	init();
}

XML::XML(char const *str) : _parent(0), _type(Element) {
	init();
	parse(str);
}

XML::~XML() {
	for (vector<XML*>::iterator i = _child.begin(); i != _child.end(); ++i)
		delete *i;
}

void XML::parse(char const *str) {
	try {
		Lexer::iterator i = xmlScan.begin(str);

		if (i.type() == XmlDecl) {
			++i;
		}
		parseElement(i);
	} catch (Lexer::exception &e) {
		throw exception(e.what(), e.line());
	}
}

void XML::init() {
	// Only initialise scanners once
	if (!initialised) {
		// <?xml version="1.0" encoding="UTF-8" standalone="no"?>
		xmlScan.addPattern(XmlDecl, "<\\?xml[^?]*\\?>[[:space:]]*");
		xmlScan.addPattern(XmlCommentBegin, "<!--");
		xmlScan.addPattern(XmlBegin, "<[a-zA-Z0-9_-]+"
			"([[:space:]]+[a-zA-Z_0-9-]+=(([/a-zA-Z_0-9,.]+)|(\"[^\"]*\")|('[^']*')))"
			"*[[:space:]]*(/?)>");
		xmlScan.addPattern(XmlEnd, "</[a-zA-Z0-9_-]+>");
		xmlScan.addPattern(XmlDataBegin, "<!DATA[[:space:]]*\\[\\[");
		xmlScan.addPattern(XmlContent, "([\n\r]|[^<])+");

		commentScan.addPattern(CommentEnd, "-->[[:space:]]*");
		commentScan.addPattern(CommentBody, "[\n\r]|.");

		tagScan.addPattern(ElementWS, "[[:space:]]+", true);
		tagScan.addPattern(ElementValue, "('(\\.|[^'])*')|(\"(\\.|[^\"])*\")");
		tagScan.addPattern(ElementKey, "([a-zA-Z_][a-zA-Z0-9-]*)");
		tagScan.addPattern(ElementAssignment, "=");
		tagScan.addPattern(ElementTerminator, "/");

		dataScan.addPattern(DataEnd, "]]>");
		dataScan.addPattern(DataBody, "[\n\r]|.");

		processScan.addPattern(ProcessBegin, "<\\?xml");
		processScan.addPattern(ProcessBody, "\\?>|[^?][^>]");
		processScan.addPattern(ProcessEnd, "\\?>");
		initialised = true;
	}
}

// Skip comments
void XML::skip(Lexer::iterator &token) {
	while (token.type() == XmlCommentBegin) {
	int skip = 0;

		try {
			for (Lexer::iterator i = commentScan.begin(token.source()); i != commentScan.end(); ++i) {
				skip += i.size();
				if (i.type() == CommentEnd) break;
			}
		} catch (Lexer::exception &e) {
			throw exception(e.what(), token.line() + e.line() - 1);
		}
		token.skip(skip);
		++token;
	}
}

// Get next token, skipping any comments
void XML::next(Lexer::iterator &token) {
	++token;
	skip(token);
}

void XML::parseElement(Lexer::iterator &token) {
	skip(token);
	if (token.type() != XmlBegin)
		throw exception("expected element, got '" + token.value() + "'", token.line());

	// Tag contents without the surrounding '<' and '>' (a std::string
	// rather than a variable-length array, which is not standard C++)
string tag = token.value().substr(1, token.size() - 2);

	try {
		Lexer::iterator i = tagScan.begin(tag.c_str());

		if (i.type() != ElementKey)
			throw exception("invalid key", token.line());
		_data = i.value();
		// Extract attributes
		for (++i; i != tagScan.end(); ++i) {
			if (i.type() == ElementTerminator) {
				next(token);
				return;
			}
			if (i.type() != ElementKey)
				throw exception("expected key for attribute, got '" + i.value() + "'", token.line());
		string k = i.value();

			++i;
			if (i.type() != ElementAssignment)
				throw exception("expected assignment operator after attribute key, got '" + i.value() + "'", token.line());
			++i;
			if (i.type() != ElementValue)
				throw exception("expected value for key '" + k + "', got '" + i.value() + "'", token.line());
			_attrib[k] = str::stripcslashes(i.value().substr(1, i.size() - 2));
		}
	} catch (Lexer::exception &e) {
		throw exception(e.what(), token.line() + e.line() - 1);
	}
#ifdef CRASH_SIGNAL
	XML::onElementBegin(_data, _attrib);
#endif
	next(token);
	// Scan children
	while (token != xmlScan.end())
		switch (token.type()) {
			case XmlBegin :
				_child.push_back(new XML(Element, this, token));
			break;
			case XmlDataBegin :
				_child.push_back(new XML(Data, this, token));
			break;
			case XmlEnd :
				if (token.value().substr(2, token.size() - 3) != _data)
					throw exception("expected tag closure for '" + _data + "', got '" + token.value() + "'", token.line());
#ifdef CRASH_SIGNAL
				XML::onElementEnd(_data);
#endif
				next(token);
				return;
			break;
			case XmlContent :
				_child.push_back(new XML(Body, this, token));
			break;
			default:
				throw exception("unexpected token '" + token.value() + "'", token.line());
			break;
		}
}

void XML::parseBody(Lexer::iterator &token) {
	skip(token);
	if (token.type() != XmlContent)
		throw exception("expected body, got '" + token.value() + "'", token.line());

	// Decode the &lt;, &gt; and &amp; entities while copying the text into _data
char const *s = token.value().c_str();
unsigned size = token.value().size();

	for (unsigned i = 0; i < size; ++i) {
		if (s[i] == '&') {
			if (!strncmp(s + i, "&lt;", 4)) {
				_data += '<';
				i += 3;
			} else if (!strncmp(s + i, "&gt;", 4)) {
				_data += '>';
				i += 3;
			} else if (!strncmp(s + i, "&amp;", 5)) {
				_data += '&';
				i += 4;
			}
		} else
			_data += s[i];
	}
#ifdef CRASH_SIGNAL
	XML::onBody(*_parent, _data);
#endif
	next(token);
}

void XML::parseData(Lexer::iterator &token) {
	skip(token);
int skip = 0;

	for (Lexer::iterator i = dataScan.begin(token.source()); i != dataScan.end(); ++i) {
		skip += i.size();
		if (i.type() == DataEnd) break;
		_data += i.value();
	}
	token.skip(skip);
#ifdef CRASH_SIGNAL
	XML::onData(*_parent, _data);
#endif
	// Advance to the token after the closing ]]> so the caller does not
	// see the same XmlDataBegin token again
	next(token);
}
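The regmatch_t declarations in regex.h define rm_so and rm_eo as byte offsets to the start and to one past the end of a matched substring, so a sub-match end has to be read from rm_eo, never rm_so. A minimal sketch of the raw POSIX calls that a wrapper such as Regex builds on, assuming a POSIX system `<regex.h>` and REG_EXTENDED syntax; `posixGroups()` is a hypothetical helper for illustration, not part of devtodo:

```cpp
#include <regex.h>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: match TEXT against PATTERN (POSIX extended syntax)
// and return the whole match followed by one entry per parenthesised
// group. An empty vector means the pattern failed to compile or did not
// match anywhere in TEXT.
std::vector<std::string> posixGroups(const char *pattern, const char *text)
{
    assert(pattern != nullptr && text != nullptr);
    std::vector<std::string> out;

    regex_t rx;
    if (regcomp(&rx, pattern, REG_EXTENDED) != 0)
        return out;                          // compile failed: nothing to free

    const std::size_t maxSlots = 10;         // slot 0 holds the whole match
    regmatch_t m[maxSlots];
    if (regexec(&rx, text, maxSlots, m, 0) == 0) {
        std::size_t slots = rx.re_nsub + 1;  // whole match + subexpressions
        if (slots > maxSlots)
            slots = maxSlots;
        for (std::size_t i = 0; i < slots; ++i) {
            if (m[i].rm_so == -1)            // optional group took no part
                out.push_back("");
            else                             // [rm_so, rm_eo) is the match
                out.push_back(std::string(text + m[i].rm_so,
                                          text + m[i].rm_eo));
        }
    }
    regfree(&rx);
    return out;
}
```

For example, `posixGroups("'([^']*)'", "say 'alec thomas' now")` yields the quoted text followed by `alec thomas`, mirroring the `transform()` example in Regex.h. Reading `rx.re_nsub` rather than scanning for the first `rm_so == -1` keeps an unmatched optional group from being mistaken for the end of the list.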
devtodo-master/COPYING0000644000000000000000000003543313722700563013561 0ustar rootroot
GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 675 Mass Ave, Cambridge, MA 02139, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. 
The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS
devtodo-master/.todo0000644000000000000000000003715213722700563013474 0ustar rootroot
<?xml version="1.0"?> <todo version="0.1.19"> <title> devtodo - hierarchical task list Allow user to specify defaults in ~/.todorc. Format could be something like: <option> <value> - identical to command line args but without the prefixed '--'. Add facility to 'graft' an added note onto an existing one. Add facility to change the file used as the input database. Perhaps --database <file>. This could be put in the ~/.todorc to make it permanent. Ended up with two new options: --database <relative file> and --fallback-database <absolute file>.
The first is used to change the default database from .todo to whatever while the second one is used to specify a fallback database file to use if no other can be found. Add sub-tasks (1.1, 1.2, 1.2.1, etc.). Might have to add an index attribute to each item, which would be a floating point number. In the end, it didn't require adding an index attribute. Add support for TODORC environment variable (courtesy, Claude) Add version checking so that if an older version uses a newer version database, it warns the user. Probably only major/medium version number checking needs to be done? Allow branches to be reparented: --reparent <index> <newparent> Automatically derive child item priority from parent item. Change SYNTAX section in man page to SYNOPSIS (courtesy, Arthur Korn) Chase down other temporary string bugs - I'm fairly sure there will be a few in there :( Found another one and squashed it. Compile for different platforms on cf.sourceforge.net FreeBSD compile has problems with regex - getting empty strings? Do something other than fail on compile, if readline is not linkable. Expand environment variables in ~/.todorc parsed strings. Useful for 'fallback-database $HOME/.todo' Fix bug where pressing ^D to end input of a priority or text causes a seg-fault (courtesy, Matt Kraai) Fix indenting weirdness. Fix man page so that -r is no longer present, to reflect removal of this option (courtesy, Arthur Korn) Fix multiple items being specified as separate arguments always defaulting to last argument (courtesy, Arthur Korn) Fix problem where todo won't access third tier items: 1.1.x Fix up some minor man page problems Instead of deleting items on tdr, mark them as done? Or have a new option "-d|--done" (tdd) for this? Make -s (strip symbols) optional through --configure. Also, use -g -Wall when debugging and use -DNDEBUG when not debugging. Make some man pages. Figure out how to generate man pages. Probably have to download some source code as an example.
Possibly learn how to use SGML? Or get some front-end that uses it. That would make it easier to generate other backends simultaneously, like .html and .man, from the one source. Remove explicitly set CXXFLAGS in configure.in before releasing. Sort items after editing so that the correct order is maintained. Use some sort of template string for formatting output. For example, '%i%f%2n.%T' would generate the default display. %<n>i: indent to current depth; <n> is the number of spaces per indent level and defaults to 4 %T is the item text, which wraps and indents to the depth the item started at %t is unwrapped text %p is the priority %d is the date (formatted according to --format-date) %n is the index number of the item %f is the state flag (+ means children, - means done, * means children and done) Validate arguments from ~/.todorc. It's not good to be able to put --add or --reparent in there. There should really only be a couple available: verbose, colour, etc. Add facility to extract TODO entries in source code into .todo database. Perhaps an external script or something would be better? Could be done with an option to the root node, such as: <todo mirror="true"> Minor bug where items with sub-items that are all marked done still get a + when children are filtered out. Allow sorting in a variety of ways. One example of the usefulness of this is to sort done items after not done items - makes it easier to separate them visually. Another example is sorting by finished date, created date, lifespan of item, etc. A prefix of '-' will mean sort descending, '+' means sort ascending Valid sort keys are created - sort on created time completed - sort on completed time text - sort on text priority - sort on priority duration - sort on time item was open done - sort on whether an item is done or not eg. This is the default behaviour: todo --sort done,priority,-created Warn if database is created with world or group read/write permissions. 
If a sub-sub-item needs to be expanded and you are filtering with '-children' you have to first expand the parent then the child. This is probably not expected behaviour. Allow formatting of time in --verbose display. Add new filter options 'created=<date>[-<date>|,<date>]', 'completed=<date>[-<date>|,<date>]' and 'duration=<time>' to filtering code. Add a 'current' marker to items? ie. mark the item you are currently working on so that it is easier to determine. Default to medium priority if a blank line is entered in the priority input (thanks to Alexei Gilchrist). Fix performance problems! (Alexei Gilchrist has reported >= 16 seconds on a file less than 20K. Not good.) One problem was getline(string) - this is very slow. Changed it to use in.getline(char*, int) and it profiled a LOT faster. Use of Lexer::iterator::operator ++ (int) - this causes a copy of the entire iterator - bad. Added buffering code in XML::getBody - it *was* basically appending every single character individually - slow. Instead of reading entire database with getline(in, str) I now use the size returned by stat to pre-allocate a buffer and load it into that. Add --global-database <file> and -G/--global option to use it. Replaces clunky --fallback-database semantics. Thanks again to Alexei Gilchrist. Re-add numbered priorities for speedy entry. Alexei Gilchrist brought this up as an idea. Adding optional "hidden" notes to notes. Kind of like annotations. Invoke an external editor for this facility? (idea courtesy of Alexei Gilchrist). I have added a comment field for each item, as evidenced by this. Some sort of integration with CVS would be cool. Maybe you could do: "todo --cvs-commit" and it would commit to CVS using all the completed TODO items since the last commit as the change notes. That would be exceptionally cool. Go binary? This would definitely make it run a lot faster...but is that actually where the bottleneck is? 
(Well, this was definitely the bottleneck) Add --timeout <n> to only show database if shown more than <n> seconds ago. Add encryption using crash::Encrypt classes. This will probably require MIME support, or something similar, in order to actually store the resultant binary information. I have Base64 stream support, so this would be trivial now. Alternatively, just pass the encryption stream to the database saving routine. This would require rewriting the Loaders interface (again). Add per-database options - brilliant for changing views on different databases. Problem with this is that the options are parsed *before* the database is loaded, which would probably mean reloading the database to have the new options take effect. Add backing up of database (--backup [<n>], where <n> is the number of backups to keep). Add percentage completions? i18n support? Detect terminal width and use that instead of hardwired 80 characters. Rewrite filtering code so it's generic (as the sorting code is). Write functions to do envar expansion, C string escaping, etc. Add search facility. This could be done as a filter: todo --filter /string, which would make it simple when omitting the '--filter': todo /string. Change everything to be an attribute (except, perhaps, the item body). This will allow things like timers and so forth (eg. todo --add-attrib 'start=%D'). Timed events. eg. todo --set text="This is funny"@07/03/2001-10:53 12 ... this would change the text of item 12 to "This is funny" on the 07/03/2001 at 10:53 in the morning. This is a test Evaluate filters on first use rather than on read. This will allow wildcard filters to actually work. Categories - optionally assign a note a category, then you can filter on that category. Make priority names and number of them user-definable. Fix situation where doing "todo -r -5" goes into an endless loop. Fix problems with * expansion. It doesn't appear to work at all... 
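The todo list above proposes a sort specification made of comma-separated keys. A '-' prefix means descending, a '+' prefix means ascending, and the default is `done,priority,-created`. Below is a sketch of that idea in C++, the project's own language. It is a hypothetical illustration, not devtodo's actual sorting code. The `Item` struct, its `rank` encoding (1 = veryhigh through 5 = verylow) and the functions `parseSortSpec`, `compareOn` and `sortItems` are all assumptions. Only the `done`, `priority`, `created` and `text` keys are handled; `completed` and `duration` are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical item record for illustration; not devtodo's real class.
struct Item {
    std::string text;
    int rank;             // assumed encoding: 1 = veryhigh ... 5 = verylow
    std::time_t created;  // creation time
    bool done;
};

// One parsed key, e.g. "-created" -> {"created", true}.
struct SortKey {
    std::string name;
    bool descending;
};

// Split a spec such as "done,priority,-created" into keys.
// A leading '-' means descending; '+' or no prefix means ascending.
std::vector<SortKey> parseSortSpec(const std::string& spec) {
    static const std::vector<std::string> valid = {"done", "priority", "created", "text"};
    std::vector<SortKey> keys;
    std::stringstream ss(spec);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token.empty()) continue;
        bool descending = false;
        if (token[0] == '-' || token[0] == '+') {
            descending = (token[0] == '-');
            token.erase(0, 1);
        }
        if (std::find(valid.begin(), valid.end(), token) == valid.end())
            throw std::invalid_argument("unknown sort key: " + token);
        keys.push_back({token, descending});
    }
    return keys;
}

// Three-way comparison on a single key: negative, zero or positive.
int compareOn(const std::string& key, const Item& a, const Item& b) {
    if (key == "done") return int(a.done) - int(b.done);      // not done sorts first
    if (key == "priority") return a.rank - b.rank;            // most urgent sorts first
    if (key == "created") return (a.created > b.created) - (a.created < b.created);
    if (key == "text") return a.text.compare(b.text);
    assert(!"unreachable: keys are validated by parseSortSpec");
    return 0;
}

// Later keys only break ties left by earlier keys.
void sortItems(std::vector<Item>& items, const std::vector<SortKey>& keys) {
    std::stable_sort(items.begin(), items.end(),
                     [&keys](const Item& a, const Item& b) {
                         for (const SortKey& k : keys) {
                             int c = compareOn(k.name, a, b);
                             if (c != 0) return k.descending ? c > 0 : c < 0;
                         }
                         return false;
                     });
}
```

With this sketch, `sortItems(items, parseSortSpec("done,priority,-created"))` lists open items before done ones and the most urgent first. Within a priority, the newest item comes first. Because `std::stable_sort` is used, items that compare equal on every key keep their original order.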
devtodo-master/configure.in0000644000000000000000000000513713722700563015035 0ustar rootrootAC_INIT AC_CONFIG_SRCDIR([src/main.cc]) AM_INIT_AUTOMAKE(devtodo,0.1.20) # We don't want the util source to be made into a shared lib as it's # only used locally AC_DISABLE_SHARED AM_PROG_LIBTOOL AC_PROG_CXX AC_PROG_INSTALL AC_PROG_LN_S AC_LANG([C++]) # Extra options AC_ARG_ENABLE(debug, [ --enable-debug enable debugging CXXFLAGS (-Wall -g) [off]], [case "${enableval}" in yes) debug=true; CXXFLAGS="-Wall -g" ;; no) debug=false ;; *) AC_MSG_ERROR(--enable-debug expects either yes or no) ;; esac], [debug=false]) AM_CONDITIONAL(DEBUG, test x$debug = xtrue) # Don't use termcap to obtain window size AC_ARG_WITH(termcap, [ --without-termcap don't use termcap to obtain terminal width]) if test "${with_termcap}_" = _ -o "${with_termcap}_" = yes_; then AC_DEFINE(USETERMCAP, 1, [ Use termcap to get terminal width]) fi # Check for various headers and functions - although I'm not doing anything # with them yet AC_CHECK_HEADERS(regex.h string utility iterator stdexcept list map vector \ typeinfo ctype.h stack iostream fstream ctime) AC_CHECK_FUNCS(regcomp ctime time unlink isatty strncmp) dnl The autoconf test for strftime is broken now (due to gcc 3.3 bug?): dnl Gcc 3.3 testprog = ``extern "C" char strftime;'', build with g++ test.cc dnl breaks with: dnl test.cc:1: error: nonnull argument with out-of-range operand number dnl (arg 1, operand 3) AC_MSG_CHECKING(for strftime) AC_COMPILE_IFELSE( [AC_LANG_PROGRAM([#include ], [[ char * s; time_t t = time(NULL); size_t x = strftime(s, 5, "%a", localtime(&t)); ]] )], [ AC_DEFINE(HAVE_STRFTIME, 1, [Define to 1 if you have the 'strftime' function.]) AC_MSG_RESULT(yes) ], [AC_MSG_RESULT(no)]) # Check for readline - modified heavily from librep # check for terminal library # this is a very cool solution from octave's configure.in unset tcap for termlib in ncurses curses termcap terminfo termlib; do AC_CHECK_LIB(${termlib}, tputs, [tcap="$tcap 
-l$termlib"]) case "$tcap" in *-l${termlib}*) break ;; esac done AC_CHECK_LIB(readline, readline,[READLINE_LIBS="-lreadline $tcap"], , $tcap) if test -z "$READLINE_LIBS"; then AC_MSG_ERROR([Can't find readline libraries]) fi AC_SUBST(READLINE_LIBS) SYSCONFDIR="`eval echo $sysconfdir`" AC_DEFINE_UNQUOTED(SYSCONFDIR, "$SYSCONFDIR", [System configuration directory]) AC_SUBST(SYSCONFDIR) AC_CHECK_PROG(HAVE_CRASH_CONFIG, crash-config, yes) AC_SUBST(HAVE_CRASH_CONFIG) AC_CONFIG_HEADERS([config.h]) AC_CONFIG_FILES([Makefile src/Makefile util/Makefile doc/Makefile doc/devtodo.1 makepackages.sh devtodo.spec devtodo.list]) AC_OUTPUT chmod +x makepackages.sh devtodo-master/README0000644000000000000000000000235713722700563013405 0ustar rootrootTodo is a program to display and manage a hierarchical list of outstanding work, or just reminders. The program itself is assisted by a few shell scripts that override default builtins. Specifically, cd, pushd and popd are overridden so that when using one of these commands to enter a directory, todo will display any outstanding items in that directory. These scripts are available in the doc sub-directory as scripts.sh and scripts.tcsh. For much more complete information please refer to the man page (devtodo(1)). Some examples of sneaky ways to use devtodo: 1. Displaying only one item: todo 12 2. Displaying *all* items: todo all 3. Removing items 1 through 10: tdr 1-10 4. Making item 10.1 a child of item 13: todo -R 10.1,13 5. Using the binary database loader (but falling back to XML): echo "database-loaders binary,xml" >> ~/.todorc 6. *NOT* using the binary database loader at all, ever: echo "database-loaders xml" >> ~/.todorc 7. Generating a simplistic TODO file: todo --TODO --format generated='%2>%i- %+1T' all 8. Being verbose: todo -v 9. Being very verbose: todo -vv 10. Display only medium priority items that are completed and have the word "foobar" in them: todo all done /foobar 11. 
man devtodo man devtodo devtodo-master/doc/0000755000000000000000000000000013722700563013263 5ustar rootrootdevtodo-master/doc/scripts.sh0000755000000000000000000000233113722700563015310 0ustar rootroot# # These functions override builtin bash commands that change directories. # The purpose of this is to show any todo items as soon as you move into a # directory. Quite handy. # # The script will also display todo items upon first login. # # For example, if I have some todo items in my home directory and I cd ~, # the items will be displayed. # # This script should be added to either the system wide shell initialisation # file (/etc/profile) or a user specific initialisation file (~/.bash_profile # or ~/.profile). In addition, if you are using X, terminals you start up # should be login terminals (typically -ls, --ls or something to that effect). # # Only display every X (10) seconds, and display a maximum of one line per note. # The timeout period can be modified by putting # timeout # in your ~/.todorc. TODO_OPTIONS="--timeout --summary" cd () { builtin cd "$@" RV=$? [ $RV = 0 -a -r .todo ] && devtodo ${TODO_OPTIONS} return $RV } pushd () { builtin pushd "$@" RV=$? [ $RV = 0 -a -r .todo ] && devtodo ${TODO_OPTIONS} return $RV } popd () { builtin popd "$@" RV=$? [ $RV = 0 -a -r .todo ] && devtodo ${TODO_OPTIONS} return $RV } # Run todo initially upon login devtodo ${TODO_OPTIONS} devtodo-master/doc/devtodo.1.in0000644000000000000000000004576513722700563015437 0ustar rootroot.\" todo is licensed under the GPL, version 2. A copy of the GPL should have been distributed with the source in the file COPYING .TH @PACKAGE@ "1" @VERSION@ "Alec Thomas" "Programming utility" .SH "NAME" .LP todo \- a reminder/task program aimed at developers .SH "SYNOPSIS" .LP .TP \fBtodo [\fI\fP]\fP With no options, displays the items in the current directory. .TP \fBtda [\-p \fI\fR] [\-g \fI\fP] [\fI\fP]\fP Add a new item, optionally grafting it as a child of the given item. 
.TP \fBtde \fI\fP\fP Edit the given item. .TP \fBtdr \fI\fP\fP Remove the given items. .TP \fBtdd \fI\fP\fP Mark the specified items as being done. .TP \fBtdl [\-g \fI\fP] \fI\fP\fP Link the specified devtodo database into the current one, optionally grafting it as a child of the specified index. .SH "DESCRIPTION" .LP \fBtodo\fP is a program aimed specifically at programmers (but usable by anybody at the terminal) to aid in day\-to\-day development. .LP It maintains a list of items that have yet to be completed. This allows the programmer to track outstanding bugs or items that need to be completed with very little effort. .LP Items can be prioritised and can also be displayed in a hierarchy, so that one item may depend on another. .LP With the use of some small shell scripts (scripts.* in the doc directory of the source distribution), todo can also display the outstanding items in a directory as you change into it. So, for example, if you cd into the source directory for todo itself you should see a list of outstanding items...unless all of the bugs have been fixed ;). .SH "OPTIONS" .LP Options can have both a long and a short form. .LP Short options can be combined into one argument by using a hyphen followed by a string of short options. Parameters of short options can also be appended to this string. .LP .TP \fB\-v, \-\-verbose\fR Display verbosely .TP \fB\-a, \-\-add [\fI\fR]\fR Add a note (will prompt for a note if one is not supplied). .TP \fB\-g, \-\-graft \fI\fR\fR In conjunction with \fI\-\-add\fR or \fI\-\-link\fR, graft the new item to the specified item. .TP \fB\-l, \-\-link \fI\fR\fR Link the specified todo file into the body of this one. If the linked database has a title set, this will be used as the body of the linking item otherwise the directory name of the linked database will be used. Use \-\-remove (or tdr) to remove linked databases \(hy this does \fBnot\fR remove the database itself, only the link. 
.TP \fB\-R,\-\-reparent \fI[,]\fR\fR Change the parent of the first item index to the second item index. If no second index is given, the item is reparented to the root of the tree. .TP \fB\-p, \-\-priority \fI\fR\fR In conjunction with \-\-add or \-\-edit, set the priority (default | veryhigh | high | medium | low | verylow) .TP \fB\-e, \-\-edit \fI\fR\fR Edit the note that is indexed by the given number. .TP \fB\-\-remove \fI\fR\fR Remove the notes indexed by the given numbers, including any children. .TP \fB\-d, \-\-done \fI\fR\fR Mark the specified notes (and their children) as done. .TP \fB\-D, \-\-not\-done \fI\fR\fR Mark the specified notes (and all children) as not done. .TP \fB\-\-global\-database \fI\fR\fR Specify the database to use if either the \fI-G\fR or \fI--global\fR options are specified. .TP \fB\-G, \-\-global\fR Force todo to use the database specified with \fI--global-database\fR. If this is placed in your \fI~/.todorc\fR it will force todo to use that database to the exclusion of all others. .TP \fB\-\-database \fI\fR\fR Change the database from whatever the default is (typically '.todo') to the file specified. .TP \fB\-T, \-\-TODO\fR Generate a typical TODO output text file from a Todo DB. .TP \fB\-A, \-\-all\fR Shortcut for the filter '+done,+children' to show all notes. .TP \fB\-f, \-\-filter \fI\fR\fR Display only those notes that pass the filter. Please refer to the section \fIFILTERS\fR for more information. .TP \fB\-\-colour \fI\fR\fR Override default colours of todo items. Please refer to the section \fICOLOUR\fR for more information. .TP \fB\-\-force\-colour\fR Force use of colour even when not outputting to a TTY. This is useful when piping to \fIless(1)\fR \-R. .TP \fB\-\-mono\fR Remove all ANSI escape sequences from output - useful for colour-impaired terminals. .TP \fB\-\-help\fR Display this help. .TP \fB\-\-version\fR Display version of ToDo. .TP \fB\-\-title [\fI\fR]\fR Set the title of this directory's todo notes. 
.TP \fB\-\-date\-format \fI\fR\fR Format the display of time values. The format is that used by strftime(3). The default format is '%c'. This option is best specified in the \fI~/.todorc\fR. .TP \fB\-\-format \fI=\fR\fR Specify the formatting of output. Please refer to the section FORMATTING for more information. .TP \fB\-\-use\-format \fI=\fR\fR Use the format string identified by \fI\fR (defined with \-\-format) as the format string to use when formatting with the builtin format \fI\fR. .TP \fB\-\-sort \fI\fR\fR Sort the database with the specified expression. Refer to the section \fISORTING\fR for more detailed information. .TP \fB\-\-paranoid\fR Be paranoid about some settings, including permissions. .TP \fB\-\-database\-loaders \fI\fR\fR Try the database formats in the given order. Valid formats are \fIxml\fR and \fIbinary\fR. eg. todo \-\-database\-loaders binary,xml. The default format is XML. .TP \fB\-\-backup [\fI\fR]\fR Back up the database up to \fI\fR times, just before it is written to. If \fI\fR is not specified, one backup will be made. The filenames used to store the backups are the default database name with their revision appended like so: .todo.1, .todo.2, etc. To actually use one of these backups, you can either mv it to .todo or use \-\-database .todo. to explicitly specify its use. .TP \fB\-s, \-\-summary\fR Toggle "summary" mode, where long items are truncated to one line. .TP \fB\-c, \-\-comment\fR Edit or show comments respectively. .TP \fB\-\-timeout [\fI