Overview

Packages
jregex  
jregex.util.io  

 



All Classes
ListEnumerator
ListEnumerator.Instantiator
Matcher
MatchIterator
MatchResult
Optimizer
PathPattern
Pattern
PatternSyntaxException
PerlSubstitution
REFlags
Replacer
RETokenizer
Substitution
TextBuffer
WildcardFilter
WildcardPattern

Serialized Form


Package jregex

Class jregex.Pattern implements Serializable

Serialized Fields

stringRepr

java.lang.String stringRepr

root

jregex.Term root

root0

jregex.Term root0

memregs

int memregs

counters

int counters

lookaheads

int lookaheads

namedGroupMap

java.util.Hashtable namedGroupMap

Class jregex.PatternSyntaxException implements Serializable

Class jregex.WildcardPattern implements Serializable

Serialized Fields

str

java.lang.String str


Package jregex.util.io

Class jregex.util.io.PathPattern implements Serializable

Serialized Fields

str

java.lang.String str

root

java.lang.String root

rootf

java.io.File rootf

queue

jregex.util.io.PathElementMask queue

last

jregex.util.io.PathElementMask last




Hierarchy For All Packages

Package Hierarchies:
jregex, jregex.util.io

Class Hierarchy

Interface Hierarchy




Deprecated API

Deprecated Methods
jregex.util.io.PathPattern.directory()
          Is meaningless with regard to variable paths (since v.1.2) 
jregex.Matcher.isStart()
          Replaced by isPrefix() 
jregex.util.io.PathPattern.names()
          Is meaningless with regard to variable paths (since v.1.2) 
 




jregex
Class Replacer

java.lang.Object
  |
  +--jregex.Replacer

public class Replacer
extends java.lang.Object

The Replacer class provides methods to replace occurrences of a pattern with the result of evaluating a Perl-like expression, with a plain string, or according to a custom substitution model supplied as an implementation of the Substitution interface.
A Replacer instance may be obtained either from the Pattern.replacer(...) method or through a constructor:

 Pattern p=new Pattern("\\w+");
 Replacer perlExpressionReplacer=p.replacer("[$&]");
 //or another way to do the same
 Substitution myOwnModel=new Substitution(){
    public void appendSubstitution(MatchResult match,TextBuffer tb){
       tb.append('[');
       match.getGroup(MatchResult.MATCH,tb);
       tb.append(']');
    }
 };
 Replacer myVeryOwnReplacer=new Replacer(p,myOwnModel);
 
The second approach is more verbose but more flexible. To perform a replacement, call replace(someInput):
 System.out.print(perlExpressionReplacer.replace("All your base "));
 System.out.println(myVeryOwnReplacer.replace("are belong to us"));
 //result: "[All] [your] [base] [are] [belong] [to] [us]"
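
For readers without jregex on the classpath, the callback-style substitution model above can be approximated with the JDK's java.util.regex package. The following is a minimal sketch, not jregex code: java.util.regex stands in for the jregex engine, and the names BracketSketch, MatchCallback and replaceAll are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketSketch {
    /** Hypothetical stand-in for a substitution callback; NOT the jregex Substitution interface. */
    interface MatchCallback {
        void appendSubstitution(Matcher match, StringBuilder dest);
    }

    /** Replaces every match of p in input with whatever the callback appends. */
    static String replaceAll(Pattern p, String input, MatchCallback cb) {
        Matcher m = p.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start()); // copy the text between matches unchanged
            cb.appendSubstitution(m, out);      // let the callback write the replacement
            last = m.end();
        }
        out.append(input, last, input.length()); // copy the tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern words = Pattern.compile("\\w+");
        MatchCallback brackets = (m, sb) -> sb.append('[').append(m.group()).append(']');
        System.out.println(replaceAll(words, "All your base", brackets));
    }
}
```

The callback receives the current match and a destination buffer, which mirrors the shape of appendSubstitution(MatchResult, TextBuffer) documented for this class.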
 

See Also:
Substitution, PerlSubstitution, Replacer(jregex.Pattern,jregex.Substitution)

Constructor Summary
Replacer(Pattern pattern, java.lang.String substitution)
           
Replacer(Pattern pattern, java.lang.String substitution, boolean isPerlExpr)
           
Replacer(Pattern pattern, Substitution substitution)
           
 
Method Summary
 java.lang.String replace(char[] chars, int off, int len)
           
 int replace(char[] chars, int off, int len, java.lang.StringBuffer sb)
           
 int replace(char[] chars, int off, int len, TextBuffer dest)
           
 void replace(char[] chars, int off, int len, java.io.Writer out)
           
static int replace(Matcher m, Substitution substitution, TextBuffer dest)
          Replaces all occurrences of a matcher's pattern in the matcher's target with the given substitution, appending the result to a buffer.
The substitution starts from the matcher's current position; the current match is not included.
static int replace(Matcher m, Substitution substitution, java.io.Writer out)
           
 java.lang.String replace(MatchResult res, int group)
           
 int replace(MatchResult res, int group, java.lang.StringBuffer sb)
           
 int replace(MatchResult res, int group, TextBuffer dest)
           
 void replace(MatchResult res, int group, java.io.Writer out)
           
 int replace(MatchResult res, java.lang.String groupName, java.lang.StringBuffer sb)
           
 int replace(MatchResult res, java.lang.String groupName, TextBuffer dest)
           
 void replace(MatchResult res, java.lang.String groupName, java.io.Writer out)
           
 java.lang.String replace(java.io.Reader text, int length)
           
 int replace(java.io.Reader text, int length, java.lang.StringBuffer sb)
           
 int replace(java.io.Reader text, int length, TextBuffer dest)
           
 void replace(java.io.Reader in, int length, java.io.Writer out)
           
 java.lang.String replace(java.lang.String text)
           
 int replace(java.lang.String text, java.lang.StringBuffer sb)
           
 int replace(java.lang.String text, TextBuffer dest)
           
 void replace(java.lang.String text, java.io.Writer out)
           
 void setSubstitution(java.lang.String s, boolean isPerlExpr)
           
static TextBuffer wrap(java.lang.StringBuffer sb)
           
static TextBuffer wrap(java.io.Writer writer)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Replacer

public Replacer(Pattern pattern,
                Substitution substitution)

Replacer

public Replacer(Pattern pattern,
                java.lang.String substitution)

Replacer

public Replacer(Pattern pattern,
                java.lang.String substitution,
                boolean isPerlExpr)
Method Detail

setSubstitution

public void setSubstitution(java.lang.String s,
                            boolean isPerlExpr)

replace

public java.lang.String replace(java.lang.String text)

replace

public java.lang.String replace(char[] chars,
                                int off,
                                int len)

replace

public java.lang.String replace(MatchResult res,
                                int group)

replace

public java.lang.String replace(java.io.Reader text,
                                int length)
                         throws java.io.IOException

replace

public int replace(java.lang.String text,
                   java.lang.StringBuffer sb)

replace

public int replace(char[] chars,
                   int off,
                   int len,
                   java.lang.StringBuffer sb)

replace

public int replace(MatchResult res,
                   int group,
                   java.lang.StringBuffer sb)

replace

public int replace(MatchResult res,
                   java.lang.String groupName,
                   java.lang.StringBuffer sb)

replace

public int replace(java.io.Reader text,
                   int length,
                   java.lang.StringBuffer sb)
            throws java.io.IOException

replace

public int replace(java.lang.String text,
                   TextBuffer dest)

replace

public int replace(char[] chars,
                   int off,
                   int len,
                   TextBuffer dest)

replace

public int replace(MatchResult res,
                   int group,
                   TextBuffer dest)

replace

public int replace(MatchResult res,
                   java.lang.String groupName,
                   TextBuffer dest)

replace

public int replace(java.io.Reader text,
                   int length,
                   TextBuffer dest)
            throws java.io.IOException

replace

public static int replace(Matcher m,
                          Substitution substitution,
                          TextBuffer dest)
Replaces all occurrences of a matcher's pattern in the matcher's target with the given substitution, appending the result to a buffer.
The substitution starts from the matcher's current position; the current match is not included.

replace

public static int replace(Matcher m,
                          Substitution substitution,
                          java.io.Writer out)
                   throws java.io.IOException

replace

public void replace(java.lang.String text,
                    java.io.Writer out)
             throws java.io.IOException

replace

public void replace(char[] chars,
                    int off,
                    int len,
                    java.io.Writer out)
             throws java.io.IOException

replace

public void replace(MatchResult res,
                    int group,
                    java.io.Writer out)
             throws java.io.IOException

replace

public void replace(MatchResult res,
                    java.lang.String groupName,
                    java.io.Writer out)
             throws java.io.IOException

replace

public void replace(java.io.Reader in,
                    int length,
                    java.io.Writer out)
             throws java.io.IOException

wrap

public static TextBuffer wrap(java.lang.StringBuffer sb)

wrap

public static TextBuffer wrap(java.io.Writer writer)



jregex
Interface TextBuffer


public interface TextBuffer


Method Summary
 void append(char c)
           
 void append(char[] chars, int start, int len)
           
 void append(java.lang.String s)
           
 

Method Detail

append

public void append(char c)

append

public void append(char[] chars,
                   int start,
                   int len)

append

public void append(java.lang.String s)
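
The interface is small enough that an adapter over a standard buffer takes only a few lines. The sketch below is an illustration under stated assumptions: it declares a local copy of the three documented method signatures so that it compiles without jregex, and the class name TextBufferSketch and its wrap and demo helpers are hypothetical (the wrap here targets a StringBuilder and is not the library's Replacer.wrap).

```java
class TextBufferSketch {
    /** Local copy of the three documented signatures, so the sketch compiles without jregex. */
    interface TextBuffer {
        void append(char c);
        void append(char[] chars, int start, int len);
        void append(String s);
    }

    /** Hypothetical adapter: forwards every append call to the given StringBuilder. */
    static TextBuffer wrap(final StringBuilder sb) {
        return new TextBuffer() {
            public void append(char c) { sb.append(c); }
            public void append(char[] chars, int start, int len) { sb.append(chars, start, len); }
            public void append(String s) { sb.append(s); }
        };
    }

    /** Exercises all three append variants once. */
    static String demo() {
        StringBuilder sb = new StringBuilder();
        TextBuffer tb = wrap(sb);
        tb.append('[');
        tb.append("match".toCharArray(), 0, 5);
        tb.append("]");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```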



jregex
Interface MatchResult

All Known Implementing Classes:
Matcher

public interface MatchResult


Field Summary
static int MATCH
           
static int PREFIX
           
static int SUFFIX
           
static int TARGET
           
 
Method Summary
 char charAt(int i)
           
 char charAt(int i, int groupNo)
           
 int end()
           
 int end(int n)
           
 boolean getGroup(int n, java.lang.StringBuffer sb)
           
 boolean getGroup(int n, TextBuffer tb)
           
 boolean getGroup(java.lang.String name, java.lang.StringBuffer sb)
           
 boolean getGroup(java.lang.String name, TextBuffer tb)
           
 java.lang.String group(int n)
           
 java.lang.String group(java.lang.String name)
           
 int groupCount()
           
 boolean isCaptured()
           
 boolean isCaptured(int groupId)
           
 boolean isCaptured(java.lang.String groupName)
           
 int length()
           
 int length(int n)
           
 Pattern pattern()
           
 java.lang.String prefix()
           
 int start()
           
 int start(int n)
           
 java.lang.String suffix()
           
 java.lang.String target()
           
 char[] targetChars()
           
 int targetEnd()
           
 int targetStart()
           
 

Field Detail

MATCH

public static final int MATCH

PREFIX

public static final int PREFIX

SUFFIX

public static final int SUFFIX

TARGET

public static final int TARGET
Method Detail

pattern

public Pattern pattern()

groupCount

public int groupCount()

isCaptured

public boolean isCaptured()

isCaptured

public boolean isCaptured(int groupId)

isCaptured

public boolean isCaptured(java.lang.String groupName)

group

public java.lang.String group(int n)

getGroup

public boolean getGroup(int n,
                        java.lang.StringBuffer sb)

getGroup

public boolean getGroup(int n,
                        TextBuffer tb)

group

public java.lang.String group(java.lang.String name)

getGroup

public boolean getGroup(java.lang.String name,
                        java.lang.StringBuffer sb)

getGroup

public boolean getGroup(java.lang.String name,
                        TextBuffer tb)

prefix

public java.lang.String prefix()

suffix

public java.lang.String suffix()

target

public java.lang.String target()

targetStart

public int targetStart()

targetEnd

public int targetEnd()

targetChars

public char[] targetChars()

start

public int start()

end

public int end()

length

public int length()

start

public int start(int n)

end

public int end(int n)

length

public int length(int n)

charAt

public char charAt(int i)

charAt

public char charAt(int i,
                   int groupNo)



jregex
Class RETokenizer

java.lang.Object
  |
  +--jregex.RETokenizer
All Implemented Interfaces:
java.util.Enumeration

public class RETokenizer
extends java.lang.Object
implements java.util.Enumeration

The RETokenizer class provides methods to break a text into tokens, using occurrences of a pattern as delimiters. There are two ways to obtain a tokenizer for a pattern:

 Pattern p=new Pattern("\\s+"); //any number of space characters
 String text="blah blah blah";
 //by factory method
 RETokenizer tok1=p.tokenizer(text);
 //or by constructor
 RETokenizer tok2=new RETokenizer(p,text);
 
One way is to use the tokenizer as a token enumeration/iterator:
 while(tok1.hasMore()) System.out.println(tok1.nextToken());
 
Another way is to split the text into a String array:
 
 String[] arr=tok2.split();
 for(int i=0;i<arr.length;i++) System.out.println(arr[i]);
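
The two styles described above (iterating token by token, or splitting into an array in one call) can also be tried with JDK classes alone. This is a sketch for comparison, not jregex code: java.util.Scanner and java.util.regex.Pattern are swapped in for RETokenizer, and the class name TokenizeSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class TokenizeSketch {
    /** Delimiter: one or more whitespace characters, as in the example above. */
    static final Pattern SPACES = Pattern.compile("\\s+");

    /** Iterator style: pull tokens one at a time until none are left. */
    static List<String> iterate(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useDelimiter(SPACES)) {
            while (sc.hasNext()) {
                tokens.add(sc.next());
            }
        }
        return tokens;
    }

    /** Array style: split the whole text in one call. */
    static String[] split(String text) {
        return SPACES.split(text);
    }

    public static void main(String[] args) {
        for (String token : iterate("blah blah blah")) {
            System.out.println(token);
        }
        System.out.println(split("blah blah blah").length);
    }
}
```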

See Also:
Pattern.tokenizer(java.lang.String)

Constructor Summary
RETokenizer(Matcher m, boolean emptyEnabled)
           
RETokenizer(Pattern pattern, char[] chars, int off, int len)
           
RETokenizer(Pattern pattern, java.io.Reader r, int len)
           
RETokenizer(Pattern pattern, java.lang.String text)
           
 
Method Summary
 boolean hasMore()
           
 boolean hasMoreElements()
           
 boolean isEmptyEnabled()
           
 java.lang.Object nextElement()
           
 java.lang.String nextToken()
           
 void reset()
           
 void setEmptyEnabled(boolean b)
           
 java.lang.String[] split()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RETokenizer

public RETokenizer(Pattern pattern,
                   java.lang.String text)

RETokenizer

public RETokenizer(Pattern pattern,
                   char[] chars,
                   int off,
                   int len)

RETokenizer

public RETokenizer(Pattern pattern,
                   java.io.Reader r,
                   int len)
            throws java.io.IOException

RETokenizer

public RETokenizer(Matcher m,
                   boolean emptyEnabled)
Method Detail

setEmptyEnabled

public void setEmptyEnabled(boolean b)

isEmptyEnabled

public boolean isEmptyEnabled()

hasMore

public boolean hasMore()

nextToken

public java.lang.String nextToken()

split

public java.lang.String[] split()

reset

public void reset()

hasMoreElements

public boolean hasMoreElements()
Specified by:
hasMoreElements in interface java.util.Enumeration

nextElement

public java.lang.Object nextElement()
Specified by:
nextElement in interface java.util.Enumeration
Returns:
a next token as a String



Package jregex

Interface Summary
MatchIterator  
MatchResult  
REFlags  
Substitution  
TextBuffer  
 

Class Summary
Matcher A Matcher instance is an automaton that actually performs matching.
Optimizer  
Pattern A handle for a precompiled regular expression.
To match a regular expression myExpr against a text myText, first create a Pattern object: Pattern p=new Pattern(myExpr); then obtain a Matcher object: Matcher matcher=p.matcher(myText); The latter is an automaton that actually performs the search.
PerlSubstitution An implementation of the Substitution interface.
Replacer The Replacer class provides methods to replace occurrences of a pattern with the result of evaluating a Perl-like expression, with a plain string, or according to a custom substitution model supplied as an implementation of the Substitution interface.
A Replacer instance may be obtained either from the Pattern.replacer(...) method or through a constructor:
 Pattern p=new Pattern("\\w+");
 Replacer perlExpressionReplacer=p.replacer("[$&]");
 //or another way to do the same
 Substitution myOwnModel=new Substitution(){
    public void appendSubstitution(MatchResult match,TextBuffer tb){
       tb.append('[');
       match.getGroup(MatchResult.MATCH,tb);
       tb.append(']');
    }
 };
 Replacer myVeryOwnReplacer=new Replacer(p,myOwnModel);
The second approach is more verbose but more flexible.
RETokenizer The RETokenizer class provides methods to break a text into tokens, using occurrences of a pattern as delimiters.
WildcardPattern A Pattern subclass that accepts a simplified wildcard pattern syntax (? and *).
 

Exception Summary
PatternSyntaxException Thrown when the Pattern constructor's argument does not conform to the Perl5 regular expression syntax.
 




jregex
Interface MatchIterator


public interface MatchIterator


Method Summary
 int count()
           
 boolean hasMore()
           
 MatchResult nextMatch()
           
 

Method Detail

hasMore

public boolean hasMore()

nextMatch

public MatchResult nextMatch()

count

public int count()



jregex
Interface Replacer.WriterWrap

All Superinterfaces:
TextBuffer
Enclosing class:
Replacer

public static interface Replacer.WriterWrap
extends TextBuffer


Method Summary
 void checkError()
           
 
Methods inherited from interface jregex.TextBuffer
append, append, append
 

Method Detail

checkError

public void checkError()
                throws java.io.IOException



jregex
Interface Substitution

All Known Implementing Classes:
PerlSubstitution

public interface Substitution


Method Summary
 void appendSubstitution(MatchResult match, TextBuffer dest)
           
 

Method Detail

appendSubstitution

public void appendSubstitution(MatchResult match,
                               TextBuffer dest)



jregex
Class WildcardPattern

java.lang.Object
  |
  +--jregex.Pattern
        |
        +--jregex.WildcardPattern
All Implemented Interfaces:
REFlags, java.io.Serializable

public class WildcardPattern
extends Pattern

A Pattern subclass that accepts a simplified pattern syntax:

  • ? - matches any single character;
  • * - matches any number of any characters;
  • all the rest - matches itself.

    Each wildcard takes a capturing group within the pattern.
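
The three rules above amount to a small translation from wildcard syntax to an ordinary regular expression. The sketch below shows one way such a translation could look, using java.util.regex as a stand-in engine. It is not the library's implementation: the class name WildcardSketch is hypothetical, and mapping ? to (.) and * to a greedy (.*) is an assumption made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardSketch {
    /** Translates a wildcard string into a java.util.regex pattern (illustrative only). */
    static Pattern compile(String wc) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wc.toCharArray()) {
            if (c == '?' || c == '*') {
                flush(literal, regex);
                // each wildcard becomes its own capturing group
                regex.append(c == '?' ? "(.)" : "(.*)");
            } else {
                literal.append(c); // everything else matches itself
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString());
    }

    /** Appends the pending literal run, quoted so regex metacharacters lose their meaning. */
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        Matcher m = compile("a?c*").matcher("abcdef");
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2));
        }
    }
}
```

Because each wildcard maps to one group, the text a wildcard matched can be read back by its position in the pattern.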

    See Also:
    Pattern, Serialized Form

    Field Summary
    static java.lang.String ANY_CHAR
               
    static java.lang.String WORD_CHAR
               
     
    Fields inherited from interface jregex.REFlags
    DEFAULT, DOTALL, IGNORE_CASE, IGNORE_SPACES, MULTILINE, UNICODE, XML_SCHEMA
     
    Constructor Summary
    protected WildcardPattern()
               
      WildcardPattern(java.lang.String wc)
               
      WildcardPattern(java.lang.String wc, boolean icase)
               
      WildcardPattern(java.lang.String wc, int flags)
               
      WildcardPattern(java.lang.String wc, java.lang.String wcClass, int flags)
               
     
    Method Summary
    protected  void compile(java.lang.String wc, java.lang.String wcClass, java.lang.String specials, int flags)
               
    protected static java.lang.String convertSpecials(java.lang.String s, java.lang.String wcClass, java.lang.String specials)
               
     java.lang.String toString()
               
     
    Methods inherited from class jregex.Pattern
    compile, groupCount, groupId, matcher, matcher, matcher, matcher, matcher, matcher, matches, replacer, replacer, startsWith, tokenizer, tokenizer, tokenizer, toString_d
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Field Detail

    WORD_CHAR

    public static final java.lang.String WORD_CHAR

    ANY_CHAR

    public static final java.lang.String ANY_CHAR
    Constructor Detail

    WildcardPattern

    public WildcardPattern(java.lang.String wc)
    Parameters:
    wc - The pattern

    WildcardPattern

    public WildcardPattern(java.lang.String wc,
                           boolean icase)
    Parameters:
    wc - The pattern
    icase - If true, the pattern is case-insensitive.

    WildcardPattern

    public WildcardPattern(java.lang.String wc,
                           int flags)
    Parameters:
    wc - The pattern
    flags - The bitwise OR of any of REFlags.* . The only meaningful flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows the wildcards to match the EOL characters).

    WildcardPattern

    public WildcardPattern(java.lang.String wc,
                           java.lang.String wcClass,
                           int flags)
    Parameters:
    wc - The pattern
    wcClass - The wildcard class; either WORD_CHAR or ANY_CHAR
    flags - The bitwise OR of any of REFlags.* . The only meaningful flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows the wildcards to match the EOL characters).

    WildcardPattern

    protected WildcardPattern()
    Method Detail

    convertSpecials

    protected static java.lang.String convertSpecials(java.lang.String s,
                                                      java.lang.String wcClass,
                                                      java.lang.String specials)

    compile

    protected void compile(java.lang.String wc,
                           java.lang.String wcClass,
                           java.lang.String specials,
                           int flags)

    toString

    public java.lang.String toString()
    Overrides:
    toString in class Pattern



    jregex
    Class PatternSyntaxException

    java.lang.Object
      |
      +--java.lang.Throwable
            |
            +--java.lang.Exception
                  |
                  +--java.lang.RuntimeException
                        |
                        +--java.lang.IllegalArgumentException
                              |
                              +--jregex.PatternSyntaxException
    
    All Implemented Interfaces:
    java.io.Serializable

    public class PatternSyntaxException
    extends java.lang.IllegalArgumentException

    Thrown when the Pattern constructor's argument does not conform to the Perl5 regular expression syntax.

    See Also:
    Pattern, Serialized Form

    Constructor Summary
    PatternSyntaxException(java.lang.String s)
               
     
    Methods inherited from class java.lang.Throwable
    fillInStackTrace, getLocalizedMessage, getMessage, printStackTrace, printStackTrace, printStackTrace, toString
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Constructor Detail

    PatternSyntaxException

    public PatternSyntaxException(java.lang.String s)



    Package jregex.util.io

    Interface Summary
    ListEnumerator.Instantiator  
     

    Class Summary
    ListEnumerator  
    PathPattern A special-purpose subclass of the Pattern class.
    WildcardFilter  
     




    jregex.util.io
    Class WildcardFilter

    java.lang.Object
      |
      +--jregex.util.io.WildcardFilter
    
    All Implemented Interfaces:
    java.io.FilenameFilter

    public class WildcardFilter
    extends java.lang.Object
    implements java.io.FilenameFilter


    Constructor Summary
    WildcardFilter(java.lang.String ptn)
               
    WildcardFilter(java.lang.String ptn, boolean icase)
               
     
    Method Summary
     boolean accept(java.io.File dir, java.lang.String name)
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Constructor Detail

    WildcardFilter

    public WildcardFilter(java.lang.String ptn)

    WildcardFilter

    public WildcardFilter(java.lang.String ptn,
                          boolean icase)
    Method Detail

    accept

    public boolean accept(java.io.File dir,
                          java.lang.String name)
    Specified by:
    accept in interface java.io.FilenameFilter



    jregex.util.io
    Class PathPattern

    java.lang.Object
      |
      +--jregex.Pattern
            |
            +--jregex.util.io.PathPattern
    
    All Implemented Interfaces:
    REFlags, java.io.Serializable

    public class PathPattern
    extends Pattern

    A special-purpose subclass of the Pattern class. Has two different applications:

  • to search files by their paths using special patterns;
  • to match path strings.

    Syntax:

  • ? - any character except the path separator
  • * - any string not including path separators
  • ** - any path

    Usage:
     PathPattern pp=new PathPattern("jregex/**"); //all files and directories
                                                  //under the jregex directory
     Enumeration files=pp.enumerateFiles();
     Matcher m=pp.matcher();
     while(files.hasMoreElements()){
        File f=(File)files.nextElement();
        m.setTarget(f.getPath());
        if(!m.matches()) System.out.println("Error in jregex.util.io.PathPattern");
     }
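
The difference between * and ** is easiest to see when the path syntax is translated into a plain regular expression. The following is a sketch under stated assumptions, not the library's implementation: it uses java.util.regex, fixes the path separator to '/', and the class name PathGlobSketch and its methods are hypothetical.

```java
import java.util.regex.Pattern;

class PathGlobSketch {
    /** Translates a path pattern into a regex, assuming '/' is the path separator. */
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    regex.append(".*");    // ** : any path, separators included
                    i++;                   // consume the second '*'
                } else {
                    regex.append("[^/]*"); // *  : any string without separators
                }
            } else if (c == '?') {
                regex.append("[^/]");      // ?  : any single character but the separator
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the whole path matches the pattern. */
    static boolean matches(String glob, String path) {
        return compile(glob).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("jregex/**", "jregex/docs/api/index.html"));
        System.out.println(matches("jregex/*", "jregex/docs/api"));
    }
}
```

With this translation a single * stops at the first separator, while ** crosses directory boundaries, which is why "jregex/**" in the usage example covers everything under the jregex directory.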
     

    See Also:
    WildcardPattern, Serialized Form

    Fields inherited from interface jregex.REFlags
    DEFAULT, DOTALL, IGNORE_CASE, IGNORE_SPACES, MULTILINE, UNICODE, XML_SCHEMA
     
    Constructor Summary
    PathPattern(java.io.File dir, java.lang.String path, boolean icase)
               
    PathPattern(java.io.File dir, java.lang.String path, int flags)
               
    PathPattern(java.lang.String ptn)
               
    PathPattern(java.lang.String ptn, boolean icase)
               
    PathPattern(java.lang.String path, int flags)
               
     
    Method Summary
     java.io.File directory()
              Deprecated. Is meaningless with regard to variable paths (since v.1.2)
     java.util.Enumeration enumerateFiles()
               
     java.io.File[] files()
               
     java.lang.String[] names()
              Deprecated. Is meaningless with regard to variable paths (since v.1.2)
     java.lang.String toString()
               
     
    Methods inherited from class jregex.Pattern
    compile, groupCount, groupId, matcher, matcher, matcher, matcher, matcher, matcher, matches, replacer, replacer, startsWith, tokenizer, tokenizer, tokenizer, toString_d
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Constructor Detail

    PathPattern

    public PathPattern(java.lang.String ptn)

    PathPattern

    public PathPattern(java.lang.String ptn,
                       boolean icase)

    PathPattern

    public PathPattern(java.lang.String path,
                       int flags)

    PathPattern

    public PathPattern(java.io.File dir,
                       java.lang.String path,
                       boolean icase)

    PathPattern

    public PathPattern(java.io.File dir,
                       java.lang.String path,
                       int flags)
    Method Detail

    enumerateFiles

    public java.util.Enumeration enumerateFiles()

    files

    public java.io.File[] files()

    names

    public java.lang.String[] names()
    Deprecated. Is meaningless with regard to variable paths (since v.1.2)


    directory

    public java.io.File directory()
    Deprecated. Is meaningless with regard to variable paths (since v.1.2)


    toString

    public java.lang.String toString()
    Overrides:
    toString in class Pattern



    jregex.util.io
    Class ListEnumerator

    java.lang.Object
      |
      +--jregex.util.io.Enumerator
            |
            +--jregex.util.io.ListEnumerator
    
    All Implemented Interfaces:
    java.util.Enumeration

    public class ListEnumerator
    extends jregex.util.io.Enumerator


    Inner Class Summary
    static interface ListEnumerator.Instantiator
               
     
    Field Summary
    protected  java.lang.Object currObj
               
    static ListEnumerator.Instantiator defaultInstantiator
               
     
    Constructor Summary
    ListEnumerator(java.io.File dir, ListEnumerator.Instantiator i)
               
    ListEnumerator(java.io.File dir, java.lang.String[] names, ListEnumerator.Instantiator i)
               
     
    Method Summary
    protected  boolean find()
               
     boolean hasMoreElements()
               
     java.lang.Object nextElement()
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    defaultInstantiator

    public static final ListEnumerator.Instantiator defaultInstantiator

    currObj

    protected java.lang.Object currObj
    Constructor Detail

    ListEnumerator

    public ListEnumerator(java.io.File dir,
                          ListEnumerator.Instantiator i)

    ListEnumerator

    public ListEnumerator(java.io.File dir,
                          java.lang.String[] names,
                          ListEnumerator.Instantiator i)
    Method Detail

    find

    protected boolean find()
    Overrides:
    find in class jregex.util.io.Enumerator

    hasMoreElements

    public boolean hasMoreElements()
    Specified by:
    hasMoreElements in interface java.util.Enumeration

    nextElement

    public java.lang.Object nextElement()
    Specified by:
    nextElement in interface java.util.Enumeration


    jregex/docs/api/jregex/util/io/package-tree.html: jregex.util.io Class Hierarchy

    Hierarchy For Package jregex.util.io

    Package Hierarchies:
    All Packages

    Class Hierarchy

    • class java.lang.Object
      • class jregex.util.io.Enumerator (implements java.util.Enumeration)
      • class jregex.Pattern (implements jregex.REFlags, java.io.Serializable)
      • class jregex.util.io.WildcardFilter (implements java.io.FilenameFilter)

    Interface Hierarchy



    jregex/docs/api/jregex/util/io/ListEnumerator.Instantiator.html: Interface ListEnumerator.Instantiator

    jregex.util.io
    Interface ListEnumerator.Instantiator

    Enclosing class:
    ListEnumerator

    public static interface ListEnumerator.Instantiator


    Method Summary
     java.io.File instantiate(java.io.File dir, java.lang.String name)
               
     

    Method Detail

    instantiate

    public java.io.File instantiate(java.io.File dir,
                                    java.lang.String name)


    jregex/docs/api/jregex/PerlSubstitution.html: Class PerlSubstitution

    jregex
    Class PerlSubstitution

    java.lang.Object
      |
      +--jregex.PerlSubstitution
    
    All Implemented Interfaces:
    Substitution

    public class PerlSubstitution
    extends java.lang.Object
    implements Substitution

    An implementation of the Substitution interface. Performs substitutions in accordance with Perl-like substitution scripts.
    The latter is a string, containing a mix of memory register references and plain text blocks.
    It may look like "some_chars $1 some_chars$2some_chars" or "123${1}45${2}67".
    A tag consisting of '$', not preceded by the escape character '\' and followed by some digits (possibly enclosed in curly brackets), is interpreted as a memory register reference, with the digits forming the register ID. Everything else is treated as plain text.
    Once the Replacer has found a text block that matches the pattern, the references in the replacement string are replaced by the contents of the corresponding memory registers, and the resulting text replaces the matched block.
    For example, the following code:

     System.out.println("\""+
        new Replacer(new Pattern("\\b(\\d+)\\b"),new PerlSubstitution("'$1'")).replace("abc 123 def")
        +"\"");
     
    will print "abc '123' def".
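    The same replacement can be sketched without the jregex jar. The block below is only an analogue: it uses the JDK's java.util.regex, whose replaceAll happens to accept the same "$1" group reference; the class and method names are made up for illustration and are not part of jregex.

```java
import java.util.regex.Pattern;

// Stand-in sketch: JDK regex, not jregex's Replacer/PerlSubstitution.
final class BackrefReplaceSketch {
    // Wraps every standalone run of digits in single quotes; "$1" refers to group 1.
    static String quoteNumbers(String text) {
        return Pattern.compile("\\b(\\d+)\\b").matcher(text).replaceAll("'$1'");
    }

    public static void main(String[] args) {
        System.out.println("\"" + quoteNumbers("abc 123 def") + "\"");
    }
}
```

    With the input "abc 123 def" the helper returns abc '123' def, matching the jregex example above.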

    See Also:
    Substitution, Replacer, Pattern

    Constructor Summary
    PerlSubstitution(java.lang.String s)
               
     
    Method Summary
     void appendSubstitution(MatchResult match, TextBuffer dest)
               
     java.lang.String toString()
               
     java.lang.String value(MatchResult mr)
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Constructor Detail

    PerlSubstitution

    public PerlSubstitution(java.lang.String s)
    Method Detail

    value

    public java.lang.String value(MatchResult mr)

    appendSubstitution

    public void appendSubstitution(MatchResult match,
                                   TextBuffer dest)
    Specified by:
    appendSubstitution in interface Substitution

    toString

    public java.lang.String toString()
    Overrides:
    toString in class java.lang.Object


    jregex/docs/api/jregex/package-frame.html: Package jregex
    Interfaces 
    MatchIterator
    MatchResult
    REFlags
    Substitution
    TextBuffer
    Classes 
    Matcher
    Optimizer
    Pattern
    PerlSubstitution
    Replacer
    RETokenizer
    WildcardPattern
    Exceptions 
    PatternSyntaxException
    jregex/docs/api/jregex/REFlags.html: Interface REFlags

    jregex
    Interface REFlags

    All Known Implementing Classes:
    Pattern

    public interface REFlags


    Field Summary
    static int DEFAULT
              All the following options turned off
    static int DOTALL
              Affects the behaviour of dot(".") tag.
    static int IGNORE_CASE
              Pattern "a" matches both "a" and "A".
    static int IGNORE_SPACES
              Affects how the space characters are interpreted in the expression.
    static int MULTILINE
              Affects the behaviour of "^" and "$" tags.
    static int UNICODE
              Affects whether the predefined classes ("\d", "\s", "\w", etc.) in the expression are interpreted as belonging to Unicode.
    static int XML_SCHEMA
              Turns on the compatibility with XML Schema regular expressions.
     

    Field Detail

    DEFAULT

    public static final int DEFAULT
    All the following options turned off

    IGNORE_CASE

    public static final int IGNORE_CASE
    Pattern "a" matches both "a" and "A". Corresponds to "i" in Perl notation.

    MULTILINE

    public static final int MULTILINE
    Affects the behaviour of "^" and "$" tags.
    When switched off:
  • the "^" matches the beginning of the whole text;
  • the "$" matches the end of the whole text, or just before the '\n' or "\r\n" at the end of text.
    When switched on:
  • the "^" additionally matches the line beginnings (that is, just after the '\n');
  • the "$" additionally matches the line ends (that is, just before "\r\n" or '\n').
    Corresponds to "m" in Perl notation.
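    The effect of the MULTILINE flag can be seen in a small sketch. It assumes the JDK's java.util.regex as a stand-in, since its Pattern.MULTILINE flag is described in similar terms; jregex itself is not called, and the helper name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: JDK regex flags, not jregex.REFlags.
final class MultilineSketch {
    // Counts how many times the regex is found in the text under the given flags.
    static int countMatches(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Flag off: "^" only matches at the very beginning of the text.
        System.out.println(countMatches("^\\w+", 0, text));
        // Flag on: "^" also matches just after each '\n'.
        System.out.println(countMatches("^\\w+", Pattern.MULTILINE, text));
    }
}
```

    With the flag off only the first word is found; with it on, each of the three lines contributes a match.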

    DOTALL

    public static final int DOTALL
    Affects the behaviour of the dot (".") tag.
    When switched off:
  • the dot matches any character but EOLs ('\r', '\n').
    When switched on:
  • the dot matches any character, including EOLs.
    This flag is sometimes referred to in regex tutorials as SINGLELINE, which confusingly sounds like the opposite of MULTILINE but is in fact orthogonal to it. Corresponds to "s" in Perl notation.
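    A minimal sketch of the dot behaviour follows. It relies on the JDK's java.util.regex and its Pattern.DOTALL flag as an assumed analogue of REFlags.DOTALL; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

// Stand-in sketch: JDK regex flags, not jregex.REFlags.
final class DotallSketch {
    // Tests whether "a.b" matches the three-character text a, '\n', b.
    static boolean dotMatchesNewline(boolean dotall) {
        int flags = dotall ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(false)); // dot refuses the '\n'
        System.out.println(dotMatchesNewline(true));  // dot accepts the '\n'
    }
}
```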

    IGNORE_SPACES

    public static final int IGNORE_SPACES
    Affects how the space characters are interpreted in the expression.
    When switched off:
  • the spaces are interpreted literally.
    When switched on:
  • the spaces are ignored, allowing an expression to be slightly more readable.
    Corresponds to "x" in Perl notation.

    UNICODE

    public static final int UNICODE
    Affects whether the predefined classes ("\d", "\s", "\w", etc.) in the expression are interpreted as belonging to Unicode.
    When switched off:
  • the predefined classes are interpreted as ASCII.
    When switched on:
  • the predefined classes are interpreted as Unicode categories.

    XML_SCHEMA

    public static final int XML_SCHEMA
    Turns on the compatibility with XML Schema regular expressions.


    jregex/docs/api/jregex/Matcher.html: Class Matcher

    jregex
    Class Matcher

    java.lang.Object
      |
      +--jregex.Matcher
    
    All Implemented Interfaces:
    MatchResult

    public class Matcher
    extends java.lang.Object
    implements MatchResult

    A Matcher instance is an automaton that actually performs matching. It provides the following methods:

  • searching for matching substrings : matcher.find() or matcher.findAll();
  • testing whether a text matches a whole pattern : matcher.matches();
  • testing whether the text matches the beginning of a pattern : matcher.matchesPrefix();
  • searching with custom options : matcher.find(int options)

    Obtaining results
    After the search has succeeded, i.e. if one of the above methods returned true, one may obtain information on the match:

  • check whether some group is captured : matcher.isCaptured(int);
  • obtain the start and end positions of the match and its length : matcher.start(int),matcher.end(int),matcher.length(int);
  • obtain the match contents as a String : matcher.group(int).
    The match prefix and suffix information can be obtained in the same way. The appropriate methods are grouped in the MatchResult interface, which the Matcher class implements.
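    As a rough illustration of reading results after a successful search, the sketch below uses the JDK's java.util.regex as a stand-in. Its start(), end() and group(int) play the same role as the methods listed above; the JDK has no prefix()/suffix() methods, so they are emulated here with substring. All names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: JDK regex, not jregex.Matcher / jregex.MatchResult.
final class MatchInfoSketch {
    // Finds the first key=value pair and reports positions, groups, prefix and suffix.
    static String describe(String text) {
        Matcher m = Pattern.compile("(\\w+)=(\\w+)").matcher(text);
        if (!m.find()) {
            return "no match";
        }
        String prefix = text.substring(0, m.start()); // text before the match
        String suffix = text.substring(m.end());      // text after the match
        return m.start() + ".." + m.end() + " " + m.group(1) + "=" + m.group(2)
                + " prefix='" + prefix + "' suffix='" + suffix + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe("set key=value now"));
    }
}
```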
    Matcher objects are not thread-safe, so only one thread may use a matcher instance at a time. Note that Pattern objects are thread-safe (the same instance may be shared between multiple threads), and the typical tactic in multithreaded applications is to have one Pattern instance per expression (a singleton) and one Matcher object per thread.


    Field Summary
    static int ACCEPT_INCOMPLETE
              Experimental option; if the text ends before the end of the pattern, report a match.
    static int ANCHOR_END
              The same effect as "$" without REFlags.MULTILINE.
    static int ANCHOR_LASTMATCH
              The same effect as "\\G".
    static int ANCHOR_START
              The same effect as "^" without REFlags.MULTILINE.
     
    Fields inherited from interface jregex.MatchResult
    MATCH, PREFIX, SUFFIX, TARGET
     
    Method Summary
     char charAt(int i)
               
     char charAt(int i, int groupId)
               
     int end()
               
     int end(int id)
               
     boolean find()
              Searches through a target for a matching substring, starting from just after the end of last match.
     boolean find(int anchors)
              Searches through a target for a matching substring, starting from just after the end of last match.
     MatchIterator findAll()
              The same as findAll(int), but with default behaviour.
     MatchIterator findAll(int options)
              Returns an iterator over the matches found by successively calling find(options); the search starts from position zero.
     boolean getGroup(int n, java.lang.StringBuffer sb)
               
     boolean getGroup(int n, TextBuffer tb)
               
     boolean getGroup(java.lang.String name, java.lang.StringBuffer sb)
               
     boolean getGroup(java.lang.String name, TextBuffer tb)
               
     java.lang.String group(int n)
               
     java.lang.String group(java.lang.String name)
               
     int groupCount()
               
     java.lang.String[] groups()
               
     java.util.Vector groupv()
               
     boolean isCaptured()
               
     boolean isCaptured(int id)
               
     boolean isCaptured(java.lang.String groupName)
               
     boolean isStart()
              Deprecated. Replaced by matchesPrefix()
     int length()
               
     int length(int id)
               
     boolean matches()
              Tells whether a current target matches the whole pattern.
     boolean matches(java.lang.String s)
              Just a combination of setTarget(String) and matches().
     boolean matchesPrefix()
              Tells whether the entire target matches the beginning of the pattern.
     Pattern pattern()
               
     java.lang.String prefix()
               
     boolean proceed()
              Continues to search from where the last search left off.
     boolean proceed(int options)
              Continues to search from where the last search left off using specified options.
     void setPosition(int pos)
              Sets the position from which the subsequent find()/find(int) will start.
     void setTarget(char[] text, int start, int len)
              Supplies a text to search in/match with, as a part of char array.
     void setTarget(char[] text, int start, int len, boolean shared)
              To be used with great care.
     void setTarget(Matcher m, int groupId)
              This method allows data to be passed efficiently between matchers.
     void setTarget(java.io.Reader in, int len)
              Supplies a text to search in/match with through a stream.
     void setTarget(java.lang.String text)
              Supplies a text to search in/match with.
     void setTarget(java.lang.String text, int start, int len)
              Supplies a text to search in/match with, as a part of String.
     void skip()
              Sets the current search position just after the end of last match.
     int start()
               
     int start(int id)
               
     java.lang.String suffix()
               
     java.lang.String target()
               
     char[] targetChars()
               
     int targetEnd()
               
     int targetStart()
               
     java.lang.String toString_d()
               
     java.lang.String toString()
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Field Detail

    ANCHOR_START

    public static final int ANCHOR_START
    The same effect as "^" without REFlags.MULTILINE.
    See Also:
    find(int)

    ANCHOR_LASTMATCH

    public static final int ANCHOR_LASTMATCH
    The same effect as "\\G".
    See Also:
    find(int)

    ANCHOR_END

    public static final int ANCHOR_END
    The same effect as "$" without REFlags.MULTILINE.
    See Also:
    find(int)

    ACCEPT_INCOMPLETE

    public static final int ACCEPT_INCOMPLETE
    Experimental option; if the text ends before the end of the pattern, report a match.
    See Also:
    find(int)
    Method Detail

    setTarget

    public final void setTarget(Matcher m,
                                int groupId)
    This method allows data to be passed efficiently between matchers. Note that a matcher may pass data to itself:
       Matcher m=new Pattern("\\w+").matcher(myString);
       if(m.find())m.setTarget(m,m.SUFFIX); //forget all that is not a suffix
     
    Resets current search position to zero.
    Parameters:
    m - a matcher that is the source of data
    groupId - which group to take data from
    See Also:
    setTarget(java.lang.String), setTarget(java.lang.String,int,int), setTarget(char[],int,int), setTarget(java.io.Reader,int)

    setTarget

    public void setTarget(java.lang.String text)
    Supplies a text to search in/match with. Resets current search position to zero.
    Parameters:
    text - the data
    See Also:
    setTarget(jregex.Matcher,int), setTarget(java.lang.String,int,int), setTarget(char[],int,int), setTarget(java.io.Reader,int)

    setTarget

    public void setTarget(java.lang.String text,
                          int start,
                          int len)
    Supplies a text to search in/match with, as a part of String. Resets current search position to zero.
    Parameters:
    text - a data source
    start - where the target starts
    len - how long the target is
    See Also:
    setTarget(jregex.Matcher,int), setTarget(java.lang.String), setTarget(char[],int,int), setTarget(java.io.Reader,int)

    setTarget

    public void setTarget(char[] text,
                          int start,
                          int len)
    Supplies a text to search in/match with, as a part of char array. Resets current search position to zero.
    Parameters:
    text - a data source
    start - where the target starts
    len - how long the target is
    See Also:
    setTarget(jregex.Matcher,int), setTarget(java.lang.String), setTarget(java.lang.String,int,int), setTarget(java.io.Reader,int)

    setTarget

    public final void setTarget(char[] text,
                                int start,
                                int len,
                                boolean shared)
    To be used with great care. Supplies a text to search in/match with, as a part of a char array, as above, but also makes it possible to let the matcher use the array as an internal buffer for subsequent inputs. That is, if we call it with shared=false:
       myMatcher.setTarget(myCharArray,x,y,false); //we declare that the array contents are NEITHER shared NOR used later, so modifications to them are permitted
     
    then we should expect the array contents to be changed by subsequent setTarget(..) operations. This may yield some increase in performance in the case of multiple setTarget() calls. Resets current search position to zero.
    Parameters:
    text - a data source
    start - where the target starts
    len - how long the target is
    shared - if true: data are shared or used later, don't modify it; if false: possible modifications of the text on subsequent setTarget() calls are expected and allowed.
    See Also:
    setTarget(jregex.Matcher,int), setTarget(java.lang.String), setTarget(java.lang.String,int,int), setTarget(char[],int,int), setTarget(java.io.Reader,int)

    setTarget

    public void setTarget(java.io.Reader in,
                          int len)
                   throws java.io.IOException
    Supplies a text to search in/match with through a stream. Resets current search position to zero.
    Parameters:
    in - a data stream;
    len - how many characters should be read; if len is -1, read the entire stream.
    See Also:
    setTarget(jregex.Matcher,int), setTarget(java.lang.String), setTarget(java.lang.String,int,int), setTarget(char[],int,int)

    matchesPrefix

    public final boolean matchesPrefix()
    Tells whether the entire target matches the beginning of the pattern. The whole pattern is also regarded as its beginning.
    This feature makes it possible to detect a mismatch by examining only the beginning part of the target (if the beginning of the target doesn't match the beginning of the pattern, then the entire target can't match either).
    For example the following assertions yield true:
       Pattern p=new Pattern("abcd"); 
       p.matcher("").matchesPrefix();
       p.matcher("a").matchesPrefix();
       p.matcher("ab").matchesPrefix();
       p.matcher("abc").matchesPrefix();
       p.matcher("abcd").matchesPrefix();
     
    and the following yield false:
       p.matcher("b").matchesPrefix();
       p.matcher("abcdef").matchesPrefix();
       p.matcher("x").matchesPrefix();
     
    Returns:
    true if the entire target matches the beginning of the pattern
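    For readers without the jregex jar, the prefix test can be approximated with the JDK's java.util.regex. This is an assumed analogue, not jregex's implementation: it combines matches() with hitEnd(), which reports whether the last match attempt ran into the end of the input. The helper name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: emulates a "target matches the beginning of the pattern" test
// with JDK regex. jregex's own matchesPrefix() is not used here.
final class PrefixMatchSketch {
    static boolean matchesPrefix(Pattern p, String target) {
        Matcher m = p.matcher(target);
        // Either the whole target matches, or the attempt failed only because
        // the target ended before the pattern was exhausted.
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("abcd");
        String[] targets = {"", "a", "ab", "abc", "abcd", "b", "abcdef", "x"};
        for (String t : targets) {
            System.out.println("\"" + t + "\" -> " + matchesPrefix(p, t));
        }
    }
}
```

    For the literal pattern "abcd" this reports true for "", "a", "ab", "abc" and "abcd", and false for "b", "abcdef" and "x", in line with the assertions listed above.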

    isStart

    public final boolean isStart()
    Deprecated. Replaced by matchesPrefix()

    Just an old name for matchesPrefix().
    Retained for backwards compatibility.

    matches

    public final boolean matches()
    Tells whether the current target matches the whole pattern. For example, the following yield true:
       Pattern p=new Pattern("\\w+"); 
       p.matcher("a").matches();
       p.matcher("ab").matches();
       p.matcher("abc").matches();
     
    and the following yield false:
       p.matcher("abc def").matches();
       p.matcher("bcd ").matches();
       p.matcher(" bcd").matches();
       p.matcher("#xyz#").matches();
     
    Returns:
    whether a current target matches the whole pattern.

    matches

    public final boolean matches(java.lang.String s)
    Just a combination of setTarget(String) and matches().
    Parameters:
    s - the target string;
    Returns:
    whether the specified string matches the whole pattern.

    setPosition

    public void setPosition(int pos)
    Sets the position from which the subsequent find()/find(int) will start.
    Parameters:
    pos - the position to start from;
    See Also:
    find(), find(int)

    find

    public final boolean find()
    Searches through a target for a matching substring, starting from just after the end of last match. If no search has been performed yet, starts from zero.
    Returns:
    true if a match is found.

    find

    public final boolean find(int anchors)
    Searches through a target for a matching substring, starting from just after the end of last match. If no search has been performed yet, starts from zero.
    Parameters:
    anchors - zero or a combination (bitwise OR) of ANCHOR_START, ANCHOR_END, ANCHOR_LASTMATCH, ACCEPT_INCOMPLETE
    Returns:
    true if a match is found.

    findAll

    public MatchIterator findAll()
    The same as findAll(int), but with default behaviour.

    findAll

    public MatchIterator findAll(int options)
    Returns an iterator over the matches found by successively calling find(options); the search starts from position zero.

    proceed

    public final boolean proceed()
    Continues to search from where the last search left off. The same as proceed(0).
    See Also:
    proceed(int)

    proceed

    public final boolean proceed(int options)
    Continues to search from where the last search left off using specified options:
     Matcher m=new Pattern("\\w+").matcher("abc");
     while(m.proceed(0)){
        System.out.println(m.group(0));
     }
     
    Output:
     abc
     ab
     a
     bc
     b
     c
     
    For example, let's find all odd numbers occurring in a text:
        Matcher m=new Pattern("\\d+").matcher("123");
        while(m.proceed(0)){
           String match=m.group(0);
           if(isOdd(Integer.parseInt(match))) System.out.println(match);
        }
        
        static boolean isOdd(int i){
           return (i&1)>0;
        }
     
    This outputs:
     123
     1
     23
     3
     
    Note that using find() method we would find '123' only.
    Parameters:
    options - search options, some of ANCHOR_START|ANCHOR_END|ANCHOR_LASTMATCH|ACCEPT_INCOMPLETE; a zero value (the default) stands for an ordinary substring search.
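    The two outputs listed above can be reproduced by brute force with the JDK's java.util.regex, which has no proceed() method. The sketch below is an assumption-laden emulation for these simple patterns only: it tries every start position and, for each, every end position from longest to shortest. It says nothing about how jregex enumerates matches internally, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: enumerates every substring that fully matches the regex.
final class ProceedSketch {
    static List<String> allMatches(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Longest candidate first, then progressively shorter ones.
            for (int end = text.length(); end > start; end--) {
                Matcher m = p.matcher(text).region(start, end);
                if (m.matches()) {
                    out.add(text.substring(start, end));
                }
            }
        }
        return out;
    }

    // Keeps only the matches that parse as odd integers.
    static List<String> oddNumbers(String text) {
        List<String> out = new ArrayList<>();
        for (String s : allMatches("\\d+", text)) {
            if ((Integer.parseInt(s) & 1) != 0) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\w+", "abc"));
        System.out.println(oddNumbers("123"));
    }
}
```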

    skip

    public final void skip()
    Sets the current search position just after the end of last match.

    toString

    public java.lang.String toString()
    Overrides:
    toString in class java.lang.Object

    pattern

    public Pattern pattern()
    Specified by:
    pattern in interface MatchResult

    target

    public java.lang.String target()
    Specified by:
    target in interface MatchResult

    targetChars

    public char[] targetChars()
    Specified by:
    targetChars in interface MatchResult

    targetStart

    public int targetStart()
    Specified by:
    targetStart in interface MatchResult

    targetEnd

    public int targetEnd()
    Specified by:
    targetEnd in interface MatchResult

    charAt

    public char charAt(int i)
    Specified by:
    charAt in interface MatchResult

    charAt

    public char charAt(int i,
                       int groupId)
    Specified by:
    charAt in interface MatchResult

    length

    public final int length()
    Specified by:
    length in interface MatchResult

    start

    public final int start()
    Specified by:
    start in interface MatchResult

    end

    public final int end()
    Specified by:
    end in interface MatchResult

    prefix

    public java.lang.String prefix()
    Specified by:
    prefix in interface MatchResult

    suffix

    public java.lang.String suffix()
    Specified by:
    suffix in interface MatchResult

    groupCount

    public int groupCount()
    Specified by:
    groupCount in interface MatchResult

    group

    public java.lang.String group(int n)
    Specified by:
    group in interface MatchResult

    group

    public java.lang.String group(java.lang.String name)
    Specified by:
    group in interface MatchResult

    getGroup

    public boolean getGroup(int n,
                            TextBuffer tb)
    Specified by:
    getGroup in interface MatchResult

    getGroup

    public boolean getGroup(java.lang.String name,
                            TextBuffer tb)
    Specified by:
    getGroup in interface MatchResult

    getGroup

    public boolean getGroup(int n,
                            java.lang.StringBuffer sb)
    Specified by:
    getGroup in interface MatchResult

    getGroup

    public boolean getGroup(java.lang.String name,
                            java.lang.StringBuffer sb)
    Specified by:
    getGroup in interface MatchResult

    groups

    public java.lang.String[] groups()

    groupv

    public java.util.Vector groupv()

    isCaptured

    public final boolean isCaptured()
    Specified by:
    isCaptured in interface MatchResult

    isCaptured

    public final boolean isCaptured(int id)
    Specified by:
    isCaptured in interface MatchResult

    isCaptured

    public final boolean isCaptured(java.lang.String groupName)
    Specified by:
    isCaptured in interface MatchResult

    length

    public final int length(int id)
    Specified by:
    length in interface MatchResult

    start

    public final int start(int id)
    Specified by:
    start in interface MatchResult

    end

    public final int end(int id)
    Specified by:
    end in interface MatchResult

    toString_d

    public java.lang.String toString_d()


    jregex/docs/api/jregex/package-tree.html: jregex Class Hierarchy

    Hierarchy For Package jregex

    Package Hierarchies:
    All Packages

    Class Hierarchy

    Interface Hierarchy



    jregex/docs/api/jregex/Optimizer.html: Class Optimizer

    jregex
    Class Optimizer

    java.lang.Object
      |
      +--jregex.Optimizer
    

    public class Optimizer
    extends java.lang.Object


    Field Summary
    static int THRESHOLD
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    THRESHOLD

    public static final int THRESHOLD


    jregex/docs/api/jregex/Pattern.html: Class Pattern

    jregex
    Class Pattern

    java.lang.Object
      |
      +--jregex.Pattern
    
    All Implemented Interfaces:
    REFlags, java.io.Serializable
    Direct Known Subclasses:
    PathPattern, WildcardPattern

    public class Pattern
    extends java.lang.Object
    implements java.io.Serializable, REFlags

    A handle for a precompiled regular expression.
    To match a regular expression myExpr against a text myString one should first create a Pattern object:

     Pattern p=new Pattern(myExpr);
     
    then obtain a Matcher object:
     Matcher matcher=p.matcher(myString);
     
    The latter is an automaton that actually performs a search. It provides the following methods:
  • search for matching substrings : matcher.find() or matcher.findAll();
  • test whether the text matches the whole pattern : matcher.matches();
  • test whether the text matches the beginning of the pattern : matcher.matchesPrefix();
  • search with custom options : matcher.find(int options)

    Flags
    Flags (see the REFlags interface) change the meaning of some regular expression elements at compile time. These flags may be passed either as a string (see Pattern(String,String)) or as a bitwise OR of:

  • REFlags.IGNORE_CASE - enables case insensitivity;
  • REFlags.MULTILINE - makes "^" and "$" also match at the start and the end of each line;
  • REFlags.DOTALL - forces "." to match EOLs ('\r' and '\n' in ASCII);
  • REFlags.IGNORE_SPACES - literal spaces in the expression are ignored for better readability;
  • REFlags.UNICODE - the predefined classes ('\w', '\d', etc.) are interpreted as Unicode categories;
  • REFlags.XML_SCHEMA - permits XML Schema regular expression syntax extensions.

    Multithreading
    Pattern instances are thread-safe, i.e. the same Pattern object may be used by any number of threads simultaneously. On the other hand, Matcher objects are NOT thread-safe, so, given a Pattern instance, each thread must obtain and use its own Matcher.
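    The usage pattern described here (one shared Pattern, one Matcher per thread) can be sketched as follows. The JDK's java.util.regex is used as a stand-in because it documents the same split between a shareable compiled pattern and a non-shareable matcher; the class, the pool size of 4 and the helper names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: JDK regex, not jregex.
final class SharedPatternSketch {
    // Compiled once and shared by all threads.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static int countWords(String text) {
        Matcher m = WORD.matcher(text); // a fresh matcher per call, never shared
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts words in each text on a worker thread and sums the results.
    static int countInParallel(List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String t : texts) {
                Callable<Integer> task = () -> countWords(t);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(List.of("a b", "c d e", "f")));
    }
}
```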

    See Also:
    REFlags, Matcher, Matcher.setTarget(java.lang.String), Matcher.setTarget(java.lang.String,int,int), Matcher.setTarget(char[],int,int), Matcher.setTarget(java.io.Reader,int), MatchResult, MatchResult.group(int), MatchResult.start(int), MatchResult.end(int), MatchResult.length(int), MatchResult.charAt(int,int), MatchResult.prefix(), MatchResult.suffix(), Serialized Form

    Fields inherited from interface jregex.REFlags
    DEFAULT, DOTALL, IGNORE_CASE, IGNORE_SPACES, MULTILINE, UNICODE, XML_SCHEMA
     
    Constructor Summary
    protected Pattern()
               
      Pattern(java.lang.String regex)
              Compiles an expression with default flags.
      Pattern(java.lang.String regex, int flags)
              Compiles a regular expression using REFlags.
      Pattern(java.lang.String regex, java.lang.String flags)
              Compiles a regular expression using Perl5-style flags.
     
    Method Summary
    protected  void compile(java.lang.String regex, int flags)
               
     int groupCount()
              The number of capturing groups this expression includes.
     java.lang.Integer groupId(java.lang.String name)
              Get numeric id for a group name.
     Matcher matcher()
              Returns a targetless matcher.
     Matcher matcher(char[] data, int start, int end)
              Returns a matcher for a specified region.
     Matcher matcher(MatchResult res, int groupId)
              Returns a matcher for a match result (in a performance-friendly way).
     Matcher matcher(MatchResult res, java.lang.String groupName)
              Just as above, yet with symbolic group name.
     Matcher matcher(java.io.Reader text, int length)
              Returns a matcher taking a text stream as target.
     Matcher matcher(java.lang.String s)
              Returns a matcher for a specified string.
     boolean matches(java.lang.String s)
              A shorthand for Pattern.matcher(String).matches().
     Replacer replacer(java.lang.String expr)
              Returns a replacer of a pattern by specified perl-like expression.
     Replacer replacer(Substitution model)
              Returns a replacer that will substitute all occurrences of a pattern by applying a user-defined substitution model.
     boolean startsWith(java.lang.String s)
              A shorthand for Pattern.matcher(String).matchesPrefix().
     RETokenizer tokenizer(char[] data, int off, int len)
              Tokenizes a specified region by occurrences of the pattern.
     RETokenizer tokenizer(java.io.Reader in, int length)
              Tokenizes a specified region by occurrences of the pattern.
     RETokenizer tokenizer(java.lang.String text)
              Tokenizes a text by occurrences of the pattern.
     java.lang.String toString_d()
              Returns a more or less readable representation of the bytecode for the pattern.
     java.lang.String toString()
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Constructor Detail

    Pattern

    protected Pattern()
               throws PatternSyntaxException

    Pattern

    public Pattern(java.lang.String regex)
            throws PatternSyntaxException
    Compiles an expression with default flags.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    Throws:
    PatternSyntaxException - if the argument doesn't correspond to perl5 regex syntax.
    See Also:
    Pattern(java.lang.String,java.lang.String), Pattern(java.lang.String,int)

    Pattern

    public Pattern(java.lang.String regex,
                   java.lang.String flags)
            throws PatternSyntaxException
    Compiles a regular expression using Perl5-style flags. The flag string should consist of the letters 'i','m','s','x','u','X' (case is significant) and a hyphen. The meaning of the letters:
    • i - case insensitivity, corresponds to REFlags.IGNORE_CASE;
    • m - multiline treatment (BOLs and EOLs affect the '^' and '$'), corresponds to REFlags.MULTILINE;
    • s - single line treatment ('.' matches \r's and \n's), corresponds to REFlags.DOTALL;
    • x - extended whitespace comments (spaces and EOLs in the expression are ignored), corresponds to REFlags.IGNORE_SPACES;
    • u - predefined classes are regarded as belonging to Unicode, corresponds to REFlags.UNICODE; this may incur some performance penalty;
    • X - compatibility with XML Schema, corresponds to REFlags.XML_SCHEMA.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    flags - the Perl5-compatible flags.
    Throws:
    PatternSyntaxException - if the argument doesn't correspond to Perl5 regex syntax.
    See Also:
    REFlags
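    The letter-to-flag mapping above can be sketched as a small parser. This is only an illustration: the class name and the bit values are hypothetical, not the real REFlags constants, and the sketch assumes that, as with Perl's inline modifiers, letters after the hyphen switch a flag off.

```java
final class FlagStringSketch {
    // Hypothetical bit values; the real REFlags constants may differ.
    static final int IGNORE_CASE = 1;
    static final int MULTILINE = 1 << 1;
    static final int DOTALL = 1 << 2;
    static final int IGNORE_SPACES = 1 << 3;
    static final int UNICODE = 1 << 4;
    static final int XML_SCHEMA = 1 << 5;

    /** Turns a flag string such as "im" or "ims-s" into a bit mask. */
    static int parse(String flags) {
        int mask = 0;
        boolean enable = true; // assumption: letters after '-' clear the flag
        for (int i = 0; i < flags.length(); i++) {
            char c = flags.charAt(i);
            if (c == '-') {
                enable = false;
                continue;
            }
            int bit = bitFor(c);
            mask = enable ? (mask | bit) : (mask & ~bit);
        }
        return mask;
    }

    private static int bitFor(char c) {
        switch (c) {
            case 'i': return IGNORE_CASE;
            case 'm': return MULTILINE;
            case 's': return DOTALL;
            case 'x': return IGNORE_SPACES;
            case 'u': return UNICODE;
            case 'X': return XML_SCHEMA;
            default: throw new IllegalArgumentException("unknown flag: " + c);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("im") == (IGNORE_CASE | MULTILINE));
        System.out.println(parse("ims-s") == (IGNORE_CASE | MULTILINE));
    }
}
```

    Because case is significant, 'x' and 'X' select different bits in this sketch.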

    Pattern

    public Pattern(java.lang.String regex,
                   int flags)
            throws PatternSyntaxException
    Compiles a regular expression using REFlags. The flags parameter is a bitwise OR of the following values:
    • REFlags.IGNORE_CASE - case insensitivity, corresponds to the 'i' letter;
    • REFlags.MULTILINE - multiline treatment (BOLs and EOLs affect the '^' and '$'), corresponds to 'm';
    • REFlags.DOTALL - single line treatment ('.' matches \r's and \n's), corresponds to 's';
    • REFlags.IGNORE_SPACES - extended whitespace comments (spaces and EOLs in the expression are ignored), corresponds to 'x';
    • REFlags.UNICODE - predefined classes are regarded as belonging to Unicode, corresponds to 'u'; this may incur some performance penalty;
    • REFlags.XML_SCHEMA - compatibility with XML Schema, corresponds to 'X'.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    flags - a bitwise OR of REFlags values.
    Throws:
    PatternSyntaxException - if the argument doesn't correspond to Perl5 regex syntax.
    See Also:
    REFlags
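    To illustrate combining flags with bitwise OR, here is a short sketch. It uses the standard java.util.regex classes rather than jregex, so the flag constants are java.util.regex.Pattern's, not REFlags'; only the combining idiom carries over.

```java
import java.util.regex.Pattern;

final class FlagMaskSketch {
    /** Searches text with case-insensitive and multiline behaviour both switched on. */
    static boolean findIgnoringCasePerLine(String regex, String text) {
        // Two independent options are merged into one int with bitwise OR.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // '^' and '$' match at line boundaries, and "WORLD" matches "world".
        System.out.println(findIgnoringCasePerLine("^world$", "Hello\nWORLD\n"));
    }
}
```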
    Method Detail

    compile

    protected void compile(java.lang.String regex,
                           int flags)
                    throws PatternSyntaxException

    groupCount

    public int groupCount()
    Returns the number of capturing groups in this expression.

    groupId

    public java.lang.Integer groupId(java.lang.String name)
    Get numeric id for a group name.
    Returns:
    null if no such name found.
    See Also:
    MatchResult.group(java.lang.String), MatchResult.isCaptured(java.lang.String)

    matches

    public boolean matches(java.lang.String s)
    A shorthand for Pattern.matcher(String).matches().
    Parameters:
    s - the target
    Returns:
    true if the entire target matches the pattern
    See Also:
    Matcher.matches(), Matcher.matches(String)

    startsWith

    public boolean startsWith(java.lang.String s)
    A shorthand for Pattern.matcher(String).matchesPrefix().
    Parameters:
    s - the target
    Returns:
    true if the entire target matches the beginning of the pattern
    See Also:
    Matcher.matchesPrefix()
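
    The difference between a full match and a target that only matches the beginning of the pattern can be sketched with java.util.regex, which is used here in place of jregex. The sketch relies on Matcher.hitEnd(), the usual java.util.regex idiom for partial matching; whether jregex's matchesPrefix() behaves identically in every case is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixMatchSketch {
    /** True if target matches fully, or could still match if more input followed. */
    static boolean matchesPrefix(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        // hitEnd() reports that the attempt ran into the end of the input.
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        String date = "\\d{4}-\\d{2}";
        System.out.println(matchesPrefix(date, "2024-12")); // complete match
        System.out.println(matchesPrefix(date, "2024-"));   // incomplete but viable
        System.out.println(matchesPrefix(date, "20x4"));    // fails before the end
    }
}
```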

    matcher

    public Matcher matcher()
    Returns a targetless matcher. Don't forget to supply a target.

    matcher

    public Matcher matcher(java.lang.String s)
    Returns a matcher for a specified string.

    matcher

    public Matcher matcher(char[] data,
                           int start,
                           int end)
    Returns a matcher for a specified region.

    matcher

    public Matcher matcher(MatchResult res,
                           int groupId)
    Returns a matcher for a match result (in a performance-friendly way). groupId parameter specifies which group is a target.
    Parameters:
    groupId - which group is a target; either positive integer(group id), or one of MatchResult.MATCH,MatchResult.PREFIX,MatchResult.SUFFIX,MatchResult.TARGET.
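
    Searching inside a group of an earlier match, without copying that group's text, can be sketched as follows. The sketch uses java.util.regex (not jregex) and restricts a second matcher to the group's boundaries with Matcher.region(); the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupRegionSketch {
    /** Finds all matches of inner within the given group of an existing match. */
    static List<String> findInGroup(String text, Matcher outer, int group, Pattern inner) {
        List<String> found = new ArrayList<>();
        if (outer.start(group) < 0) {
            return found; // the group did not take part in the match
        }
        // The second matcher works on the same text, limited to the group's span.
        Matcher m = inner.matcher(text).region(outer.start(group), outer.end(group));
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "x [a1 b22] y 333";
        Matcher outer = Pattern.compile("\\[(.*?)\\]").matcher(text);
        if (outer.find()) {
            System.out.println(findInGroup(text, outer, 1, Pattern.compile("\\d+")));
        }
    }
}
```

    The digits outside the brackets are not reported, because the inner search never leaves the region of group 1.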

    matcher

    public Matcher matcher(MatchResult res,
                           java.lang.String groupName)
    Just as above, yet with symbolic group name.
    Throws:
    NullPointerException - if there is no group with such name

    matcher

    public Matcher matcher(java.io.Reader text,
                           int length)
                    throws java.io.IOException
    Returns a matcher taking a text stream as target. Note that this is not true POSIX-style stream matching: the requested length of text is read in advance and stored in a char array.
    Parameters:
    text - a text stream
    length - the number of characters to read from the stream; if length is -1, the whole stream is read in.
    Throws:
    java.io.IOException - indicates an IO problem
    OutOfMemoryError - if the stream is too long
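
    The read-everything-first behaviour described above can be sketched in plain Java. This is not the jregex implementation; the class and method names are hypothetical, and the sketch only shows how a length of -1 can mean "read to the end of the stream".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReaderTargetSketch {
    /** Reads up to length chars (or everything if length is negative) into an array. */
    static char[] readTarget(Reader in, int length) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int remaining = length; // only meaningful when length >= 0
        while (length < 0 || remaining > 0) {
            int want = length < 0 ? buf.length : Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // end of stream reached before the requested length
            }
            sb.append(buf, 0, n);
            remaining -= n;
        }
        char[] out = new char[sb.length()];
        sb.getChars(0, sb.length(), out, 0);
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(readTarget(new StringReader("hello world"), 5)));
        System.out.println(new String(readTarget(new StringReader("hello world"), -1)));
    }
}
```

    Since the whole target is held in memory, a very long stream is limited by the available heap, which is the condition the Throws section warns about.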

    replacer

    public Replacer replacer(java.lang.String expr)
    Returns a replacer that substitutes matches of the pattern using the specified Perl-like expression. Such a replacer will substitute all occurrences of the pattern with the evaluated expression ("$&" and "$0" are replaced by the whole match, "$1" by group #1, etc). Example:
     String text="The quick brown fox jumped over the lazy dog";
     Pattern word=new Pattern("\\w+");
     System.out.println(word.replacer("[$&]").replace(text));
     //prints "[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]"
     Pattern swap=new Pattern("(fox|dog)(.*?)(fox|dog)");
     System.out.println(swap.replacer("$3$2$1").replace(text));
     //prints "The quick brown dog jumped over the lazy fox"
     Pattern scramble=new Pattern("(\\w+)(.*?)(\\w+)");
     System.out.println(scramble.replacer("$3$2$1").replace(text));
     //prints "quick The fox brown over jumped lazy the dog"
     
    Parameters:
    expr - a perl-like expression, the "$&" and "${&}" standing for whole match, the "$N" and "${N}" standing for group#N, and "${Foo}" standing for named group Foo.
    See Also:
    Replacer
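
    For comparison, the same kind of expression-based replacement can be sketched with java.util.regex, which is used here instead of jregex. Note the differences, which are properties of java.util.regex and not of jregex: the whole match is written "$0" ("$&" is not supported there) and named groups are declared as (?<name>...).

```java
import java.util.regex.Pattern;

final class ExpressionReplaceSketch {
    /** Wraps every word in brackets; "$0" stands for the whole match. */
    static String bracketWords(String text) {
        return Pattern.compile("\\w+").matcher(text).replaceAll("[$0]");
    }

    /** Swaps key and value using named groups in the replacement expression. */
    static String swapPairs(String text) {
        return Pattern.compile("(?<key>\\w+)=(?<value>\\w+)")
                .matcher(text)
                .replaceAll("${value}=${key}");
    }

    public static void main(String[] args) {
        System.out.println(bracketWords("lazy dog"));
        System.out.println(swapPairs("a=1 b=2"));
    }
}
```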

    replacer

    public Replacer replacer(Substitution model)
    Returns a replacer that will substitute all occurrences of a pattern by applying a user-defined substitution model.
    Parameters:
    model - a Substitution object which is in charge of match substitution
    See Also:
    Replacer
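
    The idea of a user-defined substitution model, where code rather than an expression decides what each match becomes, can be sketched with java.util.regex. The Function parameter below merely stands in for jregex's Substitution interface; it is not the jregex API.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstitutionModelSketch {
    /** Replaces every match with whatever the model computes for it. */
    static String replaceAll(Pattern p, String text, Function<MatchResult, String> model) {
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the model's output literal.
            m.appendReplacement(out, Matcher.quoteReplacement(model.apply(m)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("\\w+");
        System.out.println(replaceAll(word, "lazy dog", r -> "<" + r.group().length() + ">"));
    }
}
```

    A model can consult any part of the match, such as group lengths or positions, which a fixed replacement expression cannot do.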

    tokenizer

    public RETokenizer tokenizer(java.lang.String text)
    Tokenizes a text using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,String).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,java.lang.String)
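
    The rule that adjacent matches count as one separator can be sketched with java.util.regex, used here in place of jregex. The sketch drops every empty token, including any at the start or end of the text; how RETokenizer treats those edge cases is not stated above, so that part is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerSketch {
    /** Splits text on delimiter matches, treating runs of matches as one separator. */
    static List<String> tokenize(Pattern delimiter, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int tokenStart = 0;
        while (m.find()) {
            // Nothing lies between two adjacent matches, so no token is emitted.
            if (m.start() > tokenStart) {
                tokens.add(text.substring(tokenStart, m.start()));
            }
            tokenStart = m.end();
        }
        if (tokenStart < text.length()) {
            tokens.add(text.substring(tokenStart));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(Pattern.compile(","), "a,,b,c"));
    }
}
```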

    tokenizer

    public RETokenizer tokenizer(char[] data,
                                 int off,
                                 int len)
    Tokenizes a specified region using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,char[],int,int).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,char[],int,int)

    tokenizer

    public RETokenizer tokenizer(java.io.Reader in,
                                 int length)
                          throws java.io.IOException
    Tokenizes a specified region using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,Reader,int).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,java.io.Reader,int)

    toString

    public java.lang.String toString()
    Overrides:
    toString in class java.lang.Object

    toString_d

    public java.lang.String toString_d()
    Returns a more or less readable representation of the bytecode for the pattern.


    jregex/docs/api/index-all.html : Index
    A C D E F G H I J L M N O P R S T U V W X

    A

    ACCEPT_INCOMPLETE - Static variable in class jregex.Matcher
    Experimental option: if the text ends before the end of the pattern, report a match.
    accept(File, String) - Method in class jregex.util.io.WildcardFilter
     
    ANCHOR_END - Static variable in class jregex.Matcher
    The same effect as "$" without REFlags.MULTILINE.
    ANCHOR_LASTMATCH - Static variable in class jregex.Matcher
    The same effect as "\\G".
    ANCHOR_START - Static variable in class jregex.Matcher
    The same effect as "^" without REFlags.MULTILINE.
    ANY_CHAR - Static variable in class jregex.WildcardPattern
     
    append(char) - Method in interface jregex.TextBuffer
     
    append(char[], int, int) - Method in interface jregex.TextBuffer
     
    append(String) - Method in interface jregex.TextBuffer
     
    appendSubstitution(MatchResult, TextBuffer) - Method in interface jregex.Substitution
     
    appendSubstitution(MatchResult, TextBuffer) - Method in class jregex.PerlSubstitution
     

    C

    charAt(int) - Method in interface jregex.MatchResult
     
    charAt(int) - Method in class jregex.Matcher
     
    charAt(int, int) - Method in interface jregex.MatchResult
     
    charAt(int, int) - Method in class jregex.Matcher
     
    compile(String, int) - Method in class jregex.Pattern
     
    compile(String, String, String, int) - Method in class jregex.WildcardPattern
     
    convertSpecials(String, String, String) - Static method in class jregex.WildcardPattern
     
    count() - Method in interface jregex.MatchIterator
     

    D

    DEFAULT - Static variable in interface jregex.REFlags
    All the following options turned off.
    defaultInstantiator - Static variable in class jregex.util.io.ListEnumerator
     
    directory() - Method in class jregex.util.io.PathPattern
    Deprecated. Is meaningless with regard to variable paths (since v.1.2)
    DOTALL - Static variable in interface jregex.REFlags
    Affects the behaviour of dot(".") tag.

    E

    end() - Method in interface jregex.MatchResult
     
    end() - Method in class jregex.Matcher
     
    end(int) - Method in interface jregex.MatchResult
     
    end(int) - Method in class jregex.Matcher
     
    enumerateFiles() - Method in class jregex.util.io.PathPattern
     

    F

    files() - Method in class jregex.util.io.PathPattern
     
    find() - Method in class jregex.Matcher
    Searches through a target for a matching substring, starting from just after the end of last match.
    find() - Method in class jregex.util.io.ListEnumerator
     
    find(int) - Method in class jregex.Matcher
    Searches through a target for a matching substring, starting from just after the end of last match.
    findAll() - Method in class jregex.Matcher
    The same as findAll(int), but with default behaviour;
    findAll(int) - Method in class jregex.Matcher
    Returns an iterator over the matches found by subsequently calling find(options), the search starts from the zero position.

    G

    getGroup(int, StringBuffer) - Method in interface jregex.MatchResult
     
    getGroup(int, StringBuffer) - Method in class jregex.Matcher
     
    getGroup(int, TextBuffer) - Method in interface jregex.MatchResult
     
    getGroup(int, TextBuffer) - Method in class jregex.Matcher
     
    getGroup(String, StringBuffer) - Method in interface jregex.MatchResult
     
    getGroup(String, StringBuffer) - Method in class jregex.Matcher
     
    getGroup(String, TextBuffer) - Method in interface jregex.MatchResult
     
    getGroup(String, TextBuffer) - Method in class jregex.Matcher
     
    group(int) - Method in interface jregex.MatchResult
     
    group(int) - Method in class jregex.Matcher
     
    group(String) - Method in interface jregex.MatchResult
     
    group(String) - Method in class jregex.Matcher
     
    groupCount() - Method in class jregex.Pattern
    Returns the number of capturing groups in this expression.
    groupCount() - Method in interface jregex.MatchResult
     
    groupCount() - Method in class jregex.Matcher
     
    groupId(String) - Method in class jregex.Pattern
    Get numeric id for a group name.
    groups() - Method in class jregex.Matcher
     
    groupv() - Method in class jregex.Matcher
     

    H

    hasMore() - Method in interface jregex.MatchIterator
     
    hasMore() - Method in class jregex.RETokenizer
     
    hasMoreElements() - Method in class jregex.RETokenizer
     

    I

    IGNORE_CASE - Static variable in interface jregex.REFlags
    Pattern "a" matches both "a" and "A".
    IGNORE_SPACES - Static variable in interface jregex.REFlags
    Affects how the space characters are interpreted in the expression.
    instantiate(File, String) - Method in interface jregex.util.io.ListEnumerator.Instantiator
     
    isCaptured() - Method in interface jregex.MatchResult
     
    isCaptured() - Method in class jregex.Matcher
     
    isCaptured(int) - Method in interface jregex.MatchResult
     
    isCaptured(int) - Method in class jregex.Matcher
     
    isCaptured(String) - Method in interface jregex.MatchResult
     
    isCaptured(String) - Method in class jregex.Matcher
     
    isEmptyEnabled() - Method in class jregex.RETokenizer
     
    isStart() - Method in class jregex.Matcher
    Deprecated. Replaced by isPrefix()

    J

    jregex - package jregex
     
    jregex.util.io - package jregex.util.io
     

    L

    length() - Method in interface jregex.MatchResult
     
    length() - Method in class jregex.Matcher
     
    length(int) - Method in interface jregex.MatchResult
     
    length(int) - Method in class jregex.Matcher
     
    ListEnumerator - class jregex.util.io.ListEnumerator.
     
    ListEnumerator.Instantiator - interface jregex.util.io.ListEnumerator.Instantiator.
     
    ListEnumerator(File, ListEnumerator.Instantiator) - Constructor for class jregex.util.io.ListEnumerator
     
    ListEnumerator(File, String[], ListEnumerator.Instantiator) - Constructor for class jregex.util.io.ListEnumerator
     

    M

    MATCH - Static variable in interface jregex.MatchResult
     
    Matcher - class jregex.Matcher.
    Matcher instance is an automaton that actually performs matching.
    matcher() - Method in class jregex.Pattern
    Returns a targetless matcher.
    matcher(char[], int, int) - Method in class jregex.Pattern
    Returns a matcher for a specified region.
    matcher(MatchResult, int) - Method in class jregex.Pattern
    Returns a matcher for a match result (in a performance-friendly way).
    matcher(MatchResult, String) - Method in class jregex.Pattern
    Just as above, yet with symbolic group name.
    matcher(Reader, int) - Method in class jregex.Pattern
    Returns a matcher taking a text stream as target.
    matcher(String) - Method in class jregex.Pattern
    Returns a matcher for a specified string.
    matches() - Method in class jregex.Matcher
    Tells whether a current target matches the whole pattern.
    matches(String) - Method in class jregex.Pattern
    A shorthand for Pattern.matcher(String).matches().
    matches(String) - Method in class jregex.Matcher
    Just a combination of setTarget(String) and matches().
    matchesPrefix() - Method in class jregex.Matcher
    Tells whether the entire target matches the beginning of the pattern.
    MatchIterator - interface jregex.MatchIterator.
     
    MatchResult - interface jregex.MatchResult.
     
    MULTILINE - Static variable in interface jregex.REFlags
    Affects the behaviour of "^" and "$" tags.

    N

    names() - Method in class jregex.util.io.PathPattern
    Deprecated. Is meaningless with regard to variable paths (since v.1.2)
    nextElement() - Method in class jregex.RETokenizer
     
    nextMatch() - Method in interface jregex.MatchIterator
     
    nextToken() - Method in class jregex.RETokenizer
     

    O

    Optimizer - class jregex.Optimizer.
     

    P

    PathPattern - class jregex.util.io.PathPattern.
    A special-purpose subclass of the Pattern class.
    PathPattern(File, String, boolean) - Constructor for class jregex.util.io.PathPattern
     
    PathPattern(File, String, int) - Constructor for class jregex.util.io.PathPattern
     
    PathPattern(String) - Constructor for class jregex.util.io.PathPattern
     
    PathPattern(String, boolean) - Constructor for class jregex.util.io.PathPattern
     
    PathPattern(String, int) - Constructor for class jregex.util.io.PathPattern
     
    Pattern - class jregex.Pattern.
    A handle for a precompiled regular expression.
    To match a regular expression myExpr against a text myString one should first create a Pattern object: Pattern p=new Pattern(myExpr); then obtain a Matcher object: Matcher matcher=p.matcher(myString); The latter is an automaton that actually performs a search.
    pattern() - Method in interface jregex.MatchResult
     
    pattern() - Method in class jregex.Matcher
     
    Pattern() - Constructor for class jregex.Pattern
     
    Pattern(String) - Constructor for class jregex.Pattern
    Compiles an expression with default flags.
    Pattern(String, int) - Constructor for class jregex.Pattern
    Compiles a regular expression using REFlags.
    Pattern(String, String) - Constructor for class jregex.Pattern
    Compiles a regular expression using Perl5-style flags.
    PatternSyntaxException - exception jregex.PatternSyntaxException.
    Is thrown when the Pattern constructor's argument doesn't conform to the Perl5 regular expression syntax.
    PatternSyntaxException(String) - Constructor for class jregex.PatternSyntaxException
     
    PerlSubstitution - class jregex.PerlSubstitution.
    An implementation of the Substitution interface.
    PerlSubstitution(String) - Constructor for class jregex.PerlSubstitution
     
    PREFIX - Static variable in interface jregex.MatchResult
     
    prefix() - Method in interface jregex.MatchResult
     
    prefix() - Method in class jregex.Matcher
     
    proceed() - Method in class jregex.Matcher
    Continues to search from where the last search left off.
    proceed(int) - Method in class jregex.Matcher
    Continues to search from where the last search left off, using the specified options:
     Matcher m=new Pattern("\\w+").matcher("abc");
     while(m.proceed(0)){
        System.out.println(m.group(0));
     }
    Output:
     abc
     ab
     a
     bc
     b
     c
    For example, let's find all odd numbers occurring in a text:
     Matcher m=new Pattern("\\d+").matcher("123");
     while(m.proceed(0)){
        String match=m.group(0);
        if(isOdd(Integer.parseInt(match))) System.out.println(match);
     }

     static boolean isOdd(int i){
        return (i&1)>0;
     }
    This outputs:
     123
     1
     23
     3
    Note that using the find() method we would find '123' only.
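    The outputs listed above can be reproduced by a brute-force enumeration of every substring that the pattern matches in full. The sketch below uses java.util.regex, not jregex, and is certainly not how proceed() is implemented; it only illustrates which substrings are reported and in what order for these two examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProceedSketch {
    /** Lists every non-empty substring that matches p, longest first per start position. */
    static List<String> allMatches(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        for (int start = 0; start < text.length(); start++) {
            for (int end = text.length(); end > start; end--) {
                // region() resets the matcher and limits it to [start, end).
                if (m.region(start, end).matches()) {
                    out.add(text.substring(start, end));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allMatches(Pattern.compile("\\w+"), "abc"));
        System.out.println(allMatches(Pattern.compile("\\d+"), "123"));
    }
}
```

    Filtering the second list for odd values leaves 123, 1, 23 and 3, in the same order as the documented output.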

    R

    REFlags - interface jregex.REFlags.
     
    replace(char[], int, int) - Method in class jregex.Replacer
     
    replace(char[], int, int, StringBuffer) - Method in class jregex.Replacer
     
    replace(char[], int, int, TextBuffer) - Method in class jregex.Replacer
     
    replace(char[], int, int, Writer) - Method in class jregex.Replacer
     
    replace(Matcher, Substitution, TextBuffer) - Static method in class jregex.Replacer
    Replaces all occurrences of a matcher's pattern in a matcher's target by a given substitution, appending the result to a buffer.
    The substitution starts from current matcher's position, current match not included.
    replace(Matcher, Substitution, Writer) - Static method in class jregex.Replacer
     
    replace(MatchResult, int) - Method in class jregex.Replacer
     
    replace(MatchResult, int, StringBuffer) - Method in class jregex.Replacer
     
    replace(MatchResult, int, TextBuffer) - Method in class jregex.Replacer
     
    replace(MatchResult, int, Writer) - Method in class jregex.Replacer
     
    replace(MatchResult, String, StringBuffer) - Method in class jregex.Replacer
     
    replace(MatchResult, String, TextBuffer) - Method in class jregex.Replacer
     
    replace(MatchResult, String, Writer) - Method in class jregex.Replacer
     
    replace(Reader, int) - Method in class jregex.Replacer
     
    replace(Reader, int, StringBuffer) - Method in class jregex.Replacer
     
    replace(Reader, int, TextBuffer) - Method in class jregex.Replacer
     
    replace(Reader, int, Writer) - Method in class jregex.Replacer
     
    replace(String) - Method in class jregex.Replacer
     
    replace(String, StringBuffer) - Method in class jregex.Replacer
     
    replace(String, TextBuffer) - Method in class jregex.Replacer
     
    replace(String, Writer) - Method in class jregex.Replacer
     
    Replacer - class jregex.Replacer.
    The Replacer class provides methods to replace occurrences of a pattern either by the result of evaluating a Perl-like expression, or by a plain string, or according to a custom substitution model, provided as a Substitution interface implementation.
    A Replacer instance may be obtained either using the Pattern.replacer(...) method, or by constructor:
     Pattern p=new Pattern("\\w+");
     Replacer perlExpressionReplacer=p.replacer("[$&]");
     //or another way to do the same
     Substitution myOwnModel=new Substitution(){
        public void appendSubstitution(MatchResult match,TextBuffer tb){
           tb.append('[');
           match.getGroup(MatchResult.MATCH,tb);
           tb.append(']');
        }
     };
     Replacer myVeryOwnReplacer=new Replacer(p,myOwnModel);
    The second method is much more verbose, but gives more freedom.
    Replacer(Pattern, String) - Constructor for class jregex.Replacer
     
    Replacer(Pattern, String, boolean) - Constructor for class jregex.Replacer
     
    Replacer(Pattern, Substitution) - Constructor for class jregex.Replacer
     
    replacer(String) - Method in class jregex.Pattern
    Returns a replacer that substitutes matches of the pattern using the specified Perl-like expression.
    replacer(Substitution) - Method in class jregex.Pattern
    Returns a replacer that will substitute all occurrences of a pattern by applying a user-defined substitution model.
    reset() - Method in class jregex.RETokenizer
     
    RETokenizer - class jregex.RETokenizer.
    The RETokenizer class provides methods to break a text into tokens using occurrences of a pattern as delimiters.
    RETokenizer(Matcher, boolean) - Constructor for class jregex.RETokenizer
     
    RETokenizer(Pattern, char[], int, int) - Constructor for class jregex.RETokenizer
     
    RETokenizer(Pattern, Reader, int) - Constructor for class jregex.RETokenizer
     
    RETokenizer(Pattern, String) - Constructor for class jregex.RETokenizer
     

    S

    setEmptyEnabled(boolean) - Method in class jregex.RETokenizer
     
    setPosition(int) - Method in class jregex.Matcher
    Sets the position from which the subsequent find()/find(int) will start.
    setSubstitution(String, boolean) - Method in class jregex.Replacer
     
    setTarget(char[], int, int) - Method in class jregex.Matcher
    Supplies a text to search in/match with, as a part of char array.
    setTarget(char[], int, int, boolean) - Method in class jregex.Matcher
    To be used with much care.
    setTarget(Matcher, int) - Method in class jregex.Matcher
    This method allows data to be passed efficiently between matchers.
    setTarget(Reader, int) - Method in class jregex.Matcher
    Supplies a text to search in/match with through a stream.
    setTarget(String) - Method in class jregex.Matcher
    Supplies a text to search in/match with.
    setTarget(String, int, int) - Method in class jregex.Matcher
    Supplies a text to search in/match with, as a part of String.
    skip() - Method in class jregex.Matcher
    Sets the current search position just after the end of last match.
    split() - Method in class jregex.RETokenizer
     
    start() - Method in interface jregex.MatchResult
     
    start() - Method in class jregex.Matcher
     
    start(int) - Method in interface jregex.MatchResult
     
    start(int) - Method in class jregex.Matcher
     
    startsWith(String) - Method in class jregex.Pattern
    A shorthand for Pattern.matcher(String).matchesPrefix().
    Substitution - interface jregex.Substitution.
     
    SUFFIX - Static variable in interface jregex.MatchResult
     
    suffix() - Method in interface jregex.MatchResult
     
    suffix() - Method in class jregex.Matcher
     

    T

    TARGET - Static variable in interface jregex.MatchResult
     
    target() - Method in interface jregex.MatchResult
     
    target() - Method in class jregex.Matcher
     
    targetChars() - Method in interface jregex.MatchResult
     
    targetChars() - Method in class jregex.Matcher
     
    targetEnd() - Method in interface jregex.MatchResult
     
    targetEnd() - Method in class jregex.Matcher
     
    targetStart() - Method in interface jregex.MatchResult
     
    targetStart() - Method in class jregex.Matcher
     
    TextBuffer - interface jregex.TextBuffer.
     
    THRESHOLD - Static variable in class jregex.Optimizer
     
    tokenizer(char[], int, int) - Method in class jregex.Pattern
    Tokenizes a specified region by occurrences of the pattern.
    tokenizer(Reader, int) - Method in class jregex.Pattern
    Tokenizes a specified region by occurrences of the pattern.
    tokenizer(String) - Method in class jregex.Pattern
    Tokenizes a text by occurrences of the pattern.
    toString_d() - Method in class jregex.Pattern
    Returns a more or less readable representation of the bytecode for the pattern.
    toString_d() - Method in class jregex.Matcher
     
    toString() - Method in class jregex.Pattern
     
    toString() - Method in class jregex.WildcardPattern
     
    toString() - Method in class jregex.Matcher
     
    toString() - Method in class jregex.PerlSubstitution
     
    toString() - Method in class jregex.util.io.PathPattern
     

    U

    UNICODE - Static variable in interface jregex.REFlags
    Affects whether the predefined classes("\d","\s","\w",etc) in the expression are interpreted as belonging to Unicode.

    V

    value(MatchResult) - Method in class jregex.PerlSubstitution
     

    W

    WildcardFilter - class jregex.util.io.WildcardFilter.
     
    WildcardFilter(String) - Constructor for class jregex.util.io.WildcardFilter
     
    WildcardFilter(String, boolean) - Constructor for class jregex.util.io.WildcardFilter
     
    WildcardPattern - class jregex.WildcardPattern.
    A Pattern subclass that accepts a simplified pattern syntax: ?
    WildcardPattern() - Constructor for class jregex.WildcardPattern
     
    WildcardPattern(String) - Constructor for class jregex.WildcardPattern
     
    WildcardPattern(String, boolean) - Constructor for class jregex.WildcardPattern
     
    WildcardPattern(String, int) - Constructor for class jregex.WildcardPattern
     
    WildcardPattern(String, String, int) - Constructor for class jregex.WildcardPattern
     
    WORD_CHAR - Static variable in class jregex.WildcardPattern
     
    wrap(StringBuffer) - Static method in class jregex.Replacer
     
    wrap(Writer) - Static method in class jregex.Replacer
     

    X

    XML_SCHEMA - Static variable in interface jregex.REFlags
    Turns on the compatibility with XML Schema regular expressions.


    jregex/docs/api/help-doc.html : API Help

    How This API Document Is Organized

    This API (Application Programming Interface) document has pages corresponding to the items in the navigation bar, described as follows.

    Overview

    The Overview page is the front page of this API document and provides a list of all packages with a summary for each. This page can also contain an overall description of the set of packages.

    Package

    Each package has a page that contains a list of its classes and interfaces, with a summary for each. This page can contain four categories:

    • Interfaces (italic)
    • Classes
    • Exceptions
    • Errors

    Class/Interface

    Each class, interface, inner class and inner interface has its own separate page. Each of these pages has three sections consisting of a class/interface description, summary tables, and detailed member descriptions:

    • Class inheritance diagram
    • Direct Subclasses
    • All Known Subinterfaces
    • All Known Implementing Classes
    • Class/interface declaration
    • Class/interface description

    • Inner Class Summary
    • Field Summary
    • Constructor Summary
    • Method Summary

    • Field Detail
    • Constructor Detail
    • Method Detail
    Each summary entry contains the first sentence from the detailed description for that item. The summary entries are alphabetical, while the detailed descriptions are in the order they appear in the source code. This preserves the logical groupings established by the programmer.

    Tree (Class Hierarchy)

    There is a Class Hierarchy page for all packages, plus a hierarchy for each package. Each hierarchy page contains a list of classes and a list of interfaces. The classes are organized by inheritance structure starting with java.lang.Object. The interfaces do not inherit from java.lang.Object.
    • When viewing the Overview page, clicking on "Tree" displays the hierarchy for all packages.
    • When viewing a particular package, class or interface page, clicking "Tree" displays the hierarchy for only that package.

    Deprecated API

    The Deprecated API page lists all of the API that have been deprecated. A deprecated API is not recommended for use, generally due to improvements, and a replacement API is usually given. Deprecated APIs may be removed in future implementations.

    Index

    The Index contains an alphabetic list of all classes, interfaces, constructors, methods, and fields.

    Prev/Next

    These links take you to the next or previous class, interface, package, or related page.

    Frames/No Frames

    These links show and hide the HTML frames. All pages are available with or without frames.

    Serialized Form

    Each serializable or externalizable class has a description of its serialization fields and methods. This information is of interest to re-implementors, not to developers using the API. While there is no link in the navigation bar, you can get to this information by going to any serialized class and clicking "Serialized Form" in the "See also" section of the class description.

    This help file applies to API documentation generated using the standard doclet.



    jregex/docs/api/overview-frame.html : Overview
    All Classes

    Packages
    jregex
    jregex.util.io

jregex/docs/api/stylesheet.css
/* Javadoc style sheet */

/* Define colors, fonts and other style attributes here to override the defaults */

/* Page background color */
body { background-color: #FFFFFF }

/* Table colors */
.TableHeadingColor { background: #CCCCFF } /* Dark mauve */
.TableSubHeadingColor { background: #EEEEFF } /* Light mauve */
.TableRowColor { background: #FFFFFF } /* White */

/* Font used in left-hand frame lists */
.FrameTitleFont { font-size: normal; font-family: normal }
.FrameHeadingFont { font-size: normal; font-family: normal }
.FrameItemFont { font-size: normal; font-family: normal }

/* Example of smaller, sans-serif font in frames */
/* .FrameItemFont { font-size: 10pt; font-family: Helvetica, Arial, sans-serif } */

/* Navigation bar fonts and colors */
.NavBarCell1 { background-color:#EEEEFF;}/* Light mauve */
.NavBarCell1Rev { background-color:#00008B;}/* Dark Blue */
.NavBarFont1 { font-family: Arial, Helvetica, sans-serif; color:#000000;}
.NavBarFont1Rev { font-family: Arial, Helvetica, sans-serif; color:#FFFFFF;}
.NavBarCell2 { font-family: Arial, Helvetica, sans-serif; background-color:#FFFFFF;}
.NavBarCell3 { font-family: Arial, Helvetica, sans-serif; background-color:#FFFFFF;}

jregex/docs/license.txt
Copyright (c) 2001, Sergey A. Samokhodkin
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of jregex nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@version

jregex/jregex/
jregex/jregex/Optimizer.java
/**
 * Copyright (c) 2001, Sergey A. Samokhodkin
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met:
 *
 * - Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer.
 * - Redistributions in binary form
 * must reproduce the above copyright notice, this list of conditions and the following
 * disclaimer in the documentation and/or other materials provided with the distribution.
 * - Neither the name of jregex nor the names of its contributors may be used
 * to endorse or promote products derived from this software without specific prior
 * written permission.
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.util.*; public class Optimizer{ public static final int THRESHOLD=20; static Optimizer find(Term entry){ return find(entry,0); } private static Optimizer find(Term term,int dist){ //System.out.println("term="+term+", dist="+dist); if(term==null) return null; Term next=term.next; int type=term.type; switch(type){ case Term.CHAR: case Term.REG: case Term.REG_I: return new Optimizer(term,dist); case Term.BITSET: case Term.BITSET2: if(term.weight<=THRESHOLD) return new Optimizer(term,dist); else return find(term.next,dist+1); case Term.ANY_CHAR: case Term.ANY_CHAR_NE: return find(next,dist+1); case Term.REPEAT_MIN_INF: case Term.REPEAT_MIN_MAX: if(term.minCount>0){ return find(term.target,dist); } else return null; } if(type>=Term.FIRST_TRANSPARENT && type<=Term.LAST_TRANSPARENT){ return find(next,dist); } return null; } private Term atom; private int distance; private Optimizer(Term atom,int distance){ this.atom=atom; this.distance=distance; } Term makeFirst(Term theFirst){ return new Find(atom,distance,theFirst); } Term makeBacktrack(Term back){ int min=back.minCount; switch(back.type){ case Term.BACKTRACK_0: min=0; case Term.BACKTRACK_MIN: return new 
FindBack(atom,distance,min,back); case Term.BACKTRACK_REG_MIN: return back; default: throw new Error("unexpected iterator's backtracker:"+ back); //return back; } } } class Find extends Term{ Find(Term target, int distance, Term theFirst){ switch(target.type){ case Term.CHAR: case Term.BITSET: case Term.BITSET2: type=Term.FIND; break; case Term.REG: case Term.REG_I: type=Term.FINDREG; break; default: throw new IllegalArgumentException("wrong target type: "+target.type); } this.target=target; this.distance=distance; if(target==theFirst){ next=target.next; eat=true; //eat the next } else{ next=theFirst; eat=false; } } } class FindBack extends Term{ FindBack(Term target, int distance, int minCount, Term backtrack){ this.minCount=minCount; switch(target.type){ case Term.CHAR: case Term.BITSET: case Term.BITSET2: type=Term.BACKTRACK_FIND_MIN; break; case Term.REG: case Term.REG_I: type=Term.BACKTRACK_FINDREG_MIN; break; default: throw new IllegalArgumentException("wrong target type: "+target.type); } this.target=target; this.distance=distance; Term next=backtrack.next; if(target==next){ this.next=next.next; this.eat=true; } else{ this.next=next; this.eat=false; } } }jregex/jregex/PerlSubstitution.java0000644000175000017500000002076607503220206017733 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. 
* - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.io.*; import java.util.Hashtable; import java.util.Vector; /** * An implementation of the Substitution interface. Performs substitutions in accordance with Perl-like substitution scripts.
    * The latter is a string, containing a mix of memory register references and plain text blocks.
    * It may look like "some_chars $1 some_chars$2some_chars" or "123${1}45${2}67".
    * A tag consisting of '$', not preceded by the escape character '\' and followed by some digits (possibly enclosed in curly braces), is interpreted as a memory register reference, the digits forming a register ID. * Everything else is treated as plain text.
    * Once the Replacer has found a text block that matches the pattern, the references in the replacement string are replaced by the contents of * the corresponding memory registers, and the resulting text replaces the matched block.
    * For example, the following code: *

     * System.out.println("\""+
     *    new Replacer(new Pattern("\\b(\\d+)\\b"),new PerlSubstitution("'$1'")).replace("abc 123 def")
     *    +"\"");
     * 
    * will print "abc '123' def".
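    * As a rough cross-check of the numbered reference syntax described above, the sketch below performs the same replacement with the standard java.util.regex API rather than jregex (a substitution made only so the snippet runs without jregex on the classpath). The names GroupReferenceSketch and quoteNumbers are hypothetical.

```java
final class GroupReferenceSketch {
    // Wraps every standalone run of digits in single quotes.
    // "$1" in the replacement refers to the first capturing group,
    // which mirrors the "$1" register reference described above.
    static String quoteNumbers(String text) {
        return text.replaceAll("\\b(\\d+)\\b", "'$1'");
    }

    public static void main(String[] args) {
        System.out.println(quoteNumbers("abc 123 def")); // prints: abc '123' def
    }
}
```

    * Only the plain $1 form is exercised here; other reference forms should not be assumed to behave identically in the two libraries.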
    * @see Substitution * @see Replacer * @see Pattern */ public class PerlSubstitution implements Substitution{ //private static Pattern refPtn,argsPtn; private static Pattern refPtn; private static int NAME_ID; private static int ESC_ID; //private static int FN_NAME_ID; //private static int FN_ARGS_ID; //private static int ARG_NAME_ID; private static final String groupRef="\\$(?:\\{({=name}\\w+)\\}|({=name}\\d+|&))|\\\\({esc}.)"; //private static final String fnRef="\\&({fn_name}\\w+)\\(({fn_args}"+groupRef+"(?:,"+groupRef+")*)*\\)"; static{ try{ //refPtn=new Pattern("(?=match.pattern().groupCount()) return; if(match.isCaptured(i))match.getGroup(i,dest); } } private static class StringRefHandler extends Element{ private String index; StringRefHandler(String s,String ind){ prefix=s; index=ind; } void append(MatchResult match,TextBuffer dest){ if(prefix!=null) dest.append(prefix); if(index==null) return; Integer id=match.pattern().groupId(index); //if(id==null) return; //??? int i=id.intValue(); if(match.isCaptured(i))match.getGroup(i,dest); } } } abstract class GReference{ public abstract String stringValue(MatchResult match); public static GReference createInstance(MatchResult match,int grp){ if(match.length(grp)==0) throw new IllegalArgumentException("arg name cannot be an empty string"); if(Character.isDigit(match.charAt(0,grp))){ try{ return new IntReference(Integer.parseInt(match.group(grp))); } catch(NumberFormatException e){ throw new IllegalArgumentException("illegal arg name, starts with digit but is not a number"); } } return new StringReference((match.group(grp))); } } class IntReference extends GReference{ protected int id; IntReference(int id){ this.id=id; } public String stringValue(MatchResult match){ return match.group(id); } } class StringReference extends GReference{ protected String name; StringReference(String name){ this.name=name; } public String stringValue(MatchResult match){ return match.group(name); } 
}jregex/jregex/Replacer.java0000644000175000017500000002311607503220206016121 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.io.*; import java.util.Hashtable; /** * The Replacer class suggests some methods to replace occurences of a pattern * either by a result of evaluation of a perl-like expression, or by a plain string, * or according to a custom substitution model, provided as a Substitution interface implementation.
    * A Replacer instance may be obtained either using Pattern.replacer(...) method, or by constructor:
     * Pattern p=new Pattern("\\w+");
     * Replacer perlExpressionReplacer=p.replacer("[$&]");
     * //or another way to do the same
     * Substitution myOwnModel=new Substitution(){
     *    public void appendSubstitution(MatchResult match,TextBuffer tb){
     *       tb.append('[');
     *       match.getGroup(MatchResult.MATCH,tb);
     *       tb.append(']');
     *    }
     * };
     * Replacer myVeryOwnReplacer=new Replacer(p,myOwnModel);
     * 
    * The second method is much more verbose, but gives more freedom. * To perform a replacement, call replace(someInput):
     * System.out.print(perlExpressionReplacer.replace("All your base "));
     * System.out.println(myVeryOwnReplacer.replace("are belong to us"));
     * //result: "[All] [your] [base] [are] [belong] [to] [us]"
     * 
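    * The custom substitution model above can be approximated with the standard library as well. The sketch below is not jregex code: it uses java.util.regex (an assumption made so that it runs stand-alone), and the names BracketWordsSketch and bracketWords are hypothetical. It copies unmatched text through and computes a replacement for every match, which is the role a Substitution plays for a Replacer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketWordsSketch {
    // Surrounds every word with square brackets, building the
    // replacement programmatically for each match.
    static String bracketWords(String text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuffer sb = new StringBuffer(text.length());
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the computed text literal
            String replacement = Matcher.quoteReplacement("[" + m.group() + "]");
            m.appendReplacement(sb, replacement); // copies the gap, then the replacement
        }
        m.appendTail(sb); // copies whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(bracketWords("All your base "));
        System.out.println(bracketWords("are belong to us"));
    }
}
```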
    * @see Substitution * @see PerlSubstitution * @see Replacer#Replacer(jregex.Pattern,jregex.Substitution) */ public class Replacer{ private Pattern pattern; private Substitution substitution; /** */ public Replacer(Pattern pattern,Substitution substitution){ this.pattern=pattern; this.substitution=substitution; } /** */ public Replacer(Pattern pattern, String substitution){ this(pattern,substitution,true); } /** */ public Replacer(Pattern pattern, String substitution, boolean isPerlExpr){ this.pattern=pattern; this.substitution= isPerlExpr? (Substitution)new PerlSubstitution(substitution): new DummySubstitution(substitution); } public void setSubstitution(String s, boolean isPerlExpr){ substitution= isPerlExpr? (Substitution)new PerlSubstitution(s): new DummySubstitution(s); } /** */ public String replace(String text){ TextBuffer tb=wrap(new StringBuffer(text.length())); replace(pattern.matcher(text),substitution,tb); return tb.toString(); } /** */ public String replace(char[] chars,int off,int len){ TextBuffer tb=wrap(new StringBuffer(len)); replace(pattern.matcher(chars,off,len),substitution,tb); return tb.toString(); } /** */ public String replace(MatchResult res,int group){ TextBuffer tb=wrap(new StringBuffer()); replace(pattern.matcher(res,group),substitution,tb); return tb.toString(); } /** */ public String replace(Reader text,int length)throws IOException{ TextBuffer tb=wrap(new StringBuffer(length>=0? 
length: 0)); replace(pattern.matcher(text,length),substitution,tb); return tb.toString(); } /** */ public int replace(String text,StringBuffer sb){ return replace(pattern.matcher(text),substitution,wrap(sb)); } /** */ public int replace(char[] chars,int off,int len,StringBuffer sb){ return replace(chars,off,len,wrap(sb)); } /** */ public int replace(MatchResult res,int group,StringBuffer sb){ return replace(res,group,wrap(sb)); } /** */ public int replace(MatchResult res,String groupName,StringBuffer sb){ return replace(res,groupName,wrap(sb)); } public int replace(Reader text,int length,StringBuffer sb)throws IOException{ return replace(text,length,wrap(sb)); } /** */ public int replace(String text,TextBuffer dest){ return replace(pattern.matcher(text),substitution,dest); } /** */ public int replace(char[] chars,int off,int len,TextBuffer dest){ return replace(pattern.matcher(chars,off,len),substitution,dest); } /** */ public int replace(MatchResult res,int group,TextBuffer dest){ return replace(pattern.matcher(res,group),substitution,dest); } /** */ public int replace(MatchResult res,String groupName,TextBuffer dest){ return replace(pattern.matcher(res,groupName),substitution,dest); } public int replace(Reader text,int length,TextBuffer dest)throws IOException{ return replace(pattern.matcher(text,length),substitution,dest); } /** * Replaces all occurences of a matcher's pattern in a matcher's target * by a given substitution appending the result to a buffer.
    * The substitution starts from current matcher's position, current match * not included. */ public static int replace(Matcher m,Substitution substitution,TextBuffer dest){ boolean firstPass=true; int c=0; while(m.find()){ if(m.end()==0 && !firstPass) continue; //allow to replace at "^" if(m.start()>0) m.getGroup(MatchResult.PREFIX,dest); substitution.appendSubstitution(m,dest); c++; m.setTarget(m,MatchResult.SUFFIX); firstPass=false; } m.getGroup(MatchResult.TARGET,dest); return c; } public static int replace(Matcher m,Substitution substitution,Writer out) throws IOException{ try{ return replace(m,substitution,wrap(out)); } catch(WriteException e){ throw e.reason; } } /** */ public void replace(String text,Writer out) throws IOException{ replace(pattern.matcher(text),substitution,out); } /** */ public void replace(char[] chars,int off,int len,Writer out) throws IOException{ replace(pattern.matcher(chars,off,len),substitution,out); } /** */ public void replace(MatchResult res,int group,Writer out) throws IOException{ replace(pattern.matcher(res,group),substitution,out); } /** */ public void replace(MatchResult res,String groupName,Writer out) throws IOException{ replace(pattern.matcher(res,groupName),substitution,out); } public void replace(Reader in,int length,Writer out)throws IOException{ replace(pattern.matcher(in,length),substitution,out); } private static class DummySubstitution implements Substitution{ String str; DummySubstitution(String s){ str=s; } public void appendSubstitution(MatchResult match,TextBuffer res){ if(str!=null) res.append(str); } } public static TextBuffer wrap(final StringBuffer sb){ return new TextBuffer(){ public void append(char c){ sb.append(c); } public void append(char[] chars,int start,int len){ sb.append(chars,start,len); } public void append(String s){ sb.append(s); } public String toString(){ return sb.toString(); } }; } public static TextBuffer wrap(final Writer writer){ return new TextBuffer(){ public void append(char c){ 
try{ writer.write(c); } catch(IOException e){ throw new WriteException(e); } } public void append(char[] chars,int off,int len){ try{ writer.write(chars,off,len); } catch(IOException e){ throw new WriteException(e); } } public void append(String s){ try{ writer.write(s); } catch(IOException e){ throw new WriteException(e); } } }; } private static class WriteException extends RuntimeException{ IOException reason; WriteException(IOException io){ reason=io; } } }jregex/jregex/Matcher.java0000644000175000017500000022623307503220206015754 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.util.*; import java.io.*; /** * Matcher instance is an automaton that actually performs matching. It provides the following methods: *
  • searching for matching substrings : matcher.find() or matcher.findAll(); *
  • testing whether a text matches a whole pattern : matcher.matches(); *
  • testing whether the text matches the beginning of a pattern : matcher.matchesPrefix(); *
  • searching with custom options : matcher.find(int options) *

    * Obtaining results
    * After the search has succeeded, i.e. if one of the above methods returned true, * one may obtain information on the match: *

  • may check whether some group is captured : matcher.isCaptured(int); *
  • may obtain start and end positions of the match and its length : matcher.start(int),matcher.end(int),matcher.length(int); *
  • may obtain match contents as String : matcher.group(int).
    * The match prefix and suffix information can be obtained the same way. * The appropriate methods are grouped in the MatchResult interface, which the Matcher class implements.
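    * The search-then-inspect workflow listed above can be sketched as follows. This is an illustration with java.util.regex rather than jregex (an assumption made so the snippet is runnable on its own); the class MatchInfoSketch and the method describe are hypothetical, and start(2) != -1 is used here as a stand-in for the isCaptured(int) check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchInfoSketch {
    // Describes every match as "start-end:text unit=<did group 2 take part>".
    static List<String> describe(String text) {
        Matcher m = Pattern.compile("(\\d+)(px)?").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {                            // search for the next match
            boolean unitCaptured = m.start(2) != -1;  // -1 means the group did not match
            out.add(m.start() + "-" + m.end() + ":" + m.group() + " unit=" + unitCaptured);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("10px 20")); // [0-4:10px unit=true, 5-7:20 unit=false]
    }
}
```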
    * Matcher objects are not thread-safe, so only one thread may use a matcher instance at a time. * Note, that Pattern objects are thread-safe(the same instanse may be shared between * multiple threads), and the typical tactics in multithreaded applications is to have one Pattern instance per expression(a singleton), * and one Matcher object per thread. */ public class Matcher implements MatchResult{ /* Matching options*/ /** * The same effect as "^" without REFlags.MULTILINE. * @see Matcher#find(int) */ public static final int ANCHOR_START=1; /** * The same effect as "\\G". * @see Matcher#find(int) */ public static final int ANCHOR_LASTMATCH=2; /** * The same effect as "$" without REFlags.MULTILINE. * @see Matcher#find(int) */ public static final int ANCHOR_END=4; /** * Experimental option; if a text ends up before the end of a pattern,report a match. * @see Matcher#find(int) */ public static final int ACCEPT_INCOMPLETE=8; //see search(ANCHOR_START|...) private static Term startAnchor=new Term(Term.START); //see search(ANCHOR_LASTMATCH|...) private static Term lastMatchAnchor=new Term(Term.LAST_MATCH_END); private Pattern re; private int[] counters; private MemReg[] memregs; private LAEntry[] lookaheads; private int counterCount; private int memregCount; private int lookaheadCount; private char[] data; private int offset,end,wOffset,wEnd; private boolean shared; private SearchEntry top; //stack entry private SearchEntry first; //object pool entry private SearchEntry defaultEntry; //called when moving the window private boolean called; private int minQueueLength; private String cache; //cache may be longer than the actual data //and contrariwise; so cacheOffset may have both signs. //cacheOffset is actually -(data offset). 
private int cacheOffset,cacheLength; private MemReg prefixBounds,suffixBounds,targetBounds; Matcher(Pattern regex){ this.re=regex; //int memregCount=(memregs=new MemReg[regex.memregs]).length; //for(int i=0;i0){ MemReg[] memregs=new MemReg[memregCount]; for(int i=0;i0) counters=new int[counterCount]; if((lookaheadCount=regex.lookaheads)>0){ LAEntry[] lookaheads=new LAEntry[lookaheadCount]; for(int i=0;i * Matcher m=new Pattern("\\w+").matcher(myString); * if(m.find())m.setTarget(m,m.SUFFIX); //forget all that is not a suffix *
  • * Resets current search position to zero. * @param m - a matcher that is a source of data * @param groupId - which group to take data from * @see Matcher#setTarget(java.lang.String) * @see Matcher#setTarget(java.lang.String,int,int) * @see Matcher#setTarget(char[],int,int) * @see Matcher#setTarget(java.io.Reader,int) */ public final void setTarget(Matcher m, int groupId){ MemReg mr=m.bounds(groupId); //System.out.println("setTarget("+m+","+groupId+")"); //System.out.println(" in="+mr.in); //System.out.println(" out="+mr.out); if(mr==null) throw new IllegalArgumentException("group #"+groupId+" is not assigned"); data=m.data; offset=mr.in; end=mr.out; cache=m.cache; cacheLength=m.cacheLength; cacheOffset=m.cacheOffset; if(m!=this){ shared=true; m.shared=true; } init(); } /** * Supplies a text to search in/match with. * Resets current search position to zero. * @param text - a data * @see Matcher#setTarget(jregex.Matcher,int) * @see Matcher#setTarget(java.lang.String,int,int) * @see Matcher#setTarget(char[],int,int) * @see Matcher#setTarget(java.io.Reader,int) */ public void setTarget(String text){ setTarget(text,0,text.length()); } /** * Supplies a text to search in/match with, as a part of String. * Resets current search position to zero. * @param text - a data source * @param start - where the target starts * @param len - how long is the target * @see Matcher#setTarget(jregex.Matcher,int) * @see Matcher#setTarget(java.lang.String) * @see Matcher#setTarget(char[],int,int) * @see Matcher#setTarget(java.io.Reader,int) */ public void setTarget(String text,int start,int len){ char[] mychars=data; if(mychars==null || shared || mychars.lengthshared=false:
       *   myMatcher.setTarget(myCharArray,x,y,false); //we declare that the array contents are NEITHER shared NOR used later, so modifications to them are permitted
       * 
    * then we should expect the array contents to be changed on subsequent setTarget(..) operations. * Such method may yield some increase in perfomanse in the case of multiple setTarget() calls. * Resets current search position to zero. * @param text - a data source * @param start - where the target starts * @param len - how long is the target * @param shared - if true: data are shared or used later, don't modify it; if false: possible modifications of the text on subsequent setTarget() calls are perceived and allowed. * @see Matcher#setTarget(jregex.Matcher,int) * @see Matcher#setTarget(java.lang.String) * @see Matcher#setTarget(java.lang.String,int,int) * @see Matcher#setTarget(char[],int,int) * @see Matcher#setTarget(java.io.Reader,int) */ public final void setTarget(char[] text,int start,int len,boolean shared){ cache=null; data=text; offset=start; end=start+len; this.shared=shared; init(); } /** * Supplies a text to search in/match with through a stream. * Resets current search position to zero. * @param in - a data stream; * @param len - how much characters should be read; if len is -1, read the entire stream. 
* @see Matcher#setTarget(jregex.Matcher,int) * @see Matcher#setTarget(java.lang.String) * @see Matcher#setTarget(java.lang.String,int,int) * @see Matcher#setTarget(char[],int,int) */ public void setTarget(Reader in,int len)throws IOException{ if(len<0){ setAll(in); return; } char[] mychars=data; boolean shared=this.shared; if(mychars==null || shared || mychars.length=0){ len-=c; count+=c; if(len==0) break; } setTarget(mychars,0,count,shared); } private void setAll(Reader in)throws IOException{ char[] mychars=data; int free; boolean shared=this.shared; if(mychars==null || shared){ mychars=new char[free=1024]; shared=false; } else free=mychars.length; int count=0; int c; while((c=in.read(mychars,count,free))>=0){ free-=c; count+=c; if(free==0){ int newsize=count*3; char[] newchars=new char[newsize]; System.arraycopy(mychars,0,newchars,0,count); mychars=newchars; free=newsize-count; shared=false; } } setTarget(mychars,0,count,shared); } private final String getString(int start,int end){ String src=cache; if(src!=null){ int co=cacheOffset; return src.substring(start-co,end-co); } int tOffset,tEnd,tLen=(tEnd=this.end)-(tOffset=this.offset); char[] data=this.data; if((end-start)>=(tLen/3)){ //it makes sence to make a cache cache=src=new String(data,tOffset,tLen); cacheOffset=tOffset; cacheLength=tLen; return src.substring(start-tOffset,end-tOffset); } return new String(data,start,end-start); } /* Matching */ /** * Tells whether the entire target matches the beginning of the pattern. * The whole pattern is also regarded as its beginning.
    * This feature allows a mismatch to be detected by examining only the beginning of * the target (if the beginning of the target doesn't match the beginning of the pattern, then the entire target * cannot match either).
    * For example the following assertions yield true:
       *   Pattern p=new Pattern("abcd"); 
       *   p.matcher("").matchesPrefix();
       *   p.matcher("a").matchesPrefix();
       *   p.matcher("ab").matchesPrefix();
       *   p.matcher("abc").matchesPrefix();
       *   p.matcher("abcd").matchesPrefix();
       * 
    * and the following yield false:
       *   p.matcher("b").matchesPrefix();
       *   p.matcher("abcdef").matchesPrefix();
       *   p.matcher("x").matchesPrefix();
       * 
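    * A comparable "could this input still grow into a full match" test can be sketched with the standard library. The code below does not use jregex: it relies on java.util.regex, where Matcher.hitEnd() reports that the last match attempt ran into the end of the input. The names PrefixMatchSketch and matchesPrefix are hypothetical, and the equivalence with the behaviour described above is an assumption that is only checked here for the literal pattern "abcd".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixMatchSketch {
    // True if the whole target matches the pattern, or if the attempt
    // failed only because the target ended before the pattern did.
    static boolean matchesPrefix(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        System.out.println(matchesPrefix("abcd", "ab"));     // true
        System.out.println(matchesPrefix("abcd", "abcdef")); // false
    }
}
```

    * Such a check is typically useful for validating input while it is still being typed.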
    * @return true if the entire target matches the beginning of the pattern */ public final boolean matchesPrefix(){ setPosition(0); return search(ANCHOR_START|ACCEPT_INCOMPLETE|ANCHOR_END); } /** * Just an old name for matchesPrefix().
    * Retained for backwards compatibility. * @deprecated Replaced by matchesPrefix() */ public final boolean isStart(){ return matchesPrefix(); } /** * Tells whether a current target matches the whole pattern. * For example, the following yield true:
       *   Pattern p=new Pattern("\\w+"); 
       *   p.matcher("a").matches();
       *   p.matcher("ab").matches();
       *   p.matcher("abc").matches();
       * 
    * and the following yield false:
       *   p.matcher("abc def").matches();
       *   p.matcher("bcd ").matches();
       *   p.matcher(" bcd").matches();
       *   p.matcher("#xyz#").matches();
       * 
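    * The difference between matching the whole target and merely finding a matching substring can be illustrated as below. The sketch uses java.util.regex instead of jregex (an assumption made so it runs stand-alone); WholeMatchSketch, matchesWhole and containsMatch are hypothetical names.

```java
import java.util.regex.Pattern;

final class WholeMatchSketch {
    // Whole-target match: the pattern must consume the target from start to end.
    static boolean matchesWhole(String regex, String target) {
        return Pattern.compile(regex).matcher(target).matches();
    }

    // Substring search: any matching piece of the target is enough.
    static boolean containsMatch(String regex, String target) {
        return Pattern.compile(regex).matcher(target).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\w+", "abc"));      // true
        System.out.println(matchesWhole("\\w+", "abc def"));  // false
        System.out.println(containsMatch("\\w+", "abc def")); // true
    }
}
```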
    * @return whether a current target matches the whole pattern. */ public final boolean matches(){ if(called) setPosition(0); return search(ANCHOR_START|ANCHOR_END); } /** * Just a combination of setTarget(String) and matches(). * @param s the target string; * @return whether the specified string matches the whole pattern. */ public final boolean matches(String s){ setTarget(s); return search(ANCHOR_START|ANCHOR_END); } /** * Allows to set a position the subsequent find()/find(int) will start from. * @param pos the position to start from; * @see Matcher#find() * @see Matcher#find(int) */ public void setPosition(int pos){ wOffset=offset+pos; wEnd=-1; called=false; flush(); } /** * Searches through a target for a matching substring, starting from just after the end of last match. * If there wasn't any search performed, starts from zero. * @return true if a match found. */ public final boolean find(){ if(called) skip(); return search(0); } /** * Searches through a target for a matching substring, starting from just after the end of last match. * If there wasn't any search performed, starts from zero. * @param anchors a zero or a combination(bitwise OR) of ANCHOR_START,ANCHOR_END,ANCHOR_LASTMATCH,ACCEPT_INCOMPLETE * @return true if a match found. */ public final boolean find(int anchors){ if(called) skip(); return search(anchors); } /** * The same as findAll(int), but with default behaviour; */ public MatchIterator findAll(){ return findAll(0); } /** * Returns an iterator over the matches found by subsequently calling find(options), the search starts from the zero position. 
*/ public MatchIterator findAll(final int options){ //setPosition(0); return new MatchIterator(){ private boolean checked=false; private boolean hasMore=false; public boolean hasMore(){ if(!checked) check(); return hasMore; } public MatchResult nextMatch(){ if(!checked) check(); if(!hasMore) throw new NoSuchElementException(); checked=false; return Matcher.this; } private final void check(){ hasMore=find(options); checked=true; } public int count(){ if(!checked) check(); if(!hasMore) return 0; int c=1; while(find(options))c++; checked=false; return c; } }; } /** * Continues to search from where the last search left off. * The same as proceed(0). * @see Matcher#proceed(int) */ public final boolean proceed(){ return proceed(0); } /** * Continues to search from where the last search left off using specified options:
       * Matcher m=new Pattern("\\w+").matcher("abc");
       * while(m.proceed(0)){
       *    System.out.println(m.group(0));
       * }
       * 
    * Output:
       * abc
       * ab
       * a
       * bc
       * b
       * c
       * 
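    * The listing above can be reproduced by brute force with the standard library. The sketch below is not how jregex implements proceed(); it simply tries every start position and, for each, every end position from longest to shortest, using java.util.regex (an assumption made so the snippet runs without jregex). The names AllVariantsSketch and allVariants are hypothetical, and the ordering is only claimed to agree for simple greedy patterns such as \w+ and \d+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllVariantsSketch {
    // Collects every substring that matches the pattern as a whole,
    // ordered by start position and then from longest to shortest.
    static List<String> allVariants(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        for (int start = 0; start < text.length(); start++) {
            for (int end = text.length(); end > start; end--) {
                m.region(start, end);   // restrict matching to [start, end)
                if (m.matches()) {      // the whole region must match
                    out.add(m.group());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allVariants("\\w+", "abc")); // [abc, ab, a, bc, b, c]
    }
}
```

    * The cost is quadratic in the text length in the number of match attempts alone, so this is suitable for illustration only.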
    * For example, let's find all odd numbers occurring in a text:
       *    Matcher m=new Pattern("\\d+").matcher("123");
       *    while(m.proceed(0)){
       *       String match=m.group(0);
       *       if(isOdd(Integer.parseInt(match))) System.out.println(match);
       *    }
       *    
       *    static boolean isOdd(int i){
       *       return (i&1)>0;
       *    }
       * 
    * This outputs:
       * 123
       * 1
       * 23
       * 3
       * 
    * Note that using find() method we would find '123' only. * @param options search options, some of ANCHOR_START|ANCHOR_END|ANCHOR_LASTMATCH|ACCEPT_INCOMPLETE; zero value(default) stands for usual search for substring. */ public final boolean proceed(int options){ //System.out.println("next() : top="+top); if(called){ if(top==null){ wOffset++; } } return search(0); } /** * Sets the current search position just after the end of last match. */ public final void skip(){ int we=wEnd; if(wOffset==we){ //requires special handling //if no variants at 'wOutside',advance pointer and clear if(top==null){ wOffset++; flush(); } //otherwise, if there exist a variant, //don't clear(), i.e. allow it to match return; } else{ if(we<0) wOffset=0; else wOffset=we; } //rflush(); //rflush() works faster on simple regexes (with a small group/branch number) flush(); } private final void init(){ //wOffset=-1; //System.out.println("init(): offset="+offset+", end="+end); wOffset=offset; wEnd=-1; called=false; flush(); } /** * Resets the internal state. 
*/ private final void flush(){ top=null; defaultEntry.reset(0); /* int c=0; SearchEntry se=first; while(se!=null){ c++; se=se.on; } System.out.println("queue: allocated="+c+", truncating to "+minQueueLength); new Exception().printStackTrace(); */ first.reset(minQueueLength); //first.reset(0); for(int i=memregs.length-1;i>0;i--){ MemReg mr=memregs[i]; mr.in=mr.out=-1; } for(int i=memregs.length-1;i>0;i--){ MemReg mr=memregs[i]; mr.in=mr.out=-1; } called=false; } //reverse flush //may work significantly faster, //need testing private final void rflush(){ SearchEntry entry=top; top=null; MemReg[] memregs=this.memregs; int[] counters=this.counters; while(entry!=null){ SearchEntry next=entry.sub; SearchEntry.popState(entry,memregs,counters); entry=next; } SearchEntry.popState(defaultEntry,memregs,counters); } /** */ public String toString(){ return getString(wOffset,wEnd); } public Pattern pattern(){ return re; } public String target(){ return getString(offset,end); } /** */ public char[] targetChars(){ shared=true; return data; } /** */ public int targetStart(){ return offset; } /** */ public int targetEnd(){ return end; } public char charAt(int i){ int in=this.wOffset; int out=this.wEnd; if(in<0 || out(mr.out-in)) throw new StringIndexOutOfBoundsException(""+i); return data[in+i]; } public final int length(){ return wEnd-wOffset; } /** */ public final int start(){ return wOffset-offset; } /** */ public final int end(){ return wEnd-offset; } /** */ public String prefix(){ return getString(offset,wOffset); } /** */ public String suffix(){ return getString(wEnd,end); } /** */ public int groupCount(){ return memregs.length; } /** */ public String group(int n){ MemReg mr=bounds(n); if(mr==null) return null; return getString(mr.in,mr.out); } /** */ public String group(String name){ Integer id=re.groupId(name); if(id==null) throw new IllegalArgumentException("<"+name+"> isn't defined"); return group(id.intValue()); } /** */ public boolean getGroup(int n,TextBuffer tb){ 
MemReg mr=bounds(n); if(mr==null) return false; int in; tb.append(data,in=mr.in,mr.out-in); return true; } /** */ public boolean getGroup(String name,TextBuffer tb){ Integer id=re.groupId(name); if(id==null) throw new IllegalArgumentException("unknown group: \""+name+"\""); return getGroup(id.intValue(),tb); } /** */ public boolean getGroup(int n,StringBuffer sb){ MemReg mr=bounds(n); if(mr==null) return false; int in; sb.append(data,in=mr.in,mr.out-in); return true; } /** */ public boolean getGroup(String name,StringBuffer sb){ Integer id=re.groupId(name); if(id==null) throw new IllegalArgumentException("unknown group: \""+name+"\""); return getGroup(id.intValue(),sb); } /** */ public String[] groups(){ MemReg[] memregs=this.memregs; String[] groups=new String[memregs.length]; int in,out; MemReg mr; for(int i=0;i=0){ mr=memregs[id]; } else switch(id){ case PREFIX: mr=prefixBounds; if(mr==null) prefixBounds=mr=new MemReg(PREFIX); mr.in=offset; mr.out=wOffset; break; case SUFFIX: mr=suffixBounds; if(mr==null) suffixBounds=mr=new MemReg(SUFFIX); mr.in=wEnd; mr.out=end; break; case TARGET: mr=targetBounds; if(mr==null) targetBounds=mr=new MemReg(TARGET); mr.in=offset; mr.out=end; break; default: throw new IllegalArgumentException("illegal group id: "+id+"; must either nonnegative int, or MatchResult.PREFIX, or MatchResult.SUFFIX"); } //System.out.println(" mr=["+mr.in+","+mr.out+"]"); int in; if((in=mr.in)<0 || mr.out=0 && wEnd>=wOffset; } /** */ public final boolean isCaptured(int id){ return bounds(id)!=null; } /** */ public final boolean isCaptured(String groupName){ Integer id=re.groupId(groupName); if(id==null) throw new IllegalArgumentException("unknown group: \""+groupName+"\""); return isCaptured(id.intValue()); } /** */ public final int length(int id){ MemReg mr=bounds(id); return mr.out-mr.in; } /** */ public final int start(int id){ return bounds(id).in-offset; } /** */ public final int end(int id){ return bounds(id).out-offset; } private final boolean 
search(int anchors){ called=true; final int end=this.end; int offset=this.offset; char[] data=this.data; int wOffset=this.wOffset; int wEnd=this.wEnd; MemReg[] memregs=this.memregs; int[] counters=this.counters; LAEntry[] lookaheads=this.lookaheads; //int memregCount=memregs.length; //int cntCount=counters.length; int memregCount=this.memregCount; int cntCount=this.counterCount; SearchEntry defaultEntry=this.defaultEntry; SearchEntry first=this.first; SearchEntry top=this.top; SearchEntry actual=null; int cnt,regLen; int i; final boolean matchEnd=(anchors&ANCHOR_END)>0; final boolean allowIncomplete=(anchors&ACCEPT_INCOMPLETE)>0; Pattern re=this.re; Term root=re.root; Term term; if(top==null){ if((anchors&ANCHOR_START)>0){ term=re.root0; //raw root root=startAnchor; } else if((anchors&ANCHOR_LASTMATCH)>0){ term=re.root0; //raw root root=lastMatchAnchor; } else{ term=root; //optimized root } i=wOffset; actual=first; SearchEntry.popState(defaultEntry,memregs,counters); } else{ top=(actual=top).sub; term=actual.term; i=actual.index; SearchEntry.popState(actual,memregs,counters); } cnt=actual.cnt; regLen=actual.regLen; main: while(wOffset<=end){ matchHere: for(;;){ /* System.out.print("char: "+i+", term: "); System.out.print(term.toString()); System.out.print(" // mrs:{"); for(int dbi=0;dbiend) break; } term=term.next; continue matchHere; } case Term.VOID: term=term.next; continue matchHere; case Term.CHAR: //can only be 1-char-wide // \/ if(i>=end || data[i]!=term.c) break; //System.out.println("CHAR: "+data[i]+", i="+i); i++; term=term.next; continue matchHere; case Term.ANY_CHAR: //can only be 1-char-wide // \/ if(i>=end) break; i++; term=term.next; continue matchHere; case Term.ANY_CHAR_NE: //can only be 1-char-wide // \/ if(i>=end || (c=data[i])=='\r' || c=='\n') break; i++; term=term.next; continue matchHere; case Term.END: if(i>=end){ //meets term=term.next; continue matchHere; } break; case Term.END_EOL: //perl's $ if(i>=end){ //meets term=term.next; continue 
matchHere; } else{ boolean matches= i>=end | ((i+1)==end && data[i]=='\n') | ((i+2)==end && data[i]=='\r' && data[i+1]=='\n'); if(matches){ term=term.next; continue matchHere; } else break; } case Term.LINE_END: if(i>=end){ //meets term=term.next; continue matchHere; } else{ /* if(((c=data[i])=='\r' || c=='\n') && (c=data[i-1])!='\r' && c!='\n'){ term=term.next; continue matchHere; } */ //5 aug 2001 if((c=data[i])=='\r' || c=='\n'){ term=term.next; continue matchHere; } } break; case Term.START: //Perl's "^" if(i==offset){ //meets term=term.next; continue matchHere; } //break; //changed on 27-04-2002 //due to a side effect: if ALLOW_INCOMPLETE is enabled, //the anchorStart moves up to the end and succeeds //(see comments at the last lines of matchHere, ~line 1830) //Solution: if there are some entries on the stack ("^a|b$"), //try them; otherwise it's a final 'no' //if(top!=null) break; //else break main; //changed on 25-05-2002 //rationale: if the term is startAnchor, //it's the root term by definition, //so if it doesn't match, the entire pattern //couldn't match too; //otherwise we could have the following problem: //"c|^a" against "abc" finds only "a" if(top!=null) break; if(term!=startAnchor) break; else break main; case Term.LAST_MATCH_END: if(i==wEnd){ //meets term=term.next; continue matchHere; } break main; //return false case Term.LINE_START: if(i==offset){ //meets term=term.next; continue matchHere; } else if(i=end) break; c=data[i]; if(!(c<=255 && term.bitset[c])^term.inverse) break; i++; term=term.next; continue matchHere; } case Term.BITSET2:{ //can only be 1-char-wide // \/ if(i>=end) break; c=data[i]; boolean[] arr=term.bitset2[c>>8]; if(arr==null || !arr[c&255]^term.inverse) break; i++; term=term.next; continue matchHere; } case Term.BOUNDARY:{ boolean ch1Meets=false,ch2Meets=false; boolean[] bitset=term.bitset; test1:{ int j=i-1; //if(j=end) break test1; if(j=end) break test2; if(i>=end) break test2; c= data[i]; ch2Meets= (c<256 && bitset[c]); } 
if(ch1Meets^ch2Meets^term.inverse){ //meets term=term.next; continue matchHere; } else break; } case Term.UBOUNDARY:{ boolean ch1Meets=false,ch2Meets=false; boolean[][] bitset2=term.bitset2; test1:{ int j=i-1; //if(j=end) break test1; if(j>8]; ch1Meets= bits!=null && bits[c&0xff]; } test2:{ //if(i=end) break test2; if(i>=end) break test2; c= data[i]; boolean[] bits=bitset2[c>>8]; ch2Meets= bits!=null && bits[c&0xff]; } if(ch1Meets^ch2Meets^term.inverse){ //is boundary ^ inv term=term.next; continue matchHere; } else break; } case Term.DIRECTION:{ boolean ch1Meets=false,ch2Meets=false; boolean[] bitset=term.bitset; boolean inv=term.inverse; //System.out.println("i="+i+", inv="+inv+", bitset="+CharacterClass.stringValue0(bitset)); int j=i-1; //if(j>=offset && j=offset){ c= data[j]; ch1Meets= c<256 && bitset[c]; //System.out.println(" ch1Meets="+ch1Meets); } if(ch1Meets^inv) break; //if(i>=offset && i=offset && j=offset){ c= data[j]; boolean[] bits=bitset2[c>>8]; ch1Meets= bits!=null && bits[c&0xff]; } if(ch1Meets^inv) break; //if(i>=offset && i>8]; ch2Meets= bits!=null && bits[c&0xff]; } if(!ch2Meets^inv) break; term=term.next; continue matchHere; } case Term.REG:{ MemReg mr=memregs[term.memreg]; int sampleOffset=mr.in; int sampleOutside=mr.out; int rLen; if(sampleOffset<0 || (rLen=sampleOutside-sampleOffset)<0){ break; } else if(rLen==0){ term=term.next; continue matchHere; } // don't prevent us from reaching the 'end' if((i+rLen)>end) break; if(compareRegions(data,sampleOffset,i,rLen,end)){ i+=rLen; term=term.next; continue matchHere; } break; } case Term.REG_I:{ MemReg mr=memregs[term.memreg]; int sampleOffset=mr.in; int sampleOutside=mr.out; int rLen; if(sampleOffset<0 || (rLen=sampleOutside-sampleOffset)<0){ break; } else if(rLen==0){ term=term.next; continue matchHere; } // don't prevent us from reaching the 'end' if((i+rLen)>end) break; if(compareRegionsI(data,sampleOffset,i,rLen,end)){ i+=rLen; term=term.next; continue matchHere; } break; } case 
Term.REPEAT_0_INF:{ //System.out.println("REPEAT, i="+i+", term.minCount="+term.minCount+", term.maxCount="+term.maxCount); //i+=(cnt=repeat(data,i,end,term.target)); if((cnt=repeat(data,i,end,term.target))<=0){ term=term.next; continue; } i+=cnt; //branch out the backtracker (that is term.failNext, see Term.make*()) actual.cnt=cnt; actual.term=term.failNext; actual.index=i; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } case Term.REPEAT_MIN_INF:{ //System.out.println("REPEAT, i="+i+", term.minCount="+term.minCount+", term.maxCount="+term.maxCount); cnt=repeat(data,i,end,term.target); if(cnt0 && compareRegions(data,i,sampleOffset,bitset,end)){ cnt++; i+=bitset; countBack--; } if(cnt0){ cnt--; i--; actual.cnt=cnt; actual.index=i; actual.term=term; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else break; case Term.BACKTRACK_MIN: //System.out.println("<<"); cnt=actual.cnt; if(cnt>term.minCount){ cnt--; i--; actual.cnt=cnt; actual.index=i; actual.term=term; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else break; case Term.BACKTRACK_FIND_MIN:{ //System.out.print("<<<[cnt="); cnt=actual.cnt; //System.out.print(cnt+", minCnt="); //System.out.print(term.minCount+", target="); //System.out.print(term.target+"]"); int minCnt; if(cnt>(minCnt=term.minCount)){ int start=i+term.distance; if(start>end){ int exceed=start-end; cnt-=exceed; if(cnt<=minCnt) break; i-=exceed; start=end; } int back=findBack(data,i+term.distance,cnt-minCnt,term.target); //System.out.print("[back="+back+"]"); if(back<0) break; //cnt-=back; //i-=back; if((cnt-=back)<=minCnt){ i-=back; if(term.eat)i++; term=term.next; continue; } i-=back; actual.cnt=cnt; actual.index=i; if(term.eat)i++; actual.term=term; actual=(top=actual).on; if(actual==null){ actual=new 
SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else break; } case Term.BACKTRACK_FINDREG_MIN:{ //System.out.print("<<<[cnt="); cnt=actual.cnt; //System.out.print(cnt+", minCnt="); //System.out.print(term.minCount+", target="); //System.out.print(term.target); //System.out.print("reg=<"+memregs[term.target.memreg].in+","+memregs[term.target.memreg].out+">]"); int minCnt; if(cnt>(minCnt=term.minCount)){ int start=i+term.distance; if(start>end){ int exceed=start-end; cnt-=exceed; if(cnt<=minCnt) break; i-=exceed; start=end; } MemReg mr=memregs[term.target.memreg]; int sampleOff=mr.in; int sampleLen=mr.out-sampleOff; //if(sampleOff<0 || sampleLen<0) throw new Error("backreference used before definition: \\"+term.memreg); //int back=findBackReg(data,i+term.distance,sampleOff,sampleLen,cnt-minCnt,term.target,end); //if(back<0) break; /*@since 1.2*/ int back; if(sampleOff<0 || sampleLen<0){ //the group is not def., as in the case of '(\w+)\1' //treat as usual BACKTRACK_MIN cnt--; i--; actual.cnt=cnt; actual.index=i; actual.term=term; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else if(sampleLen==0){ back=-1; } else{ back=findBackReg(data,i+term.distance,sampleOff,sampleLen,cnt-minCnt,term.target,end); //System.out.print("[back="+back+"]"); if(back<0) break; } cnt-=back; i-=back; actual.cnt=cnt; actual.index=i; if(term.eat)i+=sampleLen; actual.term=term; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else break; } case Term.BACKTRACK_REG_MIN: //System.out.println("<<"); cnt=actual.cnt; if(cnt>term.minCount){ regLen=actual.regLen; cnt--; i-=regLen; actual.cnt=cnt; actual.index=i; actual.term=term; //actual.regLen=regLen; actual=(top=actual).on; if(actual==null){ actual=new SearchEntry(); top.on=actual; actual.sub=top; } term=term.next; continue; } else break; case Term.GROUP_IN:{ 
memreg=term.memreg; //memreg=0 is a regex itself; we don't need to handle it //because regex bounds already are in wOffset and wEnd if(memreg>0){ //MemReg mr=memregs[memreg]; //saveMemregState((top!=null)? top: defaultEntry,memreg,mr); //mr.in=i; memregs[memreg].tmp=i; //assume } term=term.next; continue; } case Term.GROUP_OUT: memreg=term.memreg; //see above if(memreg>0){ //if(term.saveState)saveMemregState((top!=null)? top: defaultEntry,memreg,memregs); MemReg mr=memregs[memreg]; SearchEntry.saveMemregState((top!=null)? top: defaultEntry,memreg,mr); mr.in=mr.tmp; //commit mr.out=i; } term=term.next; continue; case Term.PLOOKBEHIND_IN:{ int tmp=i-term.distance; if(tmp0;c--,p1--,p2--){ if(arr[p1]!=arr[p2]){ //System.out.println(" : no"); return false; } } //System.out.println(" : yes"); return true; } private static final boolean compareRegionsI(char[] arr, int off1, int off2, int len,int out){ int p1=off1+len-1; int p2=off2+len-1; if(p1>=out || p2>=out){ return false; } char c1,c2; for(int c=len;c>0;c--,p1--,p2--){ if((c1=arr[p1])!=Character.toLowerCase(c2=arr[p2]) && c1!=Character.toUpperCase(c2) && c1!=Character.toTitleCase(c2)) return false; } return true; } //repeat while matches private static final int repeat(char[] data,int off,int out,Term term){ //System.out.print("off="+off+", out="+out+", term="+term); switch(term.type){ case Term.CHAR:{ char c=term.c; int i=off; while(i>8]; if(arr!=null && arr[c&0xff]) break; else i++; } else while(i>8]; if(arr!=null && arr[c&0xff]) i++; else break; } return i-off; } } throw new Error("this kind of term can't be quantified:"+term.type); } //repeat while doesn't match private static final int find(char[] data,int off,int out,Term term){ //System.out.print("off="+off+", out="+out+", term="+term); if(off>=out) return -1; switch(term.type){ case Term.CHAR:{ char c=term.c; int i=off; while(i>8]; if(arr!=null && arr[c&0xff]) break; else i++; } else while(i>8]; if(arr!=null && arr[c&0xff]) i++; else break; } return i-off; } } 
throw new IllegalArgumentException("can't seek this kind of term:"+term.type); } private static final int findReg(char[] data,int off,int regOff,int regLen,Term term,int out){ //System.out.print("off="+off+", out="+out+", term="+term); if(off>=out) return -1; int i=off; if(term.type==Term.REG){ while(i255 || !arr[c]) break; if(i<=iMin) return -1; } return off-i; } case Term.BITSET2:{ boolean[][] bitset2=term.bitset2; int i=off; char c; int iMin=off-maxCount; if(!term.inverse) for(;;){ boolean[] arr=bitset2[(c=data[--i])>>8]; if(arr!=null && arr[c&0xff]) break; if(i<=iMin) return -1; } else for(;;){ boolean[] arr=bitset2[(c=data[--i])>>8]; if(arr==null || arr[c&0xff]) break; if(i<=iMin) return -1; } return off-i; } } throw new IllegalArgumentException("can't find this kind of term:"+term.type); } private static final int findBackReg(char[] data,int off,int regOff,int regLen,int maxCount,Term term,int out){ //assume that the cases when regLen==0 or maxCount==0 are handled by caller int i=off; int iMin=off-maxCount; if(term.type==Term.REG){ /*@since 1.2*/ char first=data[regOff]; regOff++; regLen--; for(;;){ i--; if(data[i]==first && compareRegions(data,i+1,regOff,regLen,out)) break; if(i<=iMin) return -1; } } else if(term.type==Term.REG_I){ /*@since 1.2*/ char c=data[regOff]; char firstLower=Character.toLowerCase(c); char firstUpper=Character.toUpperCase(c); char firstTitle=Character.toTitleCase(c); regOff++; regLen--; for(;;){ i--; if(((c=data[i])==firstLower || c==firstUpper || c==firstTitle) && compareRegionsI(data,i+1,regOff,regLen,out)) break; if(i<=iMin) return -1; } return off-i; } else throw new IllegalArgumentException("wrong findBackReg() target type :"+term.type); return off-i; } public String toString_d(){ StringBuffer s=new StringBuffer(); s.append("counters: "); s.append(counters==null? 
0: counters.length); s.append("\r\nmemregs: "); s.append(memregs.length); for(int i=0;i0) on.reset(restQueue-1); else{ this.on=null; on.sub=null; } } //sub=on=null; } } class MemReg{ int index; int in=-1,out=-1; int tmp=-1; //for assuming at GROUP_IN MemReg(int index){ this.index=index; } void reset(){ in=out=-1; } } class LAEntry{ int index; SearchEntry top,actual; }jregex/jregex/REFlags.java0000644000175000017500000000714407503220206015652 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
 * * @version 1.2_01 */ package jregex; public interface REFlags{ /** * All the following options are turned off. */ public int DEFAULT=0; /** * Pattern "a" matches both "a" and "A". * Corresponds to "i" in Perl notation. */ public int IGNORE_CASE=1<<0; /** * Affects the behaviour of "^" and "$" tags. When switched off: *
  • the "^" matches the beginning of the whole text; *
  • the "$" matches the end of the whole text, or just before the '\n' or "\r\n" at the end of text. * When switched on: *
  • the "^" additionally matches the line beginnings (that is just after the '\n'); *
  • the "$" additionally matches the line ends (that is just before "\r\n" or '\n'); * Corresponds to "m" in Perl notation. */ public int MULTILINE=1<<1; /** * Affects the behaviour of dot(".") tag. When switched off: *
  • the dot matches any character but EOLs('\r','\n'); * When switched on: *
  • the dot matches any character, including EOLs. * This flag is sometimes referred to in regex tutorials as SINGLELINE, which confusingly sounds like the opposite of MULTILINE but is in fact orthogonal to it. * Corresponds to "s" in Perl notation. */ public int DOTALL=1<<2; /** * Affects how the space characters are interpreted in the expression. When switched off: *
  • the spaces are interpreted literally; * When switched on: *
  • the spaces are ignored, allowing an expression to be slightly more readable. * Corresponds to "x" in Perl notation. */ public int IGNORE_SPACES=1<<3; /** * Affects whether the predefined classes("\d","\s","\w",etc) in the expression are interpreted as belonging to Unicode. When switched off: *
  • the predefined classes are interpreted as ASCII; * When switched on: *
  • the predefined classes are interpreted as Unicode categories; */ public int UNICODE=1<<4; /** * Turns on the compatibility with XML Schema regular expressions. */ public int XML_SCHEMA=1<<5; }jregex/jregex/WildcardPattern.java0000644000175000017500000001172207503220206017453 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; /** * A Pattern subclass that accepts a simplified pattern syntax: *
  • ? - matches any single character; *
  • * - matches any number of any characters; *
  • all the rest - matches itself. * Each wildcard takes a capturing group within a pattern. * * @see Pattern */ public class WildcardPattern extends Pattern{ //a wildcard class, see WildcardPattern(String,String,int) public static final String WORD_CHAR="\\w"; //a wildcard class, see WildcardPattern(String,String,int) public static final String ANY_CHAR="."; private static final String defaultSpecials="[]().{}+|^$\\"; private static final String defaultWcClass=ANY_CHAR; protected static String convertSpecials(String s,String wcClass,String specials){ int len=s.length(); StringBuffer sb=new StringBuffer(); for(int i=0;i<len;i++){ char c=s.charAt(i); switch(c){ case '*': sb.append("("+wcClass+"*)"); break; case '?': sb.append("("+wcClass+")"); break; default: if(specials.indexOf(c)>=0) sb.append('\\'); sb.append(c); } } return sb.toString(); } private String str; /** * @param wc The pattern */ public WildcardPattern(String wc){ this(wc,true); } /** * @param wc The pattern * @param icase If true, the pattern is case-insensitive. */ public WildcardPattern(String wc,boolean icase){ this(wc,icase? DEFAULT|IGNORE_CASE: DEFAULT); } /** * @param wc The pattern * @param flags The bitwise OR of any of REFlags.* . The only meaningful * flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows * the wildcards to match the EOL characters). */ public WildcardPattern(String wc,int flags){ compile(wc,defaultWcClass,defaultSpecials,flags); } /** * @param wc The pattern * @param wcClass The wildcard class, could be any of WORD_CHAR or ANY_CHAR * @param flags The bitwise OR of any of REFlags.* . The only meaningful * flags are REFlags.IGNORE_CASE and REFlags.DOTALL (the latter allows * the wildcards to match the EOL characters). 
*/ public WildcardPattern(String wc,String wcClass,int flags){ compile(wc,wcClass,defaultSpecials,flags); } protected WildcardPattern(){} protected void compile(String wc,String wcClass,String specials,int flags){ String converted=convertSpecials(wc,wcClass,specials); try{ compile(converted,flags); } catch(PatternSyntaxException e){ //something unexpected throw new Error(e.getMessage()+"; original expr: "+wc+", converted: "+converted); } str=wc; } public String toString(){ return str; } /* public static void main(String[] args){ Pattern p=new WildcardPattern("*.???"); Matcher m=p.matcher("abc.def"); //System.out.println(p.toString_d()); while(m.proceed()){ System.out.println(m); System.out.println("groups: "+m.groupv()); } } */ }jregex/jregex/CharacterClass.java0000644000175000017500000007511507503220206017254 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.util.*; class CharacterClass extends Term implements UnicodeConstants{ static final Bitset DIGIT=new Bitset(); static final Bitset WORDCHAR=new Bitset(); static final Bitset SPACE=new Bitset(); static final Bitset UDIGIT=new Bitset(); static final Bitset UWORDCHAR=new Bitset(); static final Bitset USPACE=new Bitset(); static final Bitset NONDIGIT=new Bitset(); static final Bitset NONWORDCHAR=new Bitset(); static final Bitset NONSPACE=new Bitset(); static final Bitset UNONDIGIT=new Bitset(); static final Bitset UNONWORDCHAR=new Bitset(); static final Bitset UNONSPACE=new Bitset(); private static boolean namesInitialized=false; static final Hashtable namedClasses=new Hashtable(); static final Vector unicodeBlocks=new Vector(); static final Vector posixClasses=new Vector(); static final Vector unicodeCategories=new Vector(); //modes; used in parseGroup(() private final static int ADD=1; private final static int SUBTRACT=2; private final static int INTERSECT=3; private static final String blockData= "0000..007F:InBasicLatin;0080..00FF:InLatin-1Supplement;0100..017F:InLatinExtended-A;" +"0180..024F:InLatinExtended-B;0250..02AF:InIPAExtensions;02B0..02FF:InSpacingModifierLetters;" +"0300..036F:InCombiningDiacriticalMarks;0370..03FF:InGreek;0400..04FF:InCyrillic;0530..058F:InArmenian;" +"0590..05FF:InHebrew;0600..06FF:InArabic;0700..074F:InSyriac;0780..07BF:InThaana;0900..097F:InDevanagari;" 
+"0980..09FF:InBengali;0A00..0A7F:InGurmukhi;0A80..0AFF:InGujarati;0B00..0B7F:InOriya;0B80..0BFF:InTamil;" +"0C00..0C7F:InTelugu;0C80..0CFF:InKannada;0D00..0D7F:InMalayalam;0D80..0DFF:InSinhala;0E00..0E7F:InThai;" +"0E80..0EFF:InLao;0F00..0FFF:InTibetan;1000..109F:InMyanmar;10A0..10FF:InGeorgian;1100..11FF:InHangulJamo;" +"1200..137F:InEthiopic;13A0..13FF:InCherokee;1400..167F:InUnifiedCanadianAboriginalSyllabics;" +"1680..169F:InOgham;16A0..16FF:InRunic;1780..17FF:InKhmer;1800..18AF:InMongolian;" +"1E00..1EFF:InLatinExtendedAdditional;1F00..1FFF:InGreekExtended;2000..206F:InGeneralPunctuation;" +"2070..209F:InSuperscriptsAndSubscripts;20A0..20CF:InCurrencySymbols;" +"20D0..20FF:InCombiningMarksForSymbols;2100..214F:InLetterLikeSymbols;2150..218F:InNumberForms;" +"2190..21FF:InArrows;2200..22FF:InMathematicalOperators;2300..23FF:InMiscellaneousTechnical;" +"2400..243F:InControlPictures;2440..245F:InOpticalCharacterRecognition;" +"2460..24FF:InEnclosedAlphanumerics;2500..257F:InBoxDrawing;2580..259F:InBlockElements;" +"25A0..25FF:InGeometricShapes;2600..26FF:InMiscellaneousSymbols;2700..27BF:InDingbats;" +"2800..28FF:InBraillePatterns;2E80..2EFF:InCJKRadicalsSupplement;2F00..2FDF:InKangxiRadicals;" +"2FF0..2FFF:InIdeographicDescriptionCharacters;3000..303F:InCJKSymbolsAndPunctuation;" +"3040..309F:InHiragana;30A0..30FF:InKatakana;3100..312F:InBopomofo;3130..318F:InHangulCompatibilityJamo;" +"3190..319F:InKanbun;31A0..31BF:InBopomofoExtended;3200..32FF:InEnclosedCJKLettersAndMonths;" +"3300..33FF:InCJKCompatibility;3400..4DB5:InCJKUnifiedIdeographsExtensionA;" +"4E00..9FFF:InCJKUnifiedIdeographs;A000..A48F:InYiSyllables;A490..A4CF:InYiRadicals;" +"AC00..D7A3:InHangulSyllables;D800..DB7F:InHighSurrogates;DB80..DBFF:InHighPrivateUseSurrogates;" +"DC00..DFFF:InLowSurrogates;E000..F8FF:InPrivateUse;F900..FAFF:InCJKCompatibilityIdeographs;" +"FB00..FB4F:InAlphabeticPresentationForms;FB50..FDFF:InArabicPresentationForms-A;" 
+"FE20..FE2F:InCombiningHalfMarks;FE30..FE4F:InCJKCompatibilityForms;FE50..FE6F:InSmallFormVariants;" +"FE70..FEFE:InArabicPresentationForms-B;FEFF..FEFF:InSpecials;FF00..FFEF:InHalfWidthAndFullWidthForms;" +"FFF0..FFFD:InSpecials"; static{ //* DIGIT.setDigit(false); WORDCHAR.setWordChar(false); SPACE.setSpace(false); UDIGIT.setDigit(true); UWORDCHAR.setWordChar(true); USPACE.setSpace(true); NONDIGIT.setDigit(false); NONDIGIT.setPositive(false); NONWORDCHAR.setWordChar(false); NONWORDCHAR.setPositive(false); NONSPACE.setSpace(false); NONSPACE.setPositive(false); UNONDIGIT.setDigit(true); UNONDIGIT.setPositive(false); UNONWORDCHAR.setWordChar(true); UNONWORDCHAR.setPositive(false); UNONSPACE.setSpace(true); UNONSPACE.setPositive(false); initPosixClasses(); } private static void registerClass(String name,Bitset cls,Vector realm){ namedClasses.put(name,cls); if(!realm.contains(name))realm.addElement(name); } private static void initPosixClasses(){ Bitset lower=new Bitset(); lower.setRange('a','z'); registerClass("Lower",lower,posixClasses); Bitset upper=new Bitset(); upper.setRange('A','Z'); registerClass("Upper",upper,posixClasses); Bitset ascii=new Bitset(); ascii.setRange((char)0,(char)0x7f); registerClass("ASCII",ascii,posixClasses); Bitset alpha=new Bitset(); alpha.add(lower); alpha.add(upper); registerClass("Alpha",alpha,posixClasses); Bitset digit=new Bitset(); digit.setRange('0','9'); registerClass("Digit",digit,posixClasses); Bitset alnum=new Bitset(); alnum.add(alpha); alnum.add(digit); registerClass("Alnum",alnum,posixClasses); Bitset punct=new Bitset(); punct.setChars("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"); registerClass("Punct",punct,posixClasses); Bitset graph=new Bitset(); graph.add(alnum); graph.add(punct); registerClass("Graph",graph,posixClasses); registerClass("Print",graph,posixClasses); Bitset blank=new Bitset(); blank.setChars(" \t"); registerClass("Blank",blank,posixClasses); Bitset cntrl=new Bitset(); cntrl.setRange((char)0,(char)0x1f); 
cntrl.setChar((char)0x7f); registerClass("Cntrl",cntrl,posixClasses); Bitset xdigit=new Bitset(); xdigit.setRange('0','9'); xdigit.setRange('a','f'); xdigit.setRange('A','F'); registerClass("XDigit",xdigit,posixClasses); Bitset space=new Bitset(); space.setChars(" \t\n\r\f\u000b"); registerClass("Space",space,posixClasses); } private static void initNames(){ initNamedCategory("C",new int[]{Cn,Cc,Cf,Co,Cs}); initNamedCategory("Cn",Cn); initNamedCategory("Cc",Cc); initNamedCategory("Cf",Cf); initNamedCategory("Co",Co); initNamedCategory("Cs",Cs); initNamedCategory("L",new int[]{Lu,Ll,Lt,Lm,Lo}); initNamedCategory("Lu",Lu); initNamedCategory("Ll",Ll); initNamedCategory("Lt",Lt); initNamedCategory("Lm",Lm); initNamedCategory("Lo",Lo); initNamedCategory("M",new int[]{Mn,Me,Mc}); initNamedCategory("Mn",Mn); initNamedCategory("Me",Me); initNamedCategory("Mc",Mc); initNamedCategory("N",new int[]{Nd,Nl,No}); initNamedCategory("Nd",Nd); initNamedCategory("Nl",Nl); initNamedCategory("No",No); initNamedCategory("Z",new int[]{Zs,Zl,Zp}); initNamedCategory("Zs",Zs); initNamedCategory("Zl",Zl); initNamedCategory("Zp",Zp); initNamedCategory("P",new int[]{Pd,Ps,Pi,Pe,Pf,Pc,Po}); initNamedCategory("Pd",Pd); initNamedCategory("Ps",Ps); initNamedCategory("Pi",Pi); initNamedCategory("Pe",Pe); initNamedCategory("Pf",Pf); initNamedCategory("Pc",Pc); initNamedCategory("Po",Po); initNamedCategory("S",new int[]{Sm,Sc,Sk,So}); initNamedCategory("Sm",Sm); initNamedCategory("Sc",Sc); initNamedCategory("Sk",Sk); initNamedCategory("So",So); Bitset bs=new Bitset(); bs.setCategory(Cn); registerClass("UNASSIGNED",bs,unicodeCategories); bs=new Bitset(); bs.setCategory(Cn); bs.setPositive(false); registerClass("ASSIGNED",bs,unicodeCategories); StringTokenizer st=new StringTokenizer(blockData,".,:;"); while(st.hasMoreTokens()){ try{ int first=Integer.parseInt(st.nextToken(),16); int last=Integer.parseInt(st.nextToken(),16); String name=st.nextToken(); initNamedBlock(name,first,last); } catch(Exception 
e){ e.printStackTrace(); } } initNamedBlock("ALL",0,0xffff); namesInitialized=true; //*/ } private static void initNamedBlock(String name,int first,int last){ if(firstCharacter.MAX_VALUE) throw new IllegalArgumentException("wrong start code ("+first+") in block "+name); if(lastCharacter.MAX_VALUE) throw new IllegalArgumentException("wrong end code ("+last+") in block "+name); if(last=0){ char c1=(char)prev; if(icase){ bs.setChar(Character.toLowerCase(c1)); bs.setChar(Character.toUpperCase(c1)); bs.setChar(Character.toTitleCase(c1)); } else bs.setChar(c1); } return i; case '-': if(isFirst) break; //if(isFirst) throw new PatternSyntaxException("[-...] is illegal"); if(inRange) break; //if(inRange) throw new PatternSyntaxException("[...--...] is illegal"); inRange=true; continue; case '[': if(inRange && xml){ //[..-[..]] if(prev>=0) bs.setChar((char)prev); if(bs1==null) bs1=new Bitset(); else bs1.reset(); i=parseClass(data,i,out,bs1,icase,skipspaces,unicode,xml); //System.out.println(" i="+i); bs.subtract(bs1); inRange=false; prev=-1; continue; } else break handle_special; case '^': //if(!isFirst) throw new PatternSyntaxException("'^' isn't a first char in a class def"); //bs.setPositive(false); //setFirst=true; //continue; if(isFirst){ bs.setPositive(false); setFirst=true; continue; } //treat as normal char break; case ' ': case '\r': case '\n': case '\t': case '\f': if(skipspaces) continue; else break handle_special; case '\\': Bitset negatigeClass=null; boolean inv=false; handle_escape: switch(c=data[i++]){ case 'r': c='\r'; break handle_special; case 'n': c='\n'; break handle_special; case 't': c='\t'; break handle_special; case 'f': c='\f'; break handle_special; case 'u': if(i>=out-4) throw new PatternSyntaxException("incomplete escape sequence \\uXXXX"); c=(char)((toHexDigit(c)<<12) +(toHexDigit(data[i++])<<8) +(toHexDigit(data[i++])<<4) +toHexDigit(data[i++])); break handle_special; case 'v': c=(char)((toHexDigit(c)<<24)+ (toHexDigit(data[i++])<<16)+ 
(toHexDigit(data[i++])<<12)+ (toHexDigit(data[i++])<<8)+ (toHexDigit(data[i++])<<4)+ toHexDigit(data[i++])); break handle_special; case 'b': c=8; // backspace break handle_special; case 'x':{ // hex 2-digit number int hex=0; char d; if((d=data[i++])=='{'){ while((d=data[i++])!='}'){ hex=(hex<<4)+toHexDigit(d); } if(hex>0xffff) throw new PatternSyntaxException("\\x{}"); } else{ hex=(toHexDigit(d)<<4)+toHexDigit(data[i++]); } c=(char)hex; break handle_special; } case 'o': // oct 2- or 3-digit number int oct=0; for(;;){ char d=data[i++]; if(d>='0' && d<='7'){ oct*=8; oct+=d-'0'; if(oct>0xffff) break; } else break; } c=(char)oct; break handle_special; case 'm': // decimal number -> char int dec=0; for(;;){ char d=data[i++]; if(d>='0' && d<='9'){ dec*=10; dec+=d-'0'; if(dec>0xffff) break; } else break; } c=(char)dec; break handle_special; case 'c': // ctrl-char c=(char)(data[i++]&0x1f); break handle_special; //classes; // case 'D': // non-digit negatigeClass=unicode? UNONDIGIT: NONDIGIT; break handle_escape; case 'S': // space negatigeClass=unicode? UNONSPACE: NONSPACE; break handle_escape; case 'W': // space negatigeClass=unicode? 
UNONWORDCHAR: NONWORDCHAR; break handle_escape; case 'd': // digit if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\d...]"); bs.setDigit(unicode); continue; case 's': // digit if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\s...]"); bs.setSpace(unicode); continue; case 'w': // digit if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\w...]"); bs.setWordChar(unicode); continue; case 'P': // \\P{..} inv=true; case 'p': // \\p{..} if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\w...]"); if(sb==null) sb=new StringBuffer(); else sb.setLength(0); i=parseName(data,i,out,sb,skipspaces); Bitset nc=getNamedClass(sb.toString()); if(nc==null) throw new PatternSyntaxException("unknown named class: {"+sb+"}"); bs.add(nc,inv); continue; default: //other escaped treat as normal break handle_special; } //negatigeClass; //\S,\D,\W if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\"+c+"...]"); bs.add(negatigeClass); continue; case '{': // if(inRange) throw new PatternSyntaxException("illegal range: [..."+prev+"-\\w...]"); if(sb==null) sb=new StringBuffer(); else sb.setLength(0); i=parseName(data,i-1,out,sb,skipspaces); Bitset nc=getNamedClass(sb.toString()); if(nc==null) throw new PatternSyntaxException("unknown named class: {"+sb+"}"); bs.add(nc,false); continue; default: } //c is a normal char //System.out.println(" normal c="+c+", inRange="+inRange+", prev="+(char)prev); if(prev<0){ prev=c; inRange=false; continue; } if(!inRange){ char c1=(char)prev; if(icase){ bs.setChar(Character.toLowerCase(c1)); bs.setChar(Character.toUpperCase(c1)); bs.setChar(Character.toTitleCase(c1)); } else bs.setChar(c1); prev=c; } else{ if(prev>c) throw new PatternSyntaxException("illegal range: "+prev+">"+c); char c0=(char)prev; inRange=false; prev=-1; if(icase){ bs.setRange(Character.toLowerCase(c0),Character.toLowerCase(c)); 
bs.setRange(Character.toUpperCase(c0),Character.toUpperCase(c)); bs.setRange(Character.toTitleCase(c0),Character.toTitleCase(c)); } else bs.setRange(c0,c); } } throw new PatternSyntaxException("unbalanced brackets in a class def"); } final static int parseName(char[] data,int i,int out,StringBuffer sb, boolean skipspaces) throws PatternSyntaxException{ char c; int start=-1; while(i=0xff) break loop; } int first=c; while(arr[c]){ //System.out.println(c+": "+arr[c]); c++; if(c>0xff) break; } int last=c-1; if(last==first) b.append(stringValue(last)); else{ b.append(stringValue(first)); b.append('-'); b.append(stringValue(last)); } if(c>0xff) break; } return b.toString(); } /* Mmm.. what is it? static String stringValueC(boolean[] categories){ StringBuffer sb=new StringBuffer(); for(int i=0;i>8]; if(marks!=null && marks[c&255]) break; c++; if(c>0xffff) break loop; } int first=c; for(;c<=0xffff;){ boolean[] marks=arr[c>>8]; if(marks==null || !marks[c&255]) break; c++; } int last=c-1; if(last==first) b.append(stringValue(last)); else{ b.append(stringValue(first)); b.append('-'); b.append(stringValue(last)); } if(c>0xffff) break; } return b.toString(); } static String stringValue(int c){ StringBuffer b=new StringBuffer(5); if(c<32){ switch(c){ case '\r': b.append("\\r"); break; case '\n': b.append("\\n"); break; case '\t': b.append("\\t"); break; case '\f': b.append("\\f"); break; default: b.append('('); b.append((int)c); b.append(')'); } } else if(c<256){ b.append((char)c); } else{ b.append('\\'); b.append('x'); b.append(Integer.toHexString(c)); } return b.toString(); } static int toHexDigit(char d) throws PatternSyntaxException{ int val=0; if(d>='0' && d<='9') val=d-'0'; else if(d>='a' && d<='f') val=10+d-'a'; else if(d>='A' && d<='F') val=10+d-'A'; else throw new PatternSyntaxException("hexadecimal digit expected: "+d); return val; } public static void main(String[] args){ if(!namesInitialized)initNames(); if(args.length==0){ System.out.println("Class usage: 
\\p{Class},\\P{Class}"); printRealm(posixClasses,"Posix classes"); printRealm(unicodeCategories,"Unicode categories"); printRealm(unicodeBlocks,"Unicode blocks"); } else{ for(int i=0;i>8)&0xff; if(data[cat][b]==0){ data[cat][b]=1; data[cat][BLOCK_SIZE+1]++; } } for(int i=0;i to search files by their paths using special patterns; *
  • to match path strings
    *
    * Syntax:
    *   • ? - any single character except a path separator
    *   • * - any string (possibly empty) that contains no path separators
    *   • ** - any path
    *
    * Usage:
     * PathPattern pp=new PathPattern("jregex/**"); //all files and directories
     *                                              //under the jregex directory
     * Enumeration files=pp.enumerateFiles();
     * Matcher m=pp.matcher();
     * while(files.hasMoreElements()){
     *    File f=(File)files.nextElement();
     *    m.setTarget(f.getPath());
     *    if(!m.matches()) System.out.println("Error in jregex.util.io.PathPattern");
     * }
     * 
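    *
    * The wildcard syntax listed above can be illustrated with a small standalone
    * sketch. It is not how PathPattern is implemented: it assumes java.util.regex
    * instead of jregex, treats both '/' and '\' as separators, and maps "**" to
    * "any characters", which ignores the special handling of neighbouring
    * separators. The names PathGlobSketch, toRegex and matches are hypothetical.

```java
import java.util.regex.Pattern;

public class PathGlobSketch {
    // Regex class for "one character that is not a path separator".
    // Assumption of this sketch: both '/' and '\' count as separators.
    private static final String NOT_SEP = "[^/\\\\]";

    // Translates a wildcard path into a java.util.regex pattern.
    static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (c == '*') {
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    sb.append(".*");                 // "**": any path, separators included
                    i += 2;
                } else {
                    sb.append(NOT_SEP).append('*');  // "*": any run of non-separators
                    i++;
                }
            } else if (c == '?') {
                sb.append(NOT_SEP);                  // "?": exactly one non-separator
                i++;
            } else if (c == '/' || c == '\\') {
                sb.append("[/\\\\]");                // accept either separator style
                i++;
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
                i++;
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole path matches the wildcard pattern.
    static boolean matches(String glob, String path) {
        return toRegex(glob).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("jregex/**", "jregex/util/io/PathPattern.java")); // true
        System.out.println(matches("*.java", "Pattern.java"));                       // true
        System.out.println(matches("*.java", "jregex/Pattern.java"));                // false
        System.out.println(matches("?.txt", "a.txt"));                               // true
    }
}
```

    * In this sketch a single "*" stops at a separator while "**" crosses it,
    * which is the distinction the syntax list draws between the two forms.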
    * @see jregex.WildcardPattern */ public class PathPattern extends Pattern{ private static final int RESERVED=1; private static int GRP_NO=RESERVED+1; private static final int ANY_G=GRP_NO++; private static final int FS_G=GRP_NO++; private static final int STAR_G=GRP_NO++; private static final int QMARK_G=GRP_NO++; private static final int SPCHAR_G=GRP_NO++; private static final int NONROOT_G=GRP_NO++; private static final String grp(int gno,String s){ return "({"+gno+"}"+s+")"; } private static final String fsChars="/\\"+File.separator; private static final String fsClass="["+fsChars+"]"; private static final String nfsClass="[^"+fsChars+"]"; private static final String fName=nfsClass+"+"; private static final Pattern fs=new Pattern(fsClass); private static final Pattern spCharPattern=new Pattern( grp(NONROOT_G,"^(?!"+fsClass+")")+ "|"+ grp(ANY_G,fsClass+"?\\*\\*"+fsClass+"?")+ "|"+ grp(FS_G,fsClass)+ "|"+ grp(STAR_G,"\\*")+ "|"+ grp(QMARK_G,"\\?")+ "|"+ grp(SPCHAR_G,"[.()\\{\\}+|^$\\[\\]\\\\]") ); private static final Replacer spCharProcessor=new Replacer( spCharPattern, new Substitution(){ public void appendSubstitution(MatchResult mr,TextBuffer dest){ //System.out.println("spCharProcessor.appendSubstitution(): "+((Matcher)mr).groupv()); if(mr.isCaptured(FS_G)){ dest.append(fsClass); } else if(mr.isCaptured(ANY_G)){ dest.append("(?:(?:"); dest.append(fsClass); dest.append("|^)((?:"); dest.append(fName); dest.append("(?:"); dest.append(fsClass); dest.append(fName); dest.append(")*)?))?"); dest.append("(?:"); dest.append(fsClass); dest.append("|$)"); } else if(mr.isCaptured(STAR_G)){ dest.append("("); dest.append(nfsClass); dest.append("*)"); } else if(mr.isCaptured(QMARK_G)){ dest.append("("); dest.append(nfsClass); dest.append(")"); } else if(mr.isCaptured(SPCHAR_G)){ dest.append("\\"); mr.getGroup(SPCHAR_G,dest); } else if(mr.isCaptured(NONROOT_G)){ dest.append("(?:\\."); dest.append(fsClass); dest.append(")?"); } } } ); private String str; private String 
root; private File rootf; private PathElementMask queue,last; public PathPattern(String ptn){ this(ptn,DEFAULT); } public PathPattern(String ptn,boolean icase){ this(ptn,icase? DEFAULT|IGNORE_CASE: DEFAULT); } public PathPattern(String path,int flags){ this(null,path,flags); } public PathPattern(File dir,String path,boolean icase){ this(null,path,icase? DEFAULT|IGNORE_CASE: DEFAULT); } public PathPattern(File dir,String path,int flags){ if(path==null || path.length()==0)throw new IllegalArgumentException("empty path not allowed"); str=path; RETokenizer tok=new RETokenizer(fs.matcher(path),true); String s=tok.nextToken(); if(s.equals("")){ if(dir!=null)rootf=dir; else root="/"; } else{ if(dir!=null)rootf=dir; else root="."; addElement(newMask(s,flags,tok.hasMore())); } while(tok.hasMore()){ s=tok.nextToken(); boolean hasMore=tok.hasMore(); if(s.equals("")){ if(hasMore)throw new IllegalArgumentException("\"//\" not allowed"); else break; } addElement(newMask(s,flags,hasMore)); } compile(spCharProcessor.replace(path),flags); //System.out.println(spCharProcessor.replace(path)); } private void addElement(PathElementMask mask){ if(queue==null){ queue=last=mask; } else{ last=(last.next=mask); } } public Enumeration enumerateFiles(){ PathElementEnumerator fe=queue.newEnumerator(); fe.setDir(rootf!=null? 
rootf: new File(root)); return fe; } public File[] files(){ Enumeration e=enumerateFiles(); Vector v=new Vector(); while(e.hasMoreElements()) v.addElement(e.nextElement()); File[] files=new File[v.size()]; v.copyInto(files); return files; } /** * @deprecated Is meaningless with regard to variable paths (since v.1.2) */ public String[] names(){ return null; } /** * @deprecated Is meaningless with regard to variable paths (since v.1.2) */ public File directory(){ return null; } private static PathElementMask newMask(String s,int flags,boolean dirsOnly){ if(s==null || s.length()==0)throw new IllegalArgumentException("Error: empty path element not allowed"); if(s.indexOf('*')<0 && s.indexOf('?')<0){ //if((flags&IGNORE_CASE)==0) return PathElementMask.fixedMask(s,dirsOnly); //just a dirty trick, //on windows this could be a disk name ("D:"), //and so won't be listed, so the RegularMask won't help if((flags&IGNORE_CASE)==0 || s.indexOf(':')>=0) return PathElementMask.fixedMask(s,dirsOnly); else return PathElementMask.regularMask(s,flags,dirsOnly); } else if(s.equals("*")) return PathElementMask.anyFile(dirsOnly); else if(s.equals("**")) return PathElementMask.anyPath(dirsOnly); else return PathElementMask.regularMask(s,flags,dirsOnly); } public String toString(){ return str; } //public static void main(String[] args)throws Exception{ // PathPattern path=new PathPattern(args.length>0? args[0]: "/**/*tmp*/**",true); // //PathPattern path=new PathPattern(args.length>0? args[0]: "*/*",true); // //PathPattern path=new PathPattern(args.length>0? 
args[0]: "/**/*abc*",true); // Enumeration e=path.enumerateFiles(); // int c=0; // int err=0; // Matcher m=path.matcher(); // long t0=System.currentTimeMillis(); // //while(e.hasMoreElements()){ // //while(e.hasMoreElements() && c<30){ // while(e.hasMoreElements() && err<10){ // File f=(File)e.nextElement(); // if(!m.matches(f.getPath())){ // System.out.println("error with file: "+f); // err++; // } // else{ // //System.out.println("file matches: "+m.groupv()); // } // c++; // } // long t1=System.currentTimeMillis(); // System.out.println("found "+err+" errors in "+c+" files, time="+(t1-t0)); //} }jregex/jregex/util/io/Enumerator.java0000644000175000017500000000407307503220206020072 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex.util.io; import java.util.Enumeration; import java.util.NoSuchElementException; abstract class Enumerator implements Enumeration{ protected Object currObj; protected abstract boolean find(); public boolean hasMoreElements(){ return currObj!=null || find(); } public Object nextElement(){ if(currObj==null && !find()) throw new NoSuchElementException(); Object tmp=currObj; currObj=null; return tmp; } }jregex/jregex/MatchIterator.java0000644000175000017500000000336107503220206017132 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; import java.io.*; public interface MatchIterator{ public boolean hasMore(); public MatchResult nextMatch(); public int count(); }jregex/jregex/Bitset.java0000644000175000017500000004514207503220206015621 0ustar andriusandrius/** * Copyright (c) 2001, Sergey A. Samokhodkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: * * - Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * - Redistributions in binary form * must reproduce the above copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided with the distribution. * - Neither the name of jregex nor the names of its contributors may be used * to endorse or promote products derived from this software without specific prior * written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * @version 1.2_01 */ package jregex; class Bitset implements UnicodeConstants{ private static final Block[][] categoryBits=new Block[CATEGORY_COUNT][BLOCK_COUNT]; static{ for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++){ int cat=Character.getType((char)i); int blockNo=(i>>8)&0xff; Block b=categoryBits[cat][blockNo]; if(b==null) categoryBits[cat][blockNo]=b=new Block(); //if(i>32 && i<127)System.out.println((char)i+" -> ["+cat+"]["+blockNo+"].("+i+")"); b.set(i&0xff); } } private boolean positive=true; private boolean isLarge=false; boolean[] block0; //1-byte bit set private static final boolean[] emptyBlock0=new boolean[BLOCK_SIZE]; Block[] blocks; //2-byte bit set private int weight; final void reset(){ positive=true; block0=null; blocks=null; isLarge=false; weight=0; } final static void unify(Bitset bs,Term term){ if(bs.isLarge){ term.type=Term.BITSET2; term.bitset2=Block.toBitset2(bs.blocks); } else{ term.type=Term.BITSET; term.bitset=bs.block0==null? emptyBlock0: bs.block0; } term.inverse=!bs.positive; term.weight=bs.positive? 
bs.weight: MAX_WEIGHT-bs.weight; } final void setPositive(boolean b){ positive=b; } final boolean isPositive(){ return positive; } final boolean isLarge(){ return isLarge; } private final void enableLargeMode(){ if(isLarge) return; Block[] blocks=new Block[BLOCK_COUNT]; this.blocks=blocks; if(block0!=null){ blocks[0]=new Block(block0); } isLarge=true; } final int getWeight(){ return positive? weight: MAX_WEIGHT-weight; } final void setWordChar(boolean unicode){ if(unicode){ setCategory(Lu); setCategory(Ll); setCategory(Lt); setCategory(Lo); setCategory(Nd); setChar('_'); } else{ setRange('a','z'); setRange('A','Z'); setRange('0','9'); setChar('_'); } } final void setDigit(boolean unicode){ if(unicode){ setCategory(Nd); } else{ setRange('0','9'); } } final void setSpace(boolean unicode){ if(unicode){ setCategory(Zs); setCategory(Zp); setCategory(Zl); } else{ setChar(' '); setChar('\r'); setChar('\n'); setChar('\t'); setChar('\f'); } } final void setCategory(int c){ if(!isLarge) enableLargeMode(); Block[] catBits=categoryBits[c]; weight+=Block.add(this.blocks,catBits,0,BLOCK_COUNT-1,false); //System.out.println("["+this+"].setCategory("+c+"): weight="+weight); } final void setChars(String chars){ for(int i=chars.length()-1;i>=0;i--) setChar(chars.charAt(i)); } final void setChar(char c){ setRange(c,c); } final void setRange(char c1,char c2){ //System.out.println("["+this+"].setRange("+c1+","+c2+"):"); //if(c1>31 && c1<=126 && c2>31 && c2<=126) System.out.println("setRange('"+c1+"','"+c2+"'):"); //else System.out.println("setRange(["+Integer.toHexString(c1)+"],["+Integer.toHexString(c2)+"]):"); if(c2>=256 || isLarge){ int s=0; if(!isLarge){ enableLargeMode(); } Block[] blocks=this.blocks; for(int c=c1;c<=c2;c++){ int i2=(c>>8)&0xff; int i=c&0xff; Block block=blocks[i2]; if(block==null){ blocks[i2]=block=new Block(); } if(block.set(i))s++; } weight+=s; } else{ boolean[] block0=this.block0; if(block0==null){ this.block0=block0=new boolean[BLOCK_SIZE]; } 
weight+=set(block0,true,c1,c2); } } final void add(Bitset bs){ add(bs,false); } final void add(Bitset bs,boolean inverse){ weight+=addImpl(this,bs,!bs.positive^inverse); } private final static int addImpl(Bitset bs1, Bitset bs2, boolean inv){ int s=0; if(!bs1.isLarge && !bs2.isLarge && !inv){ if(bs2.block0!=null){ boolean[] bits=bs1.block0; if(bits==null) bs1.block0=bits=new boolean[BLOCK_SIZE]; s+=add(bits,bs2.block0,0,BLOCK_SIZE-1,false); } } else { if(!bs1.isLarge) bs1.enableLargeMode(); if(!bs2.isLarge) bs2.enableLargeMode(); s+=Block.add(bs1.blocks,bs2.blocks,0,BLOCK_COUNT-1,inv); } return s; } final void subtract(Bitset bs){ subtract(bs,false); } final void subtract(Bitset bs,boolean inverse){ //System.out.println("["+this+"].subtract(["+bs+"],"+inverse+"):"); weight+=subtractImpl(this,bs,!bs.positive^inverse); } private final static int subtractImpl(Bitset bs1,Bitset bs2,boolean inv){ int s=0; if(!bs1.isLarge && !bs2.isLarge && !inv){ boolean[] bits1,bits2; if((bits2=bs2.block0)!=null){ bits1=bs1.block0; if(bits1==null) return 0; s+=subtract(bits1,bits2,0,BLOCK_SIZE-1,false); } } else { if(!bs1.isLarge) bs1.enableLargeMode(); if(!bs2.isLarge) bs2.enableLargeMode(); s+=Block.subtract(bs1.blocks,bs2.blocks,0,BLOCK_COUNT-1,inv); } return s; } final void intersect(Bitset bs){ intersect(bs,false); } final void intersect(Bitset bs,boolean inverse){ //System.out.println("["+this+"].intersect(["+bs+"],"+inverse+"):"); subtract(bs,!inverse); } static final int add(boolean[] bs1,boolean[] bs2,int from,int to,boolean inv){ //System.out.println("Bitset.add(boolean[],boolean[],"+inv+"):"); int s=0; for(int i=from;i<=to;i++){ if(bs1[i]) continue; if(!(bs2[i]^inv)) continue; //System.out.println(" "+i+": value0="+value0+", value="+value); s++; bs1[i]=true; //System.out.println(" s="+s+", bs1[i]->"+bs1[i]); } return s; } static final int subtract(boolean[] bs1,boolean[] bs2,int from,int to,boolean inv){ //System.out.println("Bitset.subtract(boolean[],boolean[],"+inv+"):"); 
int s=0; for(int i=from;i<=to;i++){ if(!bs1[i]) continue; if(!(bs2[i]^inv)) continue; s--; bs1[i]=false; //if(i>32 && i<127) System.out.println(" s="+s+", bs1['"+(char)i+"']->"+bs1[i]); //else System.out.println(" s="+s+", bs1["+i+"]->"+bs1[i]); } return s; } static final int set(boolean[] arr,boolean value,int from,int to){ int s=0; for(int i=from;i<=to;i++){ if(arr[i]==value) continue; if(value) s++; else s--; arr[i]=value; } return s; } public String toString(){ StringBuffer sb=new StringBuffer(); if(!positive) sb.append('^'); if(isLarge) sb.append(CharacterClass.stringValue2(Block.toBitset2(blocks))); else if(block0!=null) sb.append(CharacterClass.stringValue0(block0)); sb.append('('); sb.append(getWeight()); sb.append(')'); return sb.toString(); } /* public static void main(String[] args){ //System.out.print("blocks(Lu)="); //System.out.println(CharacterClass.stringValue2(Block.toBitset2(categoryBits[Lu]))); //System.out.println("[1][0].get('a')="+categoryBits[1][0].get('a')); //System.out.println("[1][0].get('A')="+categoryBits[1][0].get('A')); //System.out.println("[1][0].get(65)="+categoryBits[1][0].get(65)); //System.out.println(""+categoryBits[1][0].get('A')); Bitset b1=new Bitset(); //b1.setCategory(Lu); //b1.enableLargeMode(); b1.setRange('a','z'); b1.setRange('à','ÿ'); Bitset b2=new Bitset(); //b2.setCategory(Ll); //b2.enableLargeMode(); b2.setRange('A','Z'); b2.setRange('À','ß'); Bitset b=new Bitset(); //bs.setRange('a','z'); //bs.setRange('A','Z'); b.add(b1); b.add(b2,true); System.out.println("b1="+b1); System.out.println("b2="+b2); System.out.println("b=b1+^b2="+b); b.subtract(b1,true); System.out.println("(b1+^b2)-^b1="+b); } */ } class Block implements UnicodeConstants{ private boolean isFull; //private boolean[] bits; boolean[] bits; private boolean shared=false; Block(){} Block(boolean[] bits){ this.bits=bits; shared=true; } final boolean set(int c){ 
//System.out.println("Block.add("+CharacterClass.stringValue2(toBitset2(targets))+","+CharacterClass.stringValue2(toBitset2(addends))+","+from*BLOCK_SIZE+","+to*BLOCK_SIZE+","+inv+"):"); if(isFull) return false; boolean[] bits=this.bits; if(bits==null){ this.bits=bits=new boolean[BLOCK_SIZE]; shared=false; bits[c]=true; return true; } if(bits[c]) return false; if(shared) bits=copyBits(this); bits[c]=true; return true; } final boolean get(int c){ if(isFull) return true; boolean[] bits=this.bits; if(bits==null){ return false; } return bits[c]; } final static int add(Block[] targets,Block[] addends,int from,int to,boolean inv){ //System.out.println("Block.add("+CharacterClass.stringValue2(toBitset2(targets))+","+CharacterClass.stringValue2(toBitset2(addends))+","+from*BLOCK_SIZE+","+to*BLOCK_SIZE+","+inv+"):"); //System.out.println("Block.add():"); int s=0; for(int i=from;i<=to;i++){ Block addend=addends[i]; //System.out.println(" "+i+": "); //System.out.println(" target="+(target==null? "null": i==0? CharacterClass.stringValue0(target.bits): "{"+count(target.bits,0,BLOCK_SIZE-1)+"}")); //System.out.println(" addend="+(addend==null? "null": i==0? CharacterClass.stringValue0(addend.bits): "{"+count(addend.bits,0,BLOCK_SIZE-1)+"}")); if(addend==null){ if(!inv) continue; } else if(addend.isFull && inv) continue; Block target=targets[i]; if(target==null) targets[i]=target=new Block(); else if(target.isFull) continue; s+=add(target,addend,inv); //System.out.println(" result="+(target==null? "null": i==0? 
CharacterClass.stringValue0(target.bits): "{"+count(target.bits,0,BLOCK_SIZE-1)+"}")); //System.out.println(" s="+s); } //System.out.println(" s="+s); return s; } private final static int add(Block target,Block addend,boolean inv){ //System.out.println("Block.add(Block,Block):"); //there is provided that !target.isFull boolean[] targetbits,addbits; if(addend==null){ if(!inv) return 0; int s=BLOCK_SIZE; if((targetbits=target.bits)!=null){ s-=count(targetbits,0,BLOCK_SIZE-1); } target.isFull=true; target.bits=null; target.shared=false; return s; } else if(addend.isFull){ if(inv) return 0; int s=BLOCK_SIZE; if((targetbits=target.bits)!=null){ s-=count(targetbits,0,BLOCK_SIZE-1); } target.isFull=true; target.bits=null; target.shared=false; return s; } else if((addbits=addend.bits)==null){ if(!inv) return 0; int s=BLOCK_SIZE; if((targetbits=target.bits)!=null){ s-=count(targetbits,0,BLOCK_SIZE-1); } target.isFull=true; target.bits=null; target.shared=false; return s; } else{ if((targetbits=target.bits)==null){ if(!inv){ target.bits=addbits; target.shared=true; return count(addbits,0,BLOCK_SIZE-1); } else{ target.bits=targetbits=emptyBits(null); target.shared=false; return Bitset.add(targetbits,addbits,0,BLOCK_SIZE-1,inv); } } else{ if(target.shared) targetbits=copyBits(target); return Bitset.add(targetbits,addbits,0,BLOCK_SIZE-1,inv); } } } final static int subtract(Block[] targets,Block[] subtrahends,int from,int to,boolean inv){ //System.out.println("Block.subtract(Block[],Block[],"+inv+"):"); int s=0; for(int i=from;i<=to;i++){ //System.out.println(" "+i+": "); Block target=targets[i]; if(target==null || (!target.isFull && target.bits==null)) continue; //System.out.println(" target="+(target==null? "null": i==0? CharacterClass.stringValue0(target.bits): "{"+ (target.isFull? BLOCK_SIZE: count(target.bits,0,BLOCK_SIZE-1))+"}")); Block subtrahend=subtrahends[i]; //System.out.println(" subtrahend="+(subtrahend==null? "null": i==0? 
CharacterClass.stringValue0(subtrahend.bits): "{"+(subtrahend.isFull? BLOCK_SIZE: count(subtrahend.bits,0,BLOCK_SIZE-1))+"}")); if(subtrahend==null){ if(!inv) continue; else{ if(target.isFull){ s-=BLOCK_SIZE; } else{ s-=count(target.bits,0,BLOCK_SIZE-1); } target.isFull=false; target.bits=null; target.shared=false; } } else{ s+=subtract(target,subtrahend,inv); } //System.out.println(" result="+(target==null? "null": i==0? CharacterClass.stringValue0(target.bits): "{"+ (target.isFull? BLOCK_SIZE: target.bits==null? 0: count(target.bits,0,BLOCK_SIZE-1))+"}")); //System.out.println(" s="+s); } //System.out.println(" s="+s); return s; } private final static int subtract(Block target,Block subtrahend,boolean inv){ boolean[] targetbits,subbits; //System.out.println("subtract(Block,Block,"+inv+")"); //there is provided that target.isFull or target.bits!=null if(subtrahend.isFull){ if(inv) return 0; int s=0; if(target.isFull){ s=BLOCK_SIZE; } else{ s=count(target.bits,0,BLOCK_SIZE-1); } target.isFull=false; target.bits=null; target.shared=false; return s; } else if((subbits=subtrahend.bits)==null){ if(!inv) return 0; int s=0; if(target.isFull){ s=BLOCK_SIZE; } else{ s=count(target.bits,0,BLOCK_SIZE-1); } target.isFull=false; target.bits=null; target.shared=false; return s; } else{ if(target.isFull){ boolean[] bits=fullBits(target.bits); int s=Bitset.subtract(bits,subbits,0,BLOCK_SIZE-1,inv); target.isFull=false; target.shared=false; target.bits=bits; return s; } else{ if(target.shared) targetbits=copyBits(target); else targetbits=target.bits; return Bitset.subtract(targetbits,subbits,0,BLOCK_SIZE-1,inv); } } } private static boolean[] copyBits(Block block){ boolean[] bits=new boolean[BLOCK_SIZE]; System.arraycopy(block.bits,0,bits,0,BLOCK_SIZE); block.bits=bits; block.shared=false; return bits; } private static boolean[] fullBits(boolean[] bits){ if(bits==null) bits=new boolean[BLOCK_SIZE]; System.arraycopy(FULL_BITS,0,bits,0,BLOCK_SIZE); return bits; } private static 
boolean[] emptyBits(boolean[] bits){ if(bits==null) bits=new boolean[BLOCK_SIZE]; else System.arraycopy(EMPTY_BITS,0,bits,0,BLOCK_SIZE); return bits; } final static int count(boolean[] arr, int from, int to){ int s=0; for(int i=from;i<=to;i++){ if(arr[i]) s++; } return s; } final static boolean[][] toBitset2(Block[] blocks){ int len=blocks.length; boolean[][] result=new boolean[len][]; for(int i=0;i * To match a regular expression myExpr against a text myString one should first create a Pattern object:
     * Pattern p=new Pattern(myExpr);
     * 
    * then obtain a Matcher object:
     * Matcher matcher=p.matcher(myString);
     * 
    * The latter is an automaton that actually performs a search. It provides the following methods: *
  • search for matching substrings : matcher.find() or matcher.findAll(); *
  • test whether the text matches the whole pattern : matcher.matches(); *
  • test whether the text matches the beginning of the pattern : matcher.matchesPrefix(); *
  • search with custom options : matcher.find(int options) *

    * Flags
    * Flags (see the REFlags interface) change the meaning of some regular expression elements at compile time. * These flags may be passed either as a string (see Pattern(String,String)) or as a bitwise OR of: *

  • REFlags.IGNORE_CASE - enables case insensitivity; *
  • REFlags.MULTILINE - makes "^" and "$" match at the start and the end of each line; *
  • REFlags.DOTALL - makes "." match end-of-line characters as well ('\r' and '\n' in ASCII); *
  • REFlags.IGNORE_SPACES - literal spaces in the expression are ignored for better readability; *
  • REFlags.UNICODE - the predefined classes ('\w', '\d', etc.) are defined in terms of Unicode; *
  • REFlags.XML_SCHEMA - permits XML Schema regular expression syntax extensions. *

    * Multithreading
    * Pattern instances are thread-safe, i.e. the same Pattern object may be used
    * by any number of threads simultaneously. On the other hand, the Matcher objects
    * are NOT thread-safe, so, given a Pattern instance, each thread must obtain
    * and use its own Matcher.
    *
    * @see REFlags
    * @see Matcher
    * @see Matcher#setTarget(java.lang.String)
    * @see Matcher#setTarget(java.lang.String,int,int)
    * @see Matcher#setTarget(char[],int,int)
    * @see Matcher#setTarget(java.io.Reader,int)
    * @see MatchResult
    * @see MatchResult#group(int)
    * @see MatchResult#start(int)
    * @see MatchResult#end(int)
    * @see MatchResult#length(int)
    * @see MatchResult#charAt(int,int)
    * @see MatchResult#prefix()
    * @see MatchResult#suffix()
    */
    
    public class Pattern implements Serializable,REFlags{
       String stringRepr;
       
       // tree entry
       Term root,root0;
       
       // required number of memory slots
       int memregs;
       
       // required number of iteration counters
       int counters;
       
       // number of lookahead groups
       int lookaheads;
       
       Hashtable namedGroupMap;
       
       protected Pattern() throws PatternSyntaxException{}
       
      /**
       * Compiles an expression with default flags.
       * @param regex the Perl5-compatible regular expression string.
       * @exception PatternSyntaxException if the argument doesn't correspond to perl5 regex syntax.
       * @see Pattern#Pattern(java.lang.String,java.lang.String)
       * @see Pattern#Pattern(java.lang.String,int)
       */
       public Pattern(String regex) throws PatternSyntaxException{
          this(regex,DEFAULT);
       }
       
      /**
       * Compiles a regular expression using Perl5-style flags.
       * The flag string should consist of letters 'i','m','s','x','u','X' (the case is significant) and a hyphen.
       * The meaning of letters:
       *

      *
    • i - case insensitivity, corresponds to REFlags.IGNORE_CASE; *
    • m - multiline treatment (BOLs and EOLs affect the '^' and '$'), corresponds to REFlags.MULTILINE; *
    • s - single line treatment ('.' matches \r's and \n's), corresponds to REFlags.DOTALL; *
    • x - extended whitespace comments (spaces and eols in the expression are ignored), corresponds to REFlags.IGNORE_SPACES; *
    • u - predefined classes are regarded as belonging to Unicode, corresponds to REFlags.UNICODE; this may yield some performance penalty; *
    • X - compatibility with XML Schema, corresponds to REFlags.XML_SCHEMA. *
       * @param regex the Perl5-compatible regular expression string.
       * @param flags the Perl5-compatible flags.
       * @exception PatternSyntaxException if the argument doesn't correspond to perl5 regex syntax.
       * @see REFlags
       */
       public Pattern(String regex,String flags) throws PatternSyntaxException{
          stringRepr=regex;
          compile(regex,parseFlags(flags));
       }
       
      /**
       * Compiles a regular expression using REFlags.
       * The flags parameter is a bitwise OR of the following values:
       *
      *
    • REFlags.IGNORE_CASE - case insensitivity, corresponds to the 'i' letter; *
    • REFlags.MULTILINE - multiline treatment (BOLs and EOLs affect the '^' and '$'), corresponds to 'm'; *
    • REFlags.DOTALL - single line treatment ('.' matches \r's and \n's), corresponds to 's'; *
    • REFlags.IGNORE_SPACES - extended whitespace comments (spaces and eols in the expression are ignored), corresponds to 'x'; *
    • REFlags.UNICODE - predefined classes are regarded as belonging to Unicode, corresponds to 'u'; this may yield some performance penalty; *
    • REFlags.XML_SCHEMA - compatibility with XML Schema, corresponds to 'X'. *
       * @param regex the Perl5-compatible regular expression string.
       * @param flags a bitwise OR of REFlags constants.
       * @exception PatternSyntaxException if the argument doesn't correspond to perl5 regex syntax.
       * @see REFlags
       */
       public Pattern(String regex, int flags) throws PatternSyntaxException{
          compile(regex,flags);
       }
       
       /*
       //java.util.regex.* compatibility
       public static Pattern compile(String regex,int flags) throws PatternSyntaxException{
          Pattern p=new Pattern();
          p.compile(regex,flags);
          return p;
       }
       */
       
       protected void compile(String regex,int flags) throws PatternSyntaxException{
          stringRepr=regex;
          Term.makeTree(regex,flags,this);
       }
       
      /**
       * How many capturing groups does this expression include?
       */
       public int groupCount(){
          return memregs;
       }
       
      /**
       * Get numeric id for a group name.
       * @return null if no such name is found.
       * @see MatchResult#group(java.lang.String)
       * @see MatchResult#isCaptured(java.lang.String)
       */
       public Integer groupId(String name){
          return ((Integer)namedGroupMap.get(name));
       }
       
      /**
       * A shorthand for Pattern.matcher(String).matches().
       * @param s the target
       * @return true if the entire target matches the pattern
       * @see Matcher#matches()
       * @see Matcher#matches(String)
       */
       public boolean matches(String s){
          return matcher(s).matches();
       }
       
      /**
       * A shorthand for Pattern.matcher(String).matchesPrefix().
       * @param s the target
       * @return true if the entire target matches the beginning of the pattern
       * @see Matcher#matchesPrefix()
       */
       public boolean startsWith(String s){
          return matcher(s).matchesPrefix();
       }
       
      /**
       * Returns a targetless matcher.
       * Don't forget to supply a target.
       */
       public Matcher matcher(){
          return new Matcher(this);
       }
       
      /**
       * Returns a matcher for a specified string.
       */
       public Matcher matcher(String s){
          Matcher m=new Matcher(this);
          m.setTarget(s);
          return m;
       }
       
      /**
       * Returns a matcher for a specified region.
       */
       public Matcher matcher(char[] data,int start,int end){
          Matcher m=new Matcher(this);
          m.setTarget(data,start,end);
          return m;
       }
       
      /**
       * Returns a matcher for a match result (in a performance-friendly way).
       * The groupId parameter specifies which group is the target.
       * @param groupId which group is the target; either a positive integer (group id), or one of MatchResult.MATCH, MatchResult.PREFIX, MatchResult.SUFFIX, MatchResult.TARGET.
       */
       public Matcher matcher(MatchResult res,int groupId){
          Matcher m=new Matcher(this);
          if(res instanceof Matcher){
             m.setTarget((Matcher)res,groupId);
          }
          else{
             m.setTarget(res.targetChars(),res.start(groupId)+res.targetStart(),res.length(groupId));
          }
          return m;
       }
       
      /**
       * Just as above, yet with a symbolic group name.
       * @exception IllegalArgumentException if there is no group with such a name
       */
       public Matcher matcher(MatchResult res,String groupName){
          Integer id=res.pattern().groupId(groupName);
          if(id==null) throw new IllegalArgumentException("group not found:"+groupName);
          int group=id.intValue();
          return matcher(res,group);
       }
       
      /**
       * Returns a matcher taking a text stream as target.
       * Note that this is not true POSIX-style stream matching, i.e. the whole text is read in advance and stored in a char array.
       * @param text a text stream
       * @param length the length to read from the stream; if length is -1, the whole stream is read in.
       * @exception IOException indicates an IO problem
       * @exception OutOfMemoryError if the stream is too long
       */
       public Matcher matcher(Reader text,int length)throws IOException{
          Matcher m=new Matcher(this);
          m.setTarget(text,length);
          return m;
       }
       
      /**
       * Returns a replacer of a pattern by a specified perl-like expression.
       * Such a replacer will substitute all occurrences of a pattern by an evaluated expression
       * ("$&" and "$0" are substituted by the whole match, "$1" is substituted by group#1, etc).
       * Example:
       * String text="The quick brown fox jumped over the lazy dog";
       * Pattern word=new Pattern("\\w+");
       * System.out.println(word.replacer("[$&]").replace(text));
       * //prints "[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]"
       * Pattern swap=new Pattern("(fox|dog)(.*?)(fox|dog)");
       * System.out.println(swap.replacer("$3$2$1").replace(text));
       * //prints "The quick brown dog jumped over the lazy fox"
       * Pattern scramble=new Pattern("(\\w+)(.*?)(\\w+)");
       * System.out.println(scramble.replacer("$3$2$1").replace(text));
       * //prints "quick The fox brown over jumped lazy the dog"
       * 
       * @param expr a perl-like expression, the "$&" and "${&}" standing for the whole match, the "$N" and "${N}" standing for group#N, and "${Foo}" standing for the named group Foo.
       * @see Replacer
       */
       public Replacer replacer(String expr){
          return new Replacer(this,expr);
       }
       
      /**
       * Returns a replacer that will substitute all occurrences of a pattern
       * by applying a user-defined substitution model.
       * @param model a Substitution object which is in charge of match substitution
       * @see Replacer
       */
       public Replacer replacer(Substitution model){
          return new Replacer(this,model);
       }
       
      /**
       * Tokenizes a text by occurrences of the pattern.
       * Note that a series of adjacent matches is regarded as a single separator.
       * The same as new RETokenizer(Pattern,String);
       * @see RETokenizer
       * @see RETokenizer#RETokenizer(jregex.Pattern,java.lang.String)
       */
       public RETokenizer tokenizer(String text){
          return new RETokenizer(this,text);
       }
       
      /**
       * Tokenizes a specified region by occurrences of the pattern.
       * Note that a series of adjacent matches is regarded as a single separator.
       * The same as new RETokenizer(Pattern,char[],int,int);
       * @see RETokenizer
       * @see RETokenizer#RETokenizer(jregex.Pattern,char[],int,int)
       */
       public RETokenizer tokenizer(char[] data,int off,int len){
          return new RETokenizer(this,data,off,len);
       }
       
      /**
       * Tokenizes the text of a stream by occurrences of the pattern.
       * Note that a series of adjacent matches is regarded as a single separator.
       * The same as new RETokenizer(Pattern,Reader,int);
       * @see RETokenizer
       * @see RETokenizer#RETokenizer(jregex.Pattern,java.io.Reader,int)
       */
       public RETokenizer tokenizer(Reader in,int length) throws IOException{
          return new RETokenizer(this,in,length);
       }
       
       public String toString(){
          return stringRepr;
       }
       
      /**
       * Returns a more or less readable representation of the bytecode for the pattern.
*/ public String toString_d(){ return root.toStringAll(); } static int parseFlags(String flags)throws PatternSyntaxException{ boolean enable=true; int len=flags.length(); int result=DEFAULT; for(int i=0;i * Pattern p=new Pattern("\\s+"); //any number of space characters * String text="blah blah blah"; * //by factory method * RETokenizer tok1=p.tokenizer(text); * //or by constructor * RETokenizer tok2=new RETokenizer(p,text); * * Now the one way is to use the tokenizer as a token enumeration/iterator:
     * while(tok1.hasMore()) System.out.println(tok1.nextToken());
     * 
    * and another way is to split it into a String array:
     
     * String[] arr=tok2.split();
     * for(int i=0;i<arr.length;i++) System.out.println(arr[i]);
     * 
     * @see        Pattern#tokenizer(java.lang.String)
     */
    
    public class RETokenizer implements Enumeration{
       private Matcher matcher;
       private boolean checked;
       private boolean hasToken;
       private String token;
       private int pos=0;
       private boolean endReached=false;
       private boolean emptyTokensEnabnled=false;
       
       public RETokenizer(Pattern pattern,String text){
          this(pattern.matcher(text),false);
       }
       
       public RETokenizer(Pattern pattern,char[] chars,int off,int len){
          this(pattern.matcher(chars,off,len),false);
       }
       
       public RETokenizer(Pattern pattern,Reader r,int len) throws IOException{
          this(pattern.matcher(r,len),false);
       }
       
       public RETokenizer(Matcher m, boolean emptyEnabled){
          matcher=m;
          emptyTokensEnabnled=emptyEnabled;
       }
       
       public void setEmptyEnabled(boolean b){
          emptyTokensEnabnled=b;
       }
       
       public boolean isEmptyEnabled(){
          return emptyTokensEnabnled;
       }
       
       public boolean hasMore(){
          if(!checked) check();
          return hasToken;
       }
       
       public String nextToken(){
          if(!checked) check();
          if(!hasToken) throw new NoSuchElementException();
          checked=false;
          return token;
       }
       
       public String[] split(){
          return collect(this,null,0);
       }
       
       public void reset(){
          matcher.setPosition(0);
       }
       
       private static final String[] collect(RETokenizer tok,String[] arr,int count){
          if(tok.hasMore()){
             String s=tok.nextToken();
    //System.out.println("collect(,,"+count+"): token="+s);
             arr=collect(tok,arr,count+1);
             arr[count]=s;
          }
          else{
             arr=new String[count];
          }
          return arr;
       }
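       
       // Illustration only, not part of jregex: a minimal standalone sketch of the
       // technique collect() uses above. The recursion depth counts the tokens, the
       // array is allocated once at the bottom of the recursion, and each slot is
       // filled while the calls unwind. The class name CollectSketch and the use of
       // java.util.Iterator instead of RETokenizer are assumptions made so that the
       // example runs on its own.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CollectSketch {
    // Drains the iterator into an exactly-sized array without an intermediate list.
    public static String[] collect(Iterator<String> it, int count) {
        if (it.hasNext()) {
            String s = it.next();                  // remember the token at this depth
            String[] arr = collect(it, count + 1); // deeper calls see the remaining tokens
            arr[count] = s;                        // fill the slot on the way back up
            return arr;
        }
        return new String[count];                  // bottom of the recursion: size is known
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        System.out.println(Arrays.toString(collect(tokens.iterator(), 0))); // prints "[a, b, c]"
    }
}
```

       // The approach needs no growable buffer, at the cost of one stack frame per
       // token, so a very large number of tokens could exhaust the call stack.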
       
       private void check(){
          final boolean emptyOk=this.emptyTokensEnabnled;
          checked=true;
          if(endReached){
             hasToken=false;
             return;
          }
          Matcher m=matcher;
          boolean hasMatch=false;
          while(m.find()){
             if(m.start()>0){
                hasMatch=true;
                break;
             }
             else if(m.end()>0){
                if(emptyOk){
                   hasMatch=true;
                   break;
                }
                else m.setTarget(m,MatchResult.SUFFIX);
             }
          }
          if(!hasMatch){
             endReached=true;
             if(m.length(m.TARGET)==0 && !emptyOk){
                hasToken=false;
             }
             else{
                hasToken=true;
                token=m.target();
             }
             return;
          }
    //System.out.println(m.target()+": "+m.groupv());
    //System.out.println("prefix: "+m.prefix());
    //System.out.println("suffix: "+m.suffix());
          hasToken=true;
          token=m.prefix();
          m.setTarget(m,MatchResult.SUFFIX);
          //m.setTarget(m.suffix());
       }
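       
       // Illustration only, not part of jregex: a rough standalone approximation of
       // the splitting rule check() implements above, written against java.util.regex
       // (not jregex) so that it runs on its own. Each token is the text preceding a
       // separator match; empty tokens are kept only when emptyOk is set. The class
       // name TokenizeSketch and the eager, list-based form are assumptions; the
       // original works lazily, one token per call, and its treatment of zero-length
       // matches may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeSketch {
    public static List<String> tokenize(String text, Pattern separator, boolean emptyOk) {
        List<String> tokens = new ArrayList<>();
        Matcher m = separator.matcher(text);
        int pos = 0; // start of the not yet consumed part of the text
        while (m.find()) {
            if (m.end() == m.start()) continue;         // skip zero-length separator matches
            String token = text.substring(pos, m.start());
            if (emptyOk || !token.isEmpty()) tokens.add(token);
            pos = m.end();                              // continue after the separator
        }
        String tail = text.substring(pos);              // whatever follows the last separator
        if (emptyOk || !tail.isEmpty()) tokens.add(tail);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("/a//b/c/", Pattern.compile("/"), false)); // prints "[a, b, c]"
    }
}
```

       // With emptyOk set to true the same input also yields the empty tokens at
       // the start, between the doubled separators, and at the end.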
       
       public boolean hasMoreElements(){
          return hasMore();
       }
       
      /**
       * @return a next token as a String
       */
       public Object nextElement(){
          return nextToken();
       }
       
       /*
       public static void main(String[] args){
          RETokenizer rt=new RETokenizer(new Pattern("/").matcher("/a//b/c/"),false);
          while(rt.hasMore()){
             System.out.println("<"+rt.nextToken()+">");
          }
       }
       */
    }
    
    // File: jregex/jregex/TreeInfo.java
    /**
     * Copyright (c) 2001, Sergey A. Samokhodkin
     * All rights reserved.
     * 
     * Redistribution and use in source and binary forms, with or without modification, 
     * are permitted provided that the following conditions are met:
     * 
     * - Redistributions of source code must retain the above copyright notice, 
     * this list of conditions and the following disclaimer. 
     * - Redistributions in binary form 
     * must reproduce the above copyright notice, this list of conditions and the following 
     * disclaimer in the documentation and/or other materials provided with the distribution.
     * - Neither the name of jregex nor the names of its contributors may be used 
     * to endorse or promote products derived from this software without specific prior 
     * written permission. 
     * 
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY 
     * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 
     * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
     * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
     * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
     * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
     * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 
     * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     * 
     * @version 1.2_01
     */
    
    package jregex;
    
    import java.util.Hashtable;
    
    class TreeInfo{
       int memregs;
       int counters;
       int lookaheadCount;
       Term root,optimized;
       Hashtable groupMap;
       
       TreeInfo(){}
    }
    
    // File: jregex/jregex/TextBuffer.java
    /**
     * Copyright (c) 2001, Sergey A. Samokhodkin
     * All rights reserved.
     * 
     * Redistribution and use in source and binary forms, with or without modification, 
     * are permitted provided that the following conditions are met:
     * 
     * - Redistributions of source code must retain the above copyright notice, 
     * this list of conditions and the following disclaimer. 
     * - Redistributions in binary form 
     * must reproduce the above copyright notice, this list of conditions and the following 
     * disclaimer in the documentation and/or other materials provided with the distribution.
     * - Neither the name of jregex nor the names of its contributors may be used 
     * to endorse or promote products derived from this software without specific prior 
     * written permission. 
     * 
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY 
     * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 
     * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
     * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
     * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
     * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
     * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 
     * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     * 
     * @version 1.2_01
     */
    
    package jregex;
    
    import java.io.*;
    
    public interface TextBuffer{
       public void append(char c);
       public void append(char[] chars,int start,int len);
       public void append(String s);
    }
    
    // File: jregex/jregex/UnicodeConstants.java
    /**
     * Copyright (c) 2001, Sergey A. Samokhodkin
     * All rights reserved.
     * 
     * Redistribution and use in source and binary forms, with or without modification, 
     * are permitted provided that the following conditions are met:
     * 
     * - Redistributions of source code must retain the above copyright notice, 
     * this list of conditions and the following disclaimer. 
     * - Redistributions in binary form 
     * must reproduce the above copyright notice, this list of conditions and the following 
     * disclaimer in the documentation and/or other materials provided with the distribution.
     * - Neither the name of jregex nor the names of its contributors may be used 
     * to endorse or promote products derived from this software without specific prior 
     * written permission. 
     * 
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY 
     * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 
     * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
     * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
     * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
     * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
     * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 
     * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     * 
     * @version 1.2_01
     */
    
    package jregex;
    
    interface UnicodeConstants{
       int CATEGORY_COUNT=32;
       int Cc=Character.CONTROL;
       int Cf=Character.FORMAT;
       int Co=Character.PRIVATE_USE;
       int Cn=Character.UNASSIGNED;
       int Lu=Character.UPPERCASE_LETTER;
       int Ll=Character.LOWERCASE_LETTER;
       int Lt=Character.TITLECASE_LETTER;
       int Lm=Character.MODIFIER_LETTER;
       int Lo=Character.OTHER_LETTER;
       int Mn=Character.NON_SPACING_MARK;
       int Me=Character.ENCLOSING_MARK;
       int Mc=Character.COMBINING_SPACING_MARK;
       int Nd=Character.DECIMAL_DIGIT_NUMBER;
       int Nl=Character.LETTER_NUMBER;
       int No=Character.OTHER_NUMBER;
       int Zs=Character.SPACE_SEPARATOR;
       int Zl=Character.LINE_SEPARATOR;
       int Zp=Character.PARAGRAPH_SEPARATOR;
       int Cs=Character.SURROGATE;
       int Pd=Character.DASH_PUNCTUATION;
       int Ps=Character.START_PUNCTUATION;
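       int Pi=Character.START_PUNCTUATION; //initial quote punctuation; shares the value of Ps here
       int Pe=Character.END_PUNCTUATION;
       int Pf=Character.END_PUNCTUATION; //final quote punctuation; shares the value of Pe here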
       int Pc=Character.CONNECTOR_PUNCTUATION;
       int Po=Character.OTHER_PUNCTUATION;
       int Sm=Character.MATH_SYMBOL;
       int Sc=Character.CURRENCY_SYMBOL;
       int Sk=Character.MODIFIER_SYMBOL;
       int So=Character.OTHER_SYMBOL;
       
       int BLOCK_COUNT=256;
       int BLOCK_SIZE=256;
       
       int MAX_WEIGHT=Character.MAX_VALUE+1;
       int[] CATEGORY_WEIGHTS=new int[CATEGORY_COUNT];
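       
       // Illustration only, not part of jregex: BLOCK_COUNT*BLOCK_SIZE covers the
       // whole 16-bit char range, which suggests a two-level character set where the
       // high byte selects a block and the low byte selects a slot in it. The sketch
       // below shows that layout; the class name TwoLevelSetSketch, its methods, and
       // the c>>8 / c&0xFF index split are assumptions, not code taken from the library.

```java
public class TwoLevelSetSketch {
    static final int BLOCK_COUNT = 256;
    static final int BLOCK_SIZE = 256;

    // A null entry stands for an empty block, so sparse sets stay small.
    private final boolean[][] blocks = new boolean[BLOCK_COUNT][];

    public void add(char c) {
        int hi = c >> 8;    // block index, 0..255
        int lo = c & 0xFF;  // position inside the block, 0..255
        if (blocks[hi] == null) blocks[hi] = new boolean[BLOCK_SIZE];
        blocks[hi][lo] = true;
    }

    public boolean contains(char c) {
        boolean[] block = blocks[c >> 8];
        return block != null && block[c & 0xFF];
    }

    // Number of blocks that actually hold an array.
    public int allocatedBlocks() {
        int n = 0;
        for (boolean[] block : blocks) if (block != null) n++;
        return n;
    }

    public static void main(String[] args) {
        TwoLevelSetSketch set = new TwoLevelSetSketch();
        set.add('a');
        set.add('\u0416');
        System.out.println(set.contains('a') + " " + set.contains('b') + " " + set.allocatedBlocks());
        // prints "true false 2"
    }
}
```

       // Only the blocks that contain at least one member are allocated, which keeps
       // ASCII-only classes at a single 256-entry array.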
    }
    
    // File: jregex/jregex/Term.java
    /**
     * Copyright (c) 2001, Sergey A. Samokhodkin
     * All rights reserved.
     * 
     * Redistribution and use in source and binary forms, with or without modification, 
     * are permitted provided that the following conditions are met:
     * 
     * - Redistributions of source code must retain the above copyright notice, 
     * this list of conditions and the following disclaimer. 
     * - Redistributions in binary form 
     * must reproduce the above copyright notice, this list of conditions and the following 
     * disclaimer in the documentation and/or other materials provided with the distribution.
     * - Neither the name of jregex nor the names of its contributors may be used 
     * to endorse or promote products derived from this software without specific prior 
     * written permission. 
     * 
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY 
     * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 
     * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
     * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
     * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
     * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
     * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 
     * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     * 
     * @version 1.2_01
     */
    
    package jregex;
    
    import java.util.*;
    
    class Term implements REFlags{
       //runtime Term types
       static final int CHAR        = 0;
       static final int BITSET      = 1;
       static final int BITSET2     = 2;
       static final int ANY_CHAR    = 4;
       static final int ANY_CHAR_NE = 5;
       
       static final int REG         = 6;
       static final int REG_I       = 7;
       static final int FIND        = 8;
       static final int FINDREG     = 9;
       static final int SUCCESS     = 10;
       
       /*optimization-transparent types*/
       static final int BOUNDARY    = 11;
       static final int DIRECTION   = 12;
       static final int UBOUNDARY    = 13;
       static final int UDIRECTION   = 14;
       
       static final int GROUP_IN          = 15;
       static final int GROUP_OUT         = 16;
       static final int VOID              = 17;
       
       static final int START             = 18;
       static final int END               = 19;
       static final int END_EOL           = 20;
       static final int LINE_START        = 21;
       static final int LINE_END          = 22;
       static final int LAST_MATCH_END    = 23;
       
       static final int CNT_SET_0      = 24;
       static final int CNT_INC        = 25;
       static final int CNT_GT_EQ      = 26;
       static final int READ_CNT_LT    = 27;
    
       static final int CRSTORE_CRINC = 28; //store on 'actual' search entry
       static final int CR_SET_0      = 29;
       static final int CR_LT         = 30;
       static final int CR_GT_EQ      = 31;
       
       /*optimization-nontransparent types*/
       static final int BRANCH                 = 32;
       static final int BRANCH_STORE_CNT       = 33;
       static final int BRANCH_STORE_CNT_AUX1  = 34;
       
       static final int PLOOKAHEAD_IN           = 35;
       static final int PLOOKAHEAD_OUT          = 36;
       static final int NLOOKAHEAD_IN           = 37;
       static final int NLOOKAHEAD_OUT          = 38;
       static final int PLOOKBEHIND_IN           = 39;
       static final int PLOOKBEHIND_OUT          = 40;
       static final int NLOOKBEHIND_IN           = 41;
       static final int NLOOKBEHIND_OUT          = 42;
       static final int INDEPENDENT_IN          = 43; //functionally the same as NLOOKAHEAD_IN
       static final int INDEPENDENT_OUT         = 44;
       
       static final int REPEAT_0_INF       = 45;
       static final int REPEAT_MIN_INF     = 46;
       static final int REPEAT_MIN_MAX     = 47;
       static final int REPEAT_REG_MIN_INF = 48;
       static final int REPEAT_REG_MIN_MAX = 49;
       
       static final int BACKTRACK_0           = 50;
       static final int BACKTRACK_MIN         = 51;
       static final int BACKTRACK_FIND_MIN    = 52;
       static final int BACKTRACK_FINDREG_MIN = 53;
       static final int BACKTRACK_REG_MIN     = 54;
       
       static final int MEMREG_CONDITION        = 55;
       static final int LOOKAHEAD_CONDITION_IN  = 56;
       static final int LOOKAHEAD_CONDITION_OUT = 57;
       static final int LOOKBEHIND_CONDITION_IN  = 58;
       static final int LOOKBEHIND_CONDITION_OUT = 59;
       
       //optimization
       static final int FIRST_TRANSPARENT = BOUNDARY;
       static final int LAST_TRANSPARENT  = CR_GT_EQ;
       
       // compiletime: length of vars[] (see makeTree())
       static final int VARS_LENGTH=4;
    
       // compiletime variable indices:
       private static final int MEMREG_COUNT=0;    //refers to the current memreg index
       private static final int CNTREG_COUNT=1;   //refers to the current number of counters
       private static final int DEPTH=2;      //refers to the current depth: (((depth=3)))
       private static final int LOOKAHEAD_COUNT=3;    //refers to the current number of lookahead groups
       
       private static final int LIMITS_LENGTH=3;
       private static final int LIMITS_PARSE_RESULT_INDEX=2;
       private static final int LIMITS_OK=1;
       private static final int LIMITS_FAILURE=2;
       
       //static CustomParser[] customParsers=new CustomParser[256];
       
       // **** CONTROL FLOW **** 
    
       // next-to-execute and next-if-failed commands;
       Term next,failNext;
       
       // **** TYPES ****
       
       int type=VOID;
       boolean inverse;
       
       // used with type=CHAR
       char c;
       
       // used with type=FIND
       int distance;
       boolean eat;
       
       // used with type=BITSET(2);
       boolean[] bitset;
       boolean[][] bitset2;
       boolean[] categoryBitset;  //types(unicode categories)
       
       // used with type=BALANCE;
       char[] brackets;
       
       // used for optimization with type=BITSET,BITSET2
       int weight;
    
       // **** MEMORISATION ****
    
       // memory slot, used with type=REG,GROUP_IN,GROUP_OUT
       int memreg=-1;
    
    
       // **** COUNTERS ****
    
       // max|min number of iterations
       // used with CNT_GT_EQ ,REPEAT_* etc.;
       int minCount,maxCount;
       
       // used with REPEAT_*,REPEAT_REG_*;
       Term target;
       
       // a counter slot to increment & compare with maxCount (CNT_INC etc.);
       int cntreg=0;
       
       // lookahead group id;
       int lookaheadId;
       
       // **** COMPILE HELPERS ****
    
       protected Term prev,in,out,out1,first,current;
       
       //new!!
       protected Term branchOut;
       
       //protected  boolean newBranch=false,closed=false;
       //protected  boolean newBranch=false;
    
       //for debugging
       static int instances;
       int instanceNum;
       
       Term(){
          //for debugging
          instanceNum=instances;
          instances++;
          in=out=this;
       }
       
       Term(int type){
          this();
          this.type=type;
       }
       
       static void makeTree(String s, int flags,Pattern re) throws PatternSyntaxException{
          char[] data=s.toCharArray();
          makeTree(data,0,data.length,flags,re);
       }
       
       static void makeTree(char[] data,int offset,int end,
             int flags,Pattern re) throws PatternSyntaxException{
          // memreg,counter,depth,lookahead
          int[] vars={1,0,0,0}; //don't use counters[0]
          
          //collect iterators for subsequent optimization
          Vector iterators=new Vector();
          Hashtable groupNames=new Hashtable();
          
          Pretokenizer t=new Pretokenizer(data,offset,end);
          Term term=makeTree(t,data,vars,flags,new Group(),iterators,groupNames);
          // term=(0-...-0)
    
          // convert closing outer bracket into success term
          term.out.type=SUCCESS;
          // term=(0-...-!!!
          
          //throw out opening bracket
          Term first=term.next;
          // term=...-!!!
          
          // Optimisation: 
          Term optimized=first;
          Optimizer opt=Optimizer.find(first);
          if(opt!=null) optimized=opt.makeFirst(first);
          
          Enumeration en=iterators.elements();
          while(en.hasMoreElements()){
             Iterator i=(Iterator)en.nextElement();
             i.optimize();
          }
          // ===
          
          re.root=optimized;
          re.root0=first;
          re.memregs=vars[MEMREG_COUNT];
          re.counters=vars[CNTREG_COUNT];
          re.lookaheads=vars[LOOKAHEAD_COUNT];
          re.namedGroupMap=groupNames;
       }
    
       private static Term makeTree(Pretokenizer t,char[] data,int[] vars,
             int flags,Term term,Vector iterators,Hashtable groupNames) throws PatternSyntaxException{
    //System.out.println("Term.makeTree(): flags="+flags);
          if(vars.length!=VARS_LENGTH) throw new IllegalArgumentException("vars.length should be "+VARS_LENGTH+", not "+vars.length);
          //Term term=new Term(isMemReg? vars[MEMREG_COUNT]: -1);
          // use memreg 0 as insignificant
          //Term term=new Group(isMemReg? vars[MEMREG_COUNT]: 0);
          while(true){
             t.next();
             term.append(t.tOffset,t.tOutside,data,vars,flags,iterators,groupNames);
             switch(t.ttype){
                case Pretokenizer.FLAGS:
                   flags=t.flags(flags);
                   continue;
                case Pretokenizer.CLASS_GROUP:
                   t.next();
                   Term clg=new Term();
                   CharacterClass.parseGroup(data,t.tOffset,t.tOutside,clg,
                                   (flags&IGNORE_CASE)>0, (flags&IGNORE_SPACES)>0,
                                   (flags&UNICODE)>0, (flags&XML_SCHEMA)>0);
                   term.append(clg);
                   continue;
                case Pretokenizer.PLAIN_GROUP:
                   vars[DEPTH]++;
    //System.out.println("PLAIN_GROUP, t.tOffset="+t.tOffset+", t.tOutside="+t.tOutside+", t.flags("+flags+")="+t.flags(flags));
                   term.append(makeTree(t,data,vars,t.flags(flags),new Group(),iterators,groupNames));
                   break;
                case Pretokenizer.NAMED_GROUP:
                   String gname=t.groupName;
                   int id;
                   if(Character.isDigit(gname.charAt(0))){
                      try{
                         id=Integer.parseInt(gname);
                      }
                      catch(NumberFormatException e){
                         throw new PatternSyntaxException("group name starts with digit but is not a number");
                      }
                      if(groupNames.contains(new Integer(id))){
                         if(t.groupDeclared) throw new PatternSyntaxException("group redeclaration: "+gname+"; use ({=id}...) for multiple group assignments");
                      }
                      if(vars[MEMREG_COUNT]<=id)vars[MEMREG_COUNT]=id+1;
                   }
                   else{
                      Integer no=(Integer)groupNames.get(gname);
                      if(no==null){
                         id=vars[MEMREG_COUNT]++;
                         groupNames.put(t.groupName,new Integer(id));
                      }
                      else{
                         if(t.groupDeclared) throw new PatternSyntaxException("group redeclaration "+gname+"; use ({=name}...) for group reassignments");
                         id=no.intValue();
                      }
                   }
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Group(id),iterators,groupNames));
                   break;
                case '(':
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Group(vars[MEMREG_COUNT]++),iterators,groupNames));
                   break;
                case Pretokenizer.POS_LOOKAHEAD:
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Lookahead(vars[LOOKAHEAD_COUNT]++,true),iterators,groupNames));
                   break;
                case Pretokenizer.NEG_LOOKAHEAD:
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Lookahead(vars[LOOKAHEAD_COUNT]++,false),iterators,groupNames));
                   break;
                case Pretokenizer.POS_LOOKBEHIND:
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Lookbehind(vars[LOOKAHEAD_COUNT]++,true),iterators,groupNames));
                   break;
                case Pretokenizer.NEG_LOOKBEHIND:
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new Lookbehind(vars[LOOKAHEAD_COUNT]++,false),iterators,groupNames));
                   break;
                case Pretokenizer.INDEPENDENT_REGEX:
                   vars[DEPTH]++;
                   term.append(makeTree(t,data,vars,flags,new IndependentGroup(vars[LOOKAHEAD_COUNT]++),iterators,groupNames));
                   break;
                case Pretokenizer.CONDITIONAL_GROUP:
                   vars[DEPTH]++;
                   t.next();
                   Term fork=null;
                   boolean positive=true;
                   switch(t.ttype){
                      case Pretokenizer.NEG_LOOKAHEAD:
                         positive=false;
                      case Pretokenizer.POS_LOOKAHEAD:
                         vars[DEPTH]++;
                         Lookahead la=new Lookahead(vars[LOOKAHEAD_COUNT]++,positive);
                         makeTree(t,data,vars,flags,la,iterators,groupNames);
                         fork=new ConditionalExpr(la);
                         break;
                      case Pretokenizer.NEG_LOOKBEHIND:
                         positive=false;
                      case Pretokenizer.POS_LOOKBEHIND:
                         vars[DEPTH]++;
                         Lookbehind lb=new Lookbehind(vars[LOOKAHEAD_COUNT]++,positive);
                         makeTree(t,data,vars,flags,lb,iterators,groupNames);
                         fork=new ConditionalExpr(lb);
                         break;
                      case '(':
                         t.next();
                         if(t.ttype!=')') throw new PatternSyntaxException("malformed condition");
                         int memregNo;
                         if(Character.isDigit(data[t.tOffset])) memregNo=makeNumber(t.tOffset,t.tOutside,data);
                         else{
                            String gn=new String(data,t.tOffset,t.tOutside-t.tOffset);
                            Integer gno=(Integer)groupNames.get(gn);
                            if(gno==null) throw new PatternSyntaxException("unknown group name in conditional expr.: "+gn);
                            memregNo=gno.intValue();
                         }
                         fork=new ConditionalExpr(memregNo);
                         break;
                      default:
                         throw new PatternSyntaxException("malformed conditional expression: "+t.ttype+" '"+(char)t.ttype+"'");
                   }
                   term.append(makeTree(t,data,vars,flags,fork,iterators,groupNames));
                   break;
                case '|':
                   term.newBranch();
                   break;
                case Pretokenizer.END:
                   if(vars[DEPTH]>0) throw new PatternSyntaxException("unbalanced parenthesis");
                   term.close();
                   return term;
                case ')':
                   if(vars[DEPTH]<=0) throw new PatternSyntaxException("unbalanced parenthesis");
                   term.close();
                   vars[DEPTH]--;
                   return term;
                case Pretokenizer.COMMENT:
                   while(t.ttype!=')') t.next();
                   continue;
                default:
                   throw new PatternSyntaxException("unknown token type: "+t.ttype);
             }
          }
       }
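       
       // Illustration only, not part of jregex: a minimal sketch of how the NAMED_GROUP
       // case above appears to hand out ids for non-numeric names. A first sighting takes
       // the next free id, a later reference reuses it, and a second declaration is an
       // error. GroupIdSketch, idFor() and unnamed() are hypothetical names, and
       // IllegalArgumentException stands in for PatternSyntaxException.
       /*
```java
import java.util.HashMap;
import java.util.Map;

final class GroupIdSketch {
    private final Map<String, Integer> names = new HashMap<>();
    private int nextId = 1; // mirrors vars[MEMREG_COUNT], which starts at 1

    // Returns the id for a named group; 'declared' mirrors t.groupDeclared.
    int idFor(String name, boolean declared) {
        Integer id = names.get(name);
        if (id == null) {
            id = nextId++;
            names.put(name, id);
            return id;
        }
        if (declared) throw new IllegalArgumentException("group redeclaration: " + name);
        return id;
    }

    // An unnamed capturing group simply takes the next free id.
    int unnamed() {
        return nextId++;
    }

    public static void main(String[] args) {
        GroupIdSketch ids = new GroupIdSketch();
        System.out.println(ids.idFor("year", true));   // first declaration
        System.out.println(ids.unnamed());             // plain '(' group
        System.out.println(ids.idFor("year", false));  // reassignment reuses the id
    }
}
```
       */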
       
       static int makeNumber(int off, int out, char[] data){
          int n=0;
          for(int i=off;i<out;i++){
             int d=data[i]-'0';
             if(d<0 || d>9) return -1;
             n*=10;
             n+=d;
          }
          return n;
       }
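       
       // Illustration only, not part of jregex: a standalone sketch of the digit
       // accumulation that makeNumber() performs over data[off..out), returning -1 as soon
       // as a non-digit is seen. The class name MakeNumberSketch is hypothetical.
       /*
```java
final class MakeNumberSketch {
    // Parses data[off..out) as a non-negative decimal number; -1 on any non-digit.
    static int makeNumber(int off, int out, char[] data) {
        int n = 0;
        for (int i = off; i < out; i++) {
            int d = data[i] - '0';
            if (d < 0 || d > 9) return -1;
            n = n * 10 + d;
        }
        return n;
    }

    public static void main(String[] args) {
        char[] data = "(12)x".toCharArray();
        System.out.println(makeNumber(1, 3, data)); // range covers the digits "12"
        System.out.println(makeNumber(1, 5, data)); // range also covers ')' and 'x'
    }
}
```
       */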
       
       protected void append(int offset,int end,char[] data,
             int[] vars,int flags,Vector iterators,Hashtable gmap) throws PatternSyntaxException{
    //System.out.println("append("+new String(data,offset,end-offset)+")");
    //System.out.println("current="+this.current);
          int[] limits=new int[LIMITS_LENGTH];
          int i=offset;
          Term tmp,current=this.current;
          while(i0);
                         i=parseGroupId(data,p,end,br,gmap);
                         current=append(br);
                         continue;
                      }
                      else{
                         Term t=new Term();
                         i=CharacterClass.parseName(data,i,end,t,false,(flags&IGNORE_SPACES)>0);
                         current=append(t);
                         continue;
                      }
                   }
                   
                case ' ':
                case '\t':
                case '\r':
                case '\n':
                   if((flags&IGNORE_SPACES)>0){
                      i++;
                      continue;
                   }
                   //else go on as default
                   
                //symbolic items
                default:
                   tmp=new Term();
                   i=parseTerm(data,i,end,tmp,flags);
                   
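                   if(tmp.type==END && i<end){
                      throw new PatternSyntaxException("'$' is not a last term in the group: <"+new String(data,offset,end-offset)+">");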
                   }
                   //"\A" 
                   //if(tmp.type==START && i>(offset+1)){
                   //   throw new PatternSyntaxException("'^' is not a first term in the group: <"+new String(data,offset,end-offset)+">");
                   //}
                   
                   current=append(tmp);
                   break;
             }
    //System.out.println("next term: "+next);
    //System.out.println("  next.out="+next.out);
    //System.out.println("  next.out1="+next.out1);
    //System.out.println("  next.branchOut="+next.branchOut);
          }
    //System.out.println(in.toStringAll());
    //System.out.println("current="+current);
    //System.out.println();
       }
       
       
       private static int parseGroupId(char[] data, int i, int end, Term term, Hashtable gmap) throws PatternSyntaxException{
          int id;
          int nstart=i;
          if(Character.isDigit(data[i])){
             while(Character.isDigit(data[i])){
                i++;
                if(i==end) throw new PatternSyntaxException("group_id expected");
             }
             id=makeNumber(nstart,i,data);
          }
          else{
             while(Character.isJavaIdentifierPart(data[i])){
                i++;
                if(i==end) throw new PatternSyntaxException("group_id expected");
             }
             String s=new String(data,nstart,i-nstart);
             Integer no=(Integer)gmap.get(s);
             if(no==null)throw new PatternSyntaxException("backreference to unknown group: "+s);
             id=no.intValue();
          }
          while(Character.isWhitespace(data[i])){
             i++;
             if(i==end) throw new PatternSyntaxException("'}' expected");
          }
          
          int c=data[i++];
          
          if(c!='}') throw new PatternSyntaxException("'}' expected");
          
          term.memreg=id;
          return i;
       }
       
       protected Term append(Term term) throws PatternSyntaxException{
    //System.out.println("append("+term.toStringAll()+"), this="+toStringAll());
          //Term prev=this.prev;
          Term current=this.current;
          if(current==null){
    //System.out.println("2");
    //System.out.println("  term="+term);
    //System.out.println("  term.in="+term.in);
             in.next=term;
             term.prev=in;
             this.current=term;
    //System.out.println("  result: "+in.toStringAll()+"\r\n");
             return term;
          }
    //System.out.println("3");
          link(current,term);
          //this.prev=current;
          this.current=term;
    //System.out.println(in.toStringAll());
    //System.out.println("current="+this.current);
    //System.out.println();
          return term;
       }
       
       protected Term replaceCurrent(Term term) throws PatternSyntaxException{
    //System.out.println("replaceCurrent("+term+"), current="+current+", current.prev="+current.prev);
          //Term prev=this.prev;
          Term prev=current.prev;
          if(prev!=null){
             Term in=this.in;
             if(prev==in){
                //in.next=term;
                //term.prev=in;
                in.next=term.in;
                term.in.prev=in;
             }
             else link(prev,term);
          }
          this.current=term;
    //System.out.println("   new current="+this.current);
          return term;
       }
    
    
       protected void newBranch() throws PatternSyntaxException{
    //System.out.println("newBranch()");
          close();
          startNewBranch();
    //System.out.println(in.toStringAll());
    //System.out.println("current="+current);
    //System.out.println();
       }
    
    
       protected void close() throws PatternSyntaxException{
    //System.out.println("close(), current="+current+", this="+toStringAll());
    //System.out.println();
    //System.out.println("close()");
    //System.out.println("current="+this.current);
    //System.out.println("prev="+this.prev);
    //System.out.println();
          /*
          Term prev=this.prev;
          if(prev!=null){
             Term current=this.current;
             if(current!=null){
                link(prev,current);
                prev=current;
                this.current=null;
             }
             link(prev,out);
             this.prev=null;
          }
          */
          Term current=this.current;
          if(current!=null) linkd(current,out);
          else in.next=out;
    //System.out.println(in.toStringAll());
    //System.out.println("current="+this.current);
    //System.out.println("prev="+this.prev);
    //System.out.println();
       }
       
       private final static void link(Term term,Term next){
          linkd(term,next.in);
          next.prev=term;
       }
       
       private final static void linkd(Term term,Term next){
    //System.out.println("linkDirectly(\""+term+"\" -> \""+next+"\")");
          Term prev_out=term.out;
          if(prev_out!=null){
    //System.out.println("   prev_out="+prev_out);
             prev_out.next=next;
          }
          Term prev_out1=term.out1;
          if(prev_out1!=null){
    //System.out.println("   prev_out1="+prev_out1);
             prev_out1.next=next;
          }
          Term prev_branch=term.branchOut;
          if(prev_branch!=null){
    //System.out.println("   prev_branch="+prev_branch);
             prev_branch.failNext=next;
          }
       }
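       
       // Illustration only, not part of jregex: a minimal sketch of the patching idea
       // behind linkd(). A fragment records its dangling exits in out, out1 and branchOut;
       // linking fills in next on the first two and failNext on the third. LinkSketch and
       // Node are hypothetical names, and greedyOptional() copies the wiring of the
       // default case of makeGreedyQMark().
       /*
```java
final class LinkSketch {
    static final class Node {
        final String name;
        Node next, failNext;            // control flow
        Node in, out, out1, branchOut;  // fragment bookkeeping

        Node(String name) {
            this.name = name;
            in = out = this;            // a fresh node is its own entry and exit
        }
    }

    // Patches every dangling exit of 'frag' so that it continues at 'target'.
    static void linkd(Node frag, Node target) {
        if (frag.out != null) frag.out.next = target;
        if (frag.out1 != null) frag.out1.next = target;
        if (frag.branchOut != null) frag.branchOut.failNext = target;
    }

    // Builds the fragment for a greedy "term?".
    static Node greedyOptional(Node term) {
        Node b = new Node("branch");
        b.next = term;      // preferred path: try the term
        b.in = b;
        b.out = term;       // exit 1: leaving after the term matched
        b.out1 = null;
        b.branchOut = b;    // exit 2: the branch's fail path skips the term
        return b;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node opt = greedyOptional(a);
        Node rest = new Node("rest");
        linkd(opt, rest);
        System.out.println(a.next.name);       // where control goes after 'a'
        System.out.println(opt.failNext.name); // where control goes if 'a' is skipped
    }
}
```
       */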
       
       protected void startNewBranch() throws PatternSyntaxException{
    //System.out.println("newBranch()");
    //System.out.println("before startNewBranch(), this="+toStringAll());
    //System.out.println();
          Term tmp=in.next;
          Term b=new Branch();
          in.next=b;
          b.next=tmp;
          b.in=null;
          b.out=null;
          b.out1=null;
          b.branchOut=b;
          current=b;
    //System.out.println("startNewBranch(), this="+toStringAll());
    //System.out.println();
       }
    
       private final static Term makeGreedyStar(int[] vars,Term term,Vector iterators) throws PatternSyntaxException{
          //vars[STACK_SIZE]++;
          switch(term.type){
             case GROUP_IN:{
                Term b=new Branch();
                b.next=term.in;
                term.out.next=b;
                
                b.in=b;
                b.out=null;
                b.out1=null;
                b.branchOut=b;
                
                return b;
             }
             default:{
                Iterator i=new Iterator(term,0,-1,iterators);
                return i;
             }
          }
       }
    
       private final static Term makeLazyStar(int[] vars,Term term){
          //vars[STACK_SIZE]++;
          switch(term.type){
             case GROUP_IN:{
                Term b=new Branch();
                b.failNext=term.in;
                term.out.next=b;
                
                b.in=b;
                b.out=b;
                b.out1=null;
                b.branchOut=null;
                
                return b;
             }
             default:{
                Term b=new Branch();
                b.failNext=term;
                term.next=b;
                
                b.in=b;
                b.out=b;
                b.out1=null;
                b.branchOut=null;
                
                return b;
             }
          }
       }
    
       private final static Term makeGreedyPlus(int[] vars,Term term,Vector iterators) throws PatternSyntaxException{
          //vars[STACK_SIZE]++;
          switch(term.type){
             case INDEPENDENT_IN://?
             case GROUP_IN:{
    //System.out.println("makeGreedyPlus():");
    //System.out.println("   in="+term.in);
    //System.out.println("   out="+term.out);
                Term b=new Branch();
                b.next=term.in;
                term.out.next=b;
                
                b.in=term.in;
                b.out=null;
                b.out1=null;
                b.branchOut=b;
    
    //System.out.println("   returning "+b.in);
                
                return b;
             }
             default:{
                return new Iterator(term,1,-1,iterators);
             }
          }
       }
       
       private final static Term makeLazyPlus(int[] vars,Term term){
          //vars[STACK_SIZE]++;
          switch(term.type){
             case GROUP_IN:{
                Term b=new Branch();
                term.out.next=b;
                b.failNext=term.in;
                
                b.in=term.in;
                b.out=b;
                b.out1=null;
                b.branchOut=null;
                
                return b;
             }
             case REG:
             default:{
                Term b=new Branch();
                term.next=b;
                b.failNext=term;
                
                b.in=term;
                b.out=b;
                b.out1=null;
                b.branchOut=null;
                
                return b;
             }
          }
       }
    
       private final static Term makeGreedyQMark(int[] vars,Term term){
          //vars[STACK_SIZE]++;
          switch(term.type){
             case GROUP_IN:{
                Term b=new Branch();
                b.next=term.in;
                
                b.in=b;
                b.out=term.out;
                b.out1=null;
                b.branchOut=b;
                
                return b;
             }
             case REG:
             default:{
                Term b=new Branch();
                b.next=term;
                
                b.in=b;
                b.out=term;
                b.out1=null;
                b.branchOut=b;
                
                return b;
             }
          }
       }
       
       private final static Term makeLazyQMark(int[] vars,Term term){
          //vars[STACK_SIZE]++;
          switch(term.type){
             case GROUP_IN:{
                Term b=new Branch();
                b.failNext=term.in;
                
                b.in=b;
                b.out=b;
                b.out1=term.out;
                b.branchOut=null;
                
                return b;
             }
             case REG:
             default:{
                Term b=new Branch();
                b.failNext=term;
                
                b.in=b;
                b.out=b;
                b.out1=term;
                b.branchOut=null;
                
                return b;
             }
          }
       }
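       
       // Illustration only, not part of jregex: a minimal sketch of why swapping next and
       // failNext on a branch node turns a greedy '?' into a lazy one. It assumes that a
       // branch tries next first and falls back to failNext, which is how the
       // "next-to-execute and next-if-failed" comment on those fields reads; the real
       // matcher loop is not shown in this file section. BranchSketch, Node, run() and
       // optional() are hypothetical names.
       /*
```java
final class BranchSketch {
    static final int CHAR = 0, BRANCH = 1, SUCCESS = 2;

    static final class Node {
        final int type;
        char c;               // used with CHAR
        Node next, failNext;
        Node(int type) { this.type = type; }
    }

    // Returns the end index of a match starting at pos, or -1 if none.
    static int run(Node n, String s, int pos) {
        switch (n.type) {
            case CHAR:
                if (pos < s.length() && s.charAt(pos) == n.c) return run(n.next, s, pos + 1);
                return -1;
            case BRANCH: {
                int r = run(n.next, s, pos);                 // preferred path
                return r >= 0 ? r : run(n.failNext, s, pos); // fallback path
            }
            default: // SUCCESS
                return pos;
        }
    }

    // Builds "c?" followed by success, in greedy or lazy flavour.
    static Node optional(char c, boolean greedy) {
        Node success = new Node(SUCCESS);
        Node ch = new Node(CHAR);
        ch.c = c;
        ch.next = success;
        Node b = new Node(BRANCH);
        if (greedy) { b.next = ch; b.failNext = success; }   // take the char if possible
        else        { b.next = success; b.failNext = ch; }   // skip the char if possible
        return b;
    }

    public static void main(String[] args) {
        System.out.println(run(optional('a', true), "a", 0));  // greedy consumes 'a'
        System.out.println(run(optional('a', false), "a", 0)); // lazy consumes nothing
    }
}
```
       */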
    
       private final static Term makeGreedyLimits(int[] vars,Term term,int[] limits,Vector iterators) throws PatternSyntaxException{
          //vars[STACK_SIZE]++;
          int m=limits[0];
          int n=limits[1];
          switch(term.type){
             case GROUP_IN:{
                int cntreg=vars[CNTREG_COUNT]++;
                Term reset=new Term(CR_SET_0);
                   reset.cntreg=cntreg;
                Term b=new Term(BRANCH);
                
                Term inc=new Term(CRSTORE_CRINC);
                   inc.cntreg=cntreg;
                
                reset.next=b;
                
                if(n>=0){
                   Term lt=new Term(CR_LT);
                      lt.cntreg=cntreg;
                      lt.maxCount=n;
                   b.next=lt;
                   lt.next=term.in;
                }
                else{
                   b.next=term.in;
                }
                term.out.next=inc;
                inc.next=b;
                
                if(m>=0){
                   Term gt=new Term(CR_GT_EQ);
                      gt.cntreg=cntreg;
                      gt.maxCount=m;
                   b.failNext=gt;
                   
                   reset.in=reset;
                   reset.out=gt;
                   reset.out1=null;
                   reset.branchOut=null;
                }
                else{
                   reset.in=reset;
                   reset.out=null;
                   reset.out1=null;
                   reset.branchOut=b;
                }
                return reset;
             }
             default:{
                return new Iterator(term,limits[0],limits[1],iterators);
             }
          }
       }
    
       private final static Term makeLazyLimits(int[] vars,Term term,int[] limits){
          //vars[STACK_SIZE]++;
          int m=limits[0];
          int n=limits[1];
          switch(term.type){
             case GROUP_IN:{
                int cntreg=vars[CNTREG_COUNT]++;
                Term reset=new Term(CR_SET_0);
                   reset.cntreg=cntreg;
                Term b=new Term(BRANCH);
                Term inc=new Term(CRSTORE_CRINC);
                   inc.cntreg=cntreg;
                   
                reset.next=b;
                
                if(n>=0){
                   Term lt=new Term(CR_LT);
                      lt.cntreg=cntreg;
                      lt.maxCount=n;
                   b.failNext=lt;
                   lt.next=term.in;
                }
                else{
                   b.failNext=term.in;
                }
                term.out.next=inc;
                inc.next=b;
                
                if(m>=0){
                   Term gt=new Term(CR_GT_EQ);
                      gt.cntreg=cntreg;
                      gt.maxCount=m;
                   b.next=gt;
                   
                   reset.in=reset;
                   reset.out=gt;
                   reset.out1=null;
                   reset.branchOut=null;
                   
                   return reset;
                }
                else{
                   reset.in=reset;
                   reset.out=b;
                   reset.out1=null;
                   reset.branchOut=null;
                   
                   return reset;
                }
             }
             case REG:
             default:{
                Term reset=new Term(CNT_SET_0);
                Term b=new Branch(BRANCH_STORE_CNT);
                Term inc=new Term(CNT_INC);
                
                reset.next=b;
                
                if(n>=0){
                   Term lt=new Term(READ_CNT_LT);
                      lt.maxCount=n;
                   b.failNext=lt;
                   lt.next=term;
                   term.next=inc;
                   inc.next=b;
                }
                else{
                   b.next=term;
                   term.next=inc;
                   inc.next=term;
                }
                
                if(m>=0){
                   Term gt=new Term(CNT_GT_EQ);
                      gt.maxCount=m;
                   b.next=gt;
                   
                   reset.in=reset;
                   reset.out=gt;
                   reset.out1=null;
                   reset.branchOut=null;
                   
                   return reset;
                }
                else{
                   reset.in=reset;
                   reset.out=b;
                   reset.out1=null;
                   reset.branchOut=null;
                   
                   return reset;
                }
             }
          }
       }
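       
       // Illustration only, not part of jregex: a minimal sketch of the counter loop that
       // makeGreedyLimits() and makeLazyLimits() wire up for a group: reset -> branch,
       // a loop side (counter < n, body, increment, back to branch) and an exit side
       // (counter >= m). Greedy prefers the loop side, lazy the exit side. The node types
       // and run() are hypothetical stand-ins for CR_SET_0, CR_LT, CR_GT_EQ and
       // CRSTORE_CRINC; a single character plays the role of the group body, and the
       // counter is passed by value instead of being stored and restored.
       /*
```java
final class CounterSketch {
    static final int CHAR = 0, BRANCH = 1, SET_0 = 2, LT = 3, GT_EQ = 4, INC = 5, SUCCESS = 6;

    static final class Node {
        final int type;
        char c;       // used with CHAR
        int limit;    // used with LT and GT_EQ
        Node next, failNext;
        Node(int type) { this.type = type; }
    }

    // Returns the end index of a match starting at pos, or -1 if none.
    static int run(Node n, String s, int pos, int cnt) {
        switch (n.type) {
            case CHAR:
                return pos < s.length() && s.charAt(pos) == n.c ? run(n.next, s, pos + 1, cnt) : -1;
            case SET_0:
                return run(n.next, s, pos, 0);
            case INC:
                return run(n.next, s, pos, cnt + 1);
            case LT:
                return cnt < n.limit ? run(n.next, s, pos, cnt) : -1;
            case GT_EQ:
                return cnt >= n.limit ? run(n.next, s, pos, cnt) : -1;
            case BRANCH: {
                int r = run(n.next, s, pos, cnt);
                return r >= 0 ? r : run(n.failNext, s, pos, cnt);
            }
            default: // SUCCESS
                return pos;
        }
    }

    // Builds c{m,n} followed by success.
    static Node repeat(char c, int m, int n, boolean greedy) {
        Node reset = new Node(SET_0), b = new Node(BRANCH), inc = new Node(INC);
        Node lt = new Node(LT), gt = new Node(GT_EQ), body = new Node(CHAR);
        lt.limit = n;
        gt.limit = m;
        body.c = c;
        reset.next = b;
        lt.next = body;               // loop side
        body.next = inc;
        inc.next = b;
        gt.next = new Node(SUCCESS);  // exit side
        b.next = greedy ? lt : gt;
        b.failNext = greedy ? gt : lt;
        return reset;
    }

    public static void main(String[] args) {
        System.out.println(run(repeat('a', 2, 3, true), "aaaa", 0, 0));  // greedy a{2,3}
        System.out.println(run(repeat('a', 2, 3, false), "aaaa", 0, 0)); // lazy a{2,3}?
    }
}
```
       */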
       
       
       private final int parseTerm(char[] data, int i, int out, Term term,
                  int flags) throws PatternSyntaxException{
          char c=data[i++];
          boolean inv=false;
          switch(c){
             case '[':
                return CharacterClass.parseClass(data,i,out,term,(flags&IGNORE_CASE)>0,(flags&IGNORE_SPACES)>0,(flags&UNICODE)>0,(flags&XML_SCHEMA)>0);
                
             case '.':
                term.type=(flags&DOTALL)>0? ANY_CHAR: ANY_CHAR_NE;
                break;
                
             case '$':
                //term.type=mods[MULTILINE_IND]? LINE_END: END; //??
                term.type=(flags&MULTILINE)>0? LINE_END: END_EOL;
                break;
                
             case '^':
                term.type=(flags&MULTILINE)>0? LINE_START: START;
                break;
                
             case '\\':
                if(i>=out) throw new PatternSyntaxException("Escape without a character");
                c=data[i++];
                esc: switch(c){
                   case 'f':
                      c='\f'; // form feed
                      break;
    
                   case 'n':
                      c='\n'; // new line
                      break;
    
                   case 'r':
                      c='\r'; // carriage return
                      break;
    
                   case 't':
                      c='\t'; // tab
                      break;
                   
                   case 'u':
                      c=(char)((CharacterClass.toHexDigit(data[i++])<<12)+
                              (CharacterClass.toHexDigit(data[i++])<<8)+
                              (CharacterClass.toHexDigit(data[i++])<<4)+
                               CharacterClass.toHexDigit(data[i++]));
                      break;
                      
                   case 'v':   // six hex digits; the (char) cast keeps only the low 16 bits
                      c=(char)((CharacterClass.toHexDigit(data[i++])<<20)+
                              (CharacterClass.toHexDigit(data[i++])<<16)+
                              (CharacterClass.toHexDigit(data[i++])<<12)+
                              (CharacterClass.toHexDigit(data[i++])<<8)+
                              (CharacterClass.toHexDigit(data[i++])<<4)+
                               CharacterClass.toHexDigit(data[i++]));
                      break;
                      
                   case 'x':{   // \xHH (two hex digits) or \x{H...} (at most 0xffff) -> char
                      int hex=0;
                      char d;
                      if((d=data[i++])=='{'){
                         while((d=data[i++])!='}'){
                            hex=(hex<<4)+CharacterClass.toHexDigit(d);
                            if(hex>0xffff) throw new PatternSyntaxException("\\x{}");
                         }
                      }
                      else{
                         hex=(CharacterClass.toHexDigit(d)<<4)+
                              CharacterClass.toHexDigit(data[i++]);
                      }
                      c=(char)hex;
                      break;
                   }
                   case '0':
                   case 'o':   // octal digits -> char
                      int oct=0;
                      while(i<out){
                         char d=data[i];
                         if(d<'0' || d>'7') break; // leave the terminating character for the next term
                         i++;
                         oct*=8;
                         oct+=d-'0';
                         if(oct>0xffff) break;
                      }
                      c=(char)oct;
                      break;
                      
                   case 'm':   // decimal digits -> char
                      int dec=0;
                      while(i<out){
                         char d=data[i];
                         if(d<'0' || d>'9') break; // leave the terminating character for the next term
                         i++;
                         dec*=10;
                         dec+=d-'0';
                         if(dec>0xffff) break;
                      }
                      c=(char)dec;
                      break;
                      
                   case 'c':   // ctrl-char
                      c=(char)(data[i++]&0x1f);
                      break;
    
                   case 'D':   // non-digit
                      inv=true;
                      // go on
                   case 'd':   // digit
                      CharacterClass.makeDigit(term,inv,(flags&UNICODE)>0);
                      return i;
    
                   case 'S':   // non-space
                      inv=true;
                      // go on
                   case 's':   // space
                      CharacterClass.makeSpace(term,inv,(flags&UNICODE)>0);
                      return i;
    
                   case 'W':   // non-word character
                      inv=true;
                      // go on
                   case 'w':   // word character
                      CharacterClass.makeWordChar(term,inv,(flags&UNICODE)>0);
                      return i;
                      
                   case 'B':   // non-(word boundary)
                      inv=true;
                      // go on
                   case 'b':   // word boundary
                      CharacterClass.makeWordBoundary(term,inv,(flags&UNICODE)>0);
                      return i;
                      
                   case '<':   // word start
                      CharacterClass.makeWordStart(term,(flags&UNICODE)>0);
                      return i;
                      
                   case '>':   // word end
                      CharacterClass.makeWordEnd(term,(flags&UNICODE)>0);
                      return i;
                      
                   case 'A':   // text beginning
                      term.type=START;
                      return i;
                      
                   case 'Z':   // text end, allowing a trailing line terminator
                      term.type=END_EOL;
                      return i;
                      
                   case 'z':   // absolute text end
                      term.type=END;
                      return i;
                      
                   case 'G':   // end of last match
                      term.type=LAST_MATCH_END;
                      return i;
                      
                   case 'P':   // \\P{..}
                      inv=true;
                      // go on
                   case 'p':   // \\p{..}
                      i=CharacterClass.parseName(data,i,out,term,inv,(flags&IGNORE_SPACES)>0);
                      return i;
                      
                   default:
                      if(c>='1' && c<='9'){
                         int n=c-'0';
                         while((i<out) && (c=data[i])>='0' && c<='9'){
                            n=(n*10)+c-'0';
                            i++;
                         }
                         term.type=(flags&IGNORE_CASE)>0? REG_I: REG;
                         term.memreg=n;
                         return i;
                      }
                      /*
                      if(c<256){
                         CustomParser termp=customParsers[c];
                         if(termp!=null){
                            i=termp.parse(i,data,term);
                            return i;
                         }
                      }
                      */
                }
                term.type=CHAR;
                term.c=c;
                break;
                
             default:
                if((flags&IGNORE_CASE)==0){
                   term.type=CHAR;
                   term.c=c;
                }
                else{
                   CharacterClass.makeICase(term,c);
                }
                break;
          }
          return i;
       }
    
    
       // one of {n},{n,},{,n},{n1,n2}
       protected static final int parseLimits(int i,int end,char[] data,int[] limits) throws PatternSyntaxException{
          if(limits.length!=LIMITS_LENGTH) throw new IllegalArgumentException("limits.length="+limits.length+", should be "+LIMITS_LENGTH);
          limits[LIMITS_PARSE_RESULT_INDEX]=LIMITS_OK;
          int ind=0;
          int v=0;
          char c;
          while(i<end){
             c=data[i++];
             switch(c){
                case ' ':
                   continue;
    
                case ',':
                   if(ind>0) throw new PatternSyntaxException("illegal construction: {.. , , ..}");
                   limits[ind++]=v;
                   v=-1;
                   continue;
    
                case '}':
                   limits[ind]=v;
                   if(ind==0) limits[1]=v;
                   return i;
    
                default:
                   if(c>'9' || c<'0'){
                      //throw new PatternSyntaxException("illegal symbol in iterator: '{"+c+"}'");
                      limits[LIMITS_PARSE_RESULT_INDEX]=LIMITS_FAILURE;
                      return i;
                   }
                   if(v<0) v=0;
                   v= v*10 + (c-'0');
             }
          }
          throw new PatternSyntaxException("malformed quantifier");
       }
       
       public String toString(){
          StringBuffer b=new StringBuffer(100);
          b.append(instanceNum);
          b.append(": ");
          if(inverse) b.append('^');
          switch(type){
             case VOID:
                b.append("[]");
                b.append(" , ");
                break;
             case CHAR:
                b.append(CharacterClass.stringValue(c));
                b.append(" , ");
                break;
             case ANY_CHAR:
                b.append("dotall, ");
                break;
             case ANY_CHAR_NE:
                b.append("dot-eols, ");
                break;
             case BITSET:
                b.append('[');
                b.append(CharacterClass.stringValue0(bitset));
                b.append(']');
                b.append(" , weight=");
                b.append(weight);
                b.append(" , ");
                break;
             case BITSET2:
                b.append('[');
                b.append(CharacterClass.stringValue2(bitset2));
                b.append(']');
                b.append(" , weight=");
                b.append(weight);
                b.append(" , ");
                break;
             case START:
                b.append("abs.start");
                break;            
             case END:
                b.append("abs.end");
                break;            
             case END_EOL:
                b.append("abs.end-eol");
                break;            
             case LINE_START:
                b.append("line start");
                break;            
             case LINE_END:
                b.append("line end");
                break;            
             case LAST_MATCH_END:
                b.append("last match end");
                break;            
             case BOUNDARY:
                if(inverse)b.append("non-");
                b.append("BOUNDARY");
                break;            
             case UBOUNDARY:
                if(inverse)b.append("non-");
                b.append("UBOUNDARY");
                break;            
             case DIRECTION:
                b.append("DIRECTION");
                break;            
             case UDIRECTION:
                b.append("UDIRECTION");
                break;            
             case FIND:
                b.append(">>>{");
                b.append(target);
                b.append("}, <<");
                b.append(distance);
                if(eat){
                   b.append(",eat");
                }
                b.append(", ");
                break;            
             case REPEAT_0_INF:
                b.append("rpt{");
                b.append(target);
                b.append(",0,inf}");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;            
             case REPEAT_MIN_INF:
                b.append("rpt{");
                b.append(target);
                b.append(",");
                b.append(minCount);
                b.append(",inf}");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;            
             case REPEAT_MIN_MAX:
                b.append("rpt{");
                b.append(target);
                b.append(",");
                b.append(minCount);
                b.append(",");
                b.append(maxCount);
                b.append("}");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;            
             case REPEAT_REG_MIN_INF:
                b.append("rpt{$");
                b.append(memreg);
                b.append(',');
                b.append(minCount);
                b.append(",inf}");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;            
             case REPEAT_REG_MIN_MAX:
                b.append("rpt{$");
                b.append(memreg);
                b.append(',');
                b.append(minCount);
                b.append(',');
                b.append(maxCount);
                b.append("}");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;            
             case BACKTRACK_0:
                b.append("back(0)");
                break;            
             case BACKTRACK_MIN:
                b.append("back(");
                b.append(minCount);
                b.append(")");
                break;            
             case BACKTRACK_REG_MIN:
                b.append("back");
                b.append("_$");
                b.append(memreg);
                b.append("(");
                b.append(minCount);
                b.append(")");
                break;            
             case GROUP_IN:
                b.append('(');
                if(memreg>0)b.append(memreg);
                b.append('-');
                b.append(" , ");
                break;
             case GROUP_OUT:
                b.append('-');
                if(memreg>0)b.append(memreg);
                b.append(')');
                b.append(" , ");
                break;
             case PLOOKAHEAD_IN:
                b.append('(');
                b.append("=");
                b.append(lookaheadId);
                b.append(" , ");
                break;
             case PLOOKAHEAD_OUT:
                b.append('=');
                b.append(lookaheadId);
                b.append(')');
                b.append(" , ");
                break;
             case NLOOKAHEAD_IN:
                b.append("(!");
                b.append(lookaheadId);
                b.append(" , ");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;
             case NLOOKAHEAD_OUT:
                b.append('!');
                b.append(lookaheadId);
                b.append(')');
                b.append(" , ");
                break;
             case PLOOKBEHIND_IN:
                b.append('(');
                b.append("<=");
                b.append(lookaheadId);
                b.append(" , dist=");
                b.append(distance);
                b.append(" , ");
                break;
             case PLOOKBEHIND_OUT:
                b.append("<=");
                b.append(lookaheadId);
                b.append(')');
                b.append(" , ");
                break;
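             case NLOOKBEHIND_IN:
                b.append("(<!");
                b.append(lookaheadId);
                b.append(" , dist=");
                b.append(distance);
                b.append(" , ");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;
             case NLOOKBEHIND_OUT:
                b.append("<!");
                b.append(lookaheadId);
                b.append(')');
                b.append(" , ");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;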
             case LOOKAHEAD_CONDITION_IN:
                b.append("(cond");
                b.append(lookaheadId);
                b.append(((Lookahead)this).isPositive? '=': '!');
                b.append(" , ");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;
             case LOOKAHEAD_CONDITION_OUT:
                b.append("cond");
                b.append(lookaheadId);
                b.append(")");
                if(failNext!=null){
                   b.append(", =>");
                   b.append(failNext.instanceNum);
                   b.append(", ");
                }
                break;
             case REG:
                b.append("$");
                b.append(memreg);
                b.append(", ");
                break;
             case SUCCESS:
                b.append("END");
                break;
             case BRANCH_STORE_CNT_AUX1:
                b.append("(aux1)");
                // go on
             case BRANCH_STORE_CNT:
                b.append("(cnt)");
                // go on
             case BRANCH:
                b.append("=>");
                if(failNext!=null) b.append(failNext.instanceNum);
                else b.append("null");
                b.append(" , ");
                break;
             default:
                b.append('[');
                switch(type){
                   case CNT_SET_0:
                      b.append("cnt=0");
                      break;
                   case CNT_INC:
                      b.append("cnt++");
                      break;
                   case CNT_GT_EQ:
                      b.append("cnt>="+maxCount);
                      break;
                   case READ_CNT_LT:
                      b.append("->cnt<"+maxCount);
                      break;
                   case CRSTORE_CRINC:
                      b.append("M("+memreg+")->,Cr("+cntreg+")->,Cr("+cntreg+")++");
                      break;
                   case CR_SET_0:
                      b.append("Cr("+cntreg+")=0");
                      break;
                   case CR_LT:
                      b.append("Cr("+cntreg+")<"+maxCount);
                      break;
                   case CR_GT_EQ:
                      b.append("Cr("+cntreg+")>="+maxCount);
                      break;
                   default:
                      b.append("unknown type: "+type);
                }
                b.append("] , ");
          }
          if(next!=null){
             b.append("->");
             b.append(next.instanceNum);
             b.append(", ");
          }
          //b.append("\r\n");
          return b.toString();
       }
       
       public String toStringAll(){
          return toStringAll(new Vector());
       }
       
       public String toStringAll(Vector v){
          v.addElement(new Integer(instanceNum));
          String s=toString();
          if(next!=null){
             if(!v.contains(new Integer(next.instanceNum))){
                s+="\r\n";
                s+=next.toStringAll(v);
             }
          }
          if(failNext!=null){
             if(!v.contains(new Integer(failNext.instanceNum))){
                s+="\r\n";
                s+=failNext.toStringAll(v);
             }
          }
          return s;
       }
    }
    
    class Pretokenizer{
       private static final int START=1;
       static final int END=2;
       static final int PLAIN_GROUP=3;
       static final int POS_LOOKAHEAD=4;
       static final int NEG_LOOKAHEAD=5;
       static final int POS_LOOKBEHIND=6;
       static final int NEG_LOOKBEHIND=7;
       static final int INDEPENDENT_REGEX=8;
       static final int COMMENT=9;
       static final int CONDITIONAL_GROUP=10;
       static final int FLAGS=11;
       static final int CLASS_GROUP=12;
       static final int NAMED_GROUP=13;
       
       int tOffset,tOutside,skip;
       int offset,end;
       int c;
       
       int ttype=START;
       
       char[] data;
       
       //results
       private int flags;
       private boolean flagsChanged;
       
       char[] brackets;
       String groupName;
       boolean groupDeclared;
       
       Pretokenizer(char[] data,int offset,int end){
          if(offset<0 || end>data.length) throw new IndexOutOfBoundsException("offset="+offset+", end="+end+", length="+data.length);
          this.offset=offset;
          this.end=end;
    
          this.tOffset=offset;
          this.tOutside=offset;
    
          this.data=data;
       }
       
       int flags(int def){
          return flagsChanged? flags: def;
       }
       
       void next() throws PatternSyntaxException{
          int tOffset=this.tOutside;
          int skip=this.skip;
          
          tOffset+=skip;
          flagsChanged=false;
          
          int end=this.end; 
          char[] data=this.data; 
          boolean esc=false;
          for(int i=tOffset;i':
                  	     	  ttype=INDEPENDENT_REGEX;
                  	     	  skip=3;  // "(?>"
                  	     	  break;
                  	     case '#':
                  	     	  ttype=COMMENT;
                  	     	  skip=3; // ="(?#".length, the makeTree() skips the rest by itself
                  	     	  break;
                  	     case '(':
                  	     	  ttype=CONDITIONAL_GROUP;
                  	     	  skip=2; //"(?"+"(..." - skip "(?" (2 chars) and parse condition as a group
                  	     	  break;
                  	     case '[':
                  	     	  ttype=CLASS_GROUP;
                  	     	  skip=2; // "(?"+"[..]+...-...&...)" - skip 2 chars and parse a class group
                  	     	  break;
                  	     default:
                  	        int mOff,mLen;
                  	        mLoop:
                  	        for(int p=i+2;p0){
                                      flags=Pattern.parseFlags(data,mOff,mLen);
                                      flagsChanged=true;
                  	                 }
                  	                 ttype=PLAIN_GROUP;
                  	                 skip=mLen+3; // "(?imsx:" mLen=4; skip= "(?".len + ":".len + mLen = 2+1+4=7
                  	                 break mLoop;
                  	              case ')':
                  	                 flags=Pattern.parseFlags(data,mOff=(i+2),mLen=(p-mOff));
                  	                 flagsChanged=true;
                  	                 ttype=FLAGS;
                  	                 skip=mLen+3; // "(?imsx)" mLen=4, skip="(?".len+")".len+mLen=2+1+4=7
                  	                 break mLoop;
                  	              default:
                  	                 throw new PatternSyntaxException("wrong char after \"(?\": "+c2);
                  	           }
                  	        }
                  	        break;
                  	  }
                  }
                  else if(((i+2)=0){
             if(distance!=pd) throw new PatternSyntaxException("non-equal branch lengths within a lookbehind assertion");
          }
          super.close();
       }
    }
    
    class Iterator extends Term{
       
       Iterator(Term term,int min,int max,Vector collection) throws PatternSyntaxException{
          collection.addElement(this);
          switch(term.type){
             case CHAR:
             case ANY_CHAR:
             case ANY_CHAR_NE:
             case BITSET:
             case BITSET2:{
                target=term;
                Term back=new Term();
                if(min<=0 && max<0){
                   type=REPEAT_0_INF;
                   back.type=BACKTRACK_0;
                }
                else if(min>0 && max<0){
                   type=REPEAT_MIN_INF;
                   back.type=BACKTRACK_MIN;
                   minCount=back.minCount=min;
                }
                else{
                   type=REPEAT_MIN_MAX;
                   back.type=BACKTRACK_MIN;
                   minCount=back.minCount=min;
                   maxCount=max;
                }
                
                failNext=back;
                
                in=this;
                out=this;
                out1=back;
                branchOut=null;   
                return;
             }
             case REG:{
                target=term;
                memreg=term.memreg;
                Term back=new Term();
                if(max<0){
                   type=REPEAT_REG_MIN_INF;
                   back.type=BACKTRACK_REG_MIN;
                   minCount=back.minCount=min;
                }
                else{
                   type=REPEAT_REG_MIN_MAX;
                   back.type=BACKTRACK_REG_MIN;
                   minCount=back.minCount=min;
                   maxCount=max;
                }
                
                failNext=back;
                
                in=this;
                out=this;
                out1=back;
                branchOut=null;   
                return; 
             }
             default:
                throw new PatternSyntaxException("can't iterate this type: "+term.type);
          }
       }
       
       void optimize(){
    //System.out.println("optimizing myself: "+this);
    //BACKTRACK_MIN_REG_FIND
          Term back=failNext;
          Optimizer opt=Optimizer.find(back.next);
          if(opt==null) return;
          failNext=opt.makeBacktrack(back);
       }
    }

jregex/META-INF/MANIFEST.MF
Manifest-Version: 1.0
    Created-By: 1.3.0_02 (Sun Microsystems Inc.)